tarantool / tarantool

Get your data in RAM. Get compute close to data. Enjoy the performance.
https://www.tarantool.io
Other
3.36k stars 376 forks source link

New document-oriented data type for Tarantool SQL. #6251

Open unera opened 2 years ago

unera commented 2 years ago

Reason

Tarantool uses msgpack as a storage format. SQL isn't able to provide some document-oriented types: map, array, any. There may occur a new data type provided by tarantool index but not offered by SQL yet.

Also, the other databases offer document-oriented data types like JSON, BSON, etc.

So, it would be nice if Tarantool provided a document-oriented type for SQL and Lua. We could call the type 'msgpack', or 'object', or otherwise.

The type should

So, I suggest creating a new SQL type, 'msgpack'. The type can have an easy constructor from JSON-string.

Example:

CAST('{"id": 123, "card"{"name": "vasya"}}' AS msgpack) or msgpack('{"id": 123, "card"{"name": "vasya"}}')

To provide special types you can use placeholders in JSON:

msgpack('{"id": %1, "card": "name": %2}}', 123, CAST('2020-01-01' AS datetime))

The type must have a simple way to access its fields. I think that JSONPATH is a very consistent variant.

Example: field['card.name']

JSONPATH can be used through an array, a map and a single scalar. So, the semantic will fit everywhere.

PROFITS

Compare: SQL(a['foo']['bar']) vs SQL(a['foo.bar'])

The first example must throw an error if a['foo'] doesn't exist, or it should be possible to write NULL['bar']. It's a shame.

The second syntax encapsulates the problem inside. Its accessor will return expected NULL if 'foo' doesn't exist.

Lua can provide the library, too. Example:

local foo = mp('{"id": 123, "card"{"name": "vasya"}}')
print(foo['card.name'])
pgulutzan commented 2 years ago

We already have issue #2237 Support array types in SQL, issue #3174 Introduce type ANY. issue #4762 sql: introduce type ARRAY (I wonder if this is a duplicate of issue#2237), issue #4763 sql: introduce type MAP. And there's been some discussion of them in https://github.com/tarantool/tarantool/blob/a5177fa9c4bd6c6fdb2ca5499b371ec5bf83774f/doc/rfc/5910-consistent-sql-lua-types.md -- which I feared was a bit premature. Anyway, I prefer the direction @kostja proposed with issue #1256 Document support (flattening and de-flattening). If there's interest in flattening then perhaps we could discuss it there.

unera commented 2 years ago

@pgulutzan

I want to drop MAP, ARRAY tasks, merge three tasks into one and provide the simplest way for object-oriented fields (including non-supported types)

issue#3174 Introduce type ANY

ANY can be used as an operator in the future

WHERE foo = ANY (SELECT name FROM table)

So I think that the ticket should be closed.

issue#4762 issue#4763

These tickets (and ANY) can be merged into this.

Mons commented 2 years ago

Using native document type instead of dirty and unclear mapping of sql'ish types to real tarantool's document type seems very reasonable

unera commented 2 years ago

If we have done the ticket, these tickets could be closed:

And we don't need to provide BINARY types, as

Totktonada commented 2 years ago

See also the question about accessing map fields from Tarantool/SQL.

Gerold103 commented 2 years ago
pgulutzan commented 2 years ago

Re the question "what does the standard say": you are correct, the standard document (SQL:2016) has sections about JSON, and about arrays. There is no function named JSONPATH and JSON is not a predefined data type.

kostja commented 2 years ago

Generally I like the idea but thumbs down on this ticket specifically because first I think it's more important to see ARRAY and MAP data types available in SQL, e.g. the way it's done in Cassandra. It's something that's already present in Tarantool as data types and it should be easy to work with these types in SQL.

At some point I believe box.space API for working with spaces should be deprecated altogether, and we should switch entirely to SQL. Before this happens I'd wait with adding new extensions to box.space api, it's already quite rich.

pgulutzan commented 2 years ago

It is possible, though not easy, to read map fields with SQL. The Tarantool manual has an example: https://www.tarantool.io/en/doc/latest/reference/reference_sql/sql_plus_lua/#calling-lua-routines-from-sql It would be great if it was easier -- perhaps someone can suggest improvements to what I wrote there -- but no new data type is required.