Open unera opened 2 years ago
We already have issue #2237 Support array types in SQL, issue #3174 Introduce type ANY, issue #4762 sql: introduce type ARRAY (I wonder if this is a duplicate of issue #2237), and issue #4763 sql: introduce type MAP. There has also been some discussion of them in https://github.com/tarantool/tarantool/blob/a5177fa9c4bd6c6fdb2ca5499b371ec5bf83774f/doc/rfc/5910-consistent-sql-lua-types.md, which I feared was a bit premature. Anyway, I prefer the direction @kostja proposed with issue #1256 Document support (flattening and de-flattening). If there's interest in flattening then perhaps we could discuss it there.
@pgulutzan
I want to drop the MAP and ARRAY tasks, merge the three tasks into one, and provide the simplest way to handle object-oriented fields (including non-supported types).
issue#3174 Introduce type ANY
ANY can be used as an operator in the future
WHERE foo = ANY (SELECT name FROM table)
So I think that the ticket should be closed.
issue#4762 issue#4763
These tickets (and ANY) can be merged into this.
Using native document type instead of dirty and unclear mapping of sql'ish types to real tarantool's document type seems very reasonable
Once this ticket is done, these tickets could be closed:
And we don't need to provide BINARY
types, as
See also the question about accessing map fields from Tarantool/SQL.
Why 'msgpack' as a name? MessagePack is a packing protocol, not a type. We do not necessarily have to store the values you define as 'msgpack' packed into real MessagePack. MessagePack is about compactness, not speed. For in-memory values that are not yet saved into tuples, it might be better to store them as plain C arrays and hash tables, or as key-value arrays for dictionaries. The same problem already exists for tuples: unpacking them is costly when you need to access their fields. I wouldn't bring that issue into the SQL map/array implementation. This means the name is not good for public usage, IMO. I would maybe consider 'dict', but that looks strange for arrays. Or 'json', because you use JSON syntax.
What is the problem with type 'any'? Is it even a type with its own special syntax? 'Any' simply holds a value of any of the existing types. It has no specific values of its own. Why do you mention it as a win of this new 'msgpack' type?
What is the problem with array/map support in the SQL parser? Can't it be the same as what you did with msgpack, but without the first quotes? What does the standard say? I remember that it had something at least for JSON. It might have something for arrays and maps too.
I see you say JSON path is something good in terms of usability, but IMO, field.card.name looks much better than field['card.name']. Did I misunderstand what you meant?
Placeholders in the msgpack strings are a crutch to work around not being able to write maps and arrays other than as strings. You wouldn't need them if you could write {"time": CAST('2020-01-01' AS DATETIME)} instead of msgpack('{"time": %1}', CAST('2020-01-01' AS DATETIME)).
You said "We can provide SQL for fields without format (tuple tail)": how? And why is it related to how we define the syntax for maps and arrays?
You said "Users can use the type as document-oriented storage": this is also not a special win of the msgpack type. It is a general benefit we get when we implement arrays and maps in any way, not necessarily via a msgpack type.
You said "The syntax will be consistent": consistent with what? For example, you said it is called the 'msgpack' type, but you use JSON syntax to define it, which is inconsistent. Besides, the same level of consistency could be achieved by dropping 'msgpack' and the first quotes. That looks like JSON too, but does not mention msgpack at all.
You said that in the example SQL(a['foo']['bar']) vs SQL(a['foo.bar']) we can't return NULL in the first case. But I think we can; it is an implementation detail. Additionally, in the second case you will pay at runtime to parse the string into tokens. This looks like an issue.
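To make the two points above concrete, here is a minimal Python sketch, not Tarantool code. The names `index`, `compile_path`, and `get_compiled` are hypothetical helpers invented for illustration. It shows that a chained subscript can propagate a missing value as None (standing in for SQL NULL) instead of raising, and that a dotted path string can be tokenized once up front rather than on every access.

```python
def index(value, key):
    """Null-propagating subscript (hypothetical helper).

    Returns None when the container is missing, is not a map,
    or does not contain the key, instead of raising an error.
    """
    if isinstance(value, dict):
        return value.get(key)
    return None


def compile_path(path):
    """Split a dotted path into steps once, so lookups do not re-parse it."""
    return tuple(path.split("."))


def get_compiled(value, steps):
    """Walk pre-tokenized steps using the same null-propagating subscript."""
    for step in steps:
        value = index(value, step)
    return value


a = {"foo": {"bar": 1}}

# Chained form, analogous to a['foo']['bar'].
chained_hit = index(index(a, "foo"), "bar")
chained_miss = index(index({}, "foo"), "bar")

# Path form, analogous to a['foo.bar'], tokenized a single time.
steps = compile_path("foo.bar")
path_hit = get_compiled(a, steps)
path_miss = get_compiled({}, steps)
```

Under these assumptions both forms give the same result for a missing 'foo'. The difference is where the parsing cost lands: the chained form is tokenized by the SQL parser, while the path string has to be split at runtime unless the implementation caches the compiled steps.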
You said Lua can provide the library, too - please no, I beg you. Why would you need it in Lua? Its ability to work with maps and arrays is already good enough. Besides, there is msgpack module which can do the packing into MessagePack. No one asked for that in Lua.
Re the question "what does the standard say": you are correct, the standard document (SQL:2016) has sections about JSON, and about arrays. There is no function named JSONPATH and JSON is not a predefined data type.
Generally I like the idea but thumbs down on this ticket specifically because first I think it's more important to see ARRAY and MAP data types available in SQL, e.g. the way it's done in Cassandra. It's something that's already present in Tarantool as data types and it should be easy to work with these types in SQL.
At some point I believe the box.space API for working with spaces should be deprecated altogether, and we should switch entirely to SQL. Before this happens I'd wait with adding new extensions to the box.space API; it's already quite rich.
It is possible, though not easy, to read map fields with SQL. The Tarantool manual has an example: https://www.tarantool.io/en/doc/latest/reference/reference_sql/sql_plus_lua/#calling-lua-routines-from-sql It would be great if it was easier -- perhaps someone can suggest improvements to what I wrote there -- but no new data type is required.
Reason
Tarantool uses msgpack as a storage format. SQL isn't able to provide some document-oriented types: map, array, any. A new data type may also appear that Tarantool indexes support but SQL does not offer yet.
Also, the other databases offer document-oriented data types like JSON, BSON, etc.
So, it would be nice if Tarantool provided a document-oriented type for SQL and Lua. We could call the type 'msgpack', or 'object', or otherwise.
The type should
So, I suggest creating a new SQL type, 'msgpack'. The type can have an easy constructor from a JSON string.
Example:
CAST('{"id": 123, "card": {"name": "vasya"}}' AS msgpack)
or
msgpack('{"id": 123, "card": {"name": "vasya"}}')
To provide special types you can use placeholders in JSON:
msgpack('{"id": %1, "card": {"name": %2}}', 123, CAST('2020-01-01' AS datetime))
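The placeholder idea can be sketched in Python as follows. This is only an illustration of the proposed semantics, not an existing Tarantool API: `build_document` and the sentinel marker are hypothetical, and the sketch assumes that placeholders appear only as bare values, never inside a quoted JSON string.

```python
import json
import re
from datetime import date

_PLACEHOLDER = re.compile(r"%(\d+)")
_MARK = "\u0000arg:"  # sentinel prefix, assumed never to occur in real data


def build_document(template, *args):
    """Build a nested document from a JSON template with %1, %2, ... slots."""

    def mark(match):
        position = int(match.group(1))
        if not 1 <= position <= len(args):
            raise ValueError(f"placeholder %{position} has no matching argument")
        # Replace the bare %N with a JSON string so the template parses as JSON.
        return json.dumps(_MARK + str(position))

    parsed = json.loads(_PLACEHOLDER.sub(mark, template))

    def substitute(node):
        # Walk the parsed tree and swap each sentinel for its argument.
        if isinstance(node, dict):
            return {key: substitute(value) for key, value in node.items()}
        if isinstance(node, list):
            return [substitute(item) for item in node]
        if isinstance(node, str) and node.startswith(_MARK):
            return args[int(node[len(_MARK):]) - 1]
        return node

    return substitute(parsed)


doc = build_document(
    '{"id": %1, "card": {"name": %2}}', 123, date(2020, 1, 1)
)
```

The arguments are inserted as native values after parsing, so a value with no JSON representation (a date here, standing in for DATETIME) never has to be serialized into the template text.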
The type must have a simple way to access its fields. I think that JSONPATH is a very consistent variant.
Example:
field['card.name']
JSONPATH can be applied to an array, a map, or a single scalar, so the semantics fit everywhere.
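A rough Python sketch of such an accessor is shown below. The function name `get_path`, the dot-separated path format, and 0-based array positions are all assumptions made for illustration; the proposal does not fix these details.

```python
def get_path(document, path):
    """Resolve a dotted path against nested maps, arrays, and scalars.

    Returns None (standing in for SQL NULL) when any step is missing.
    An empty path returns the document itself, which covers the scalar case.
    """
    if path == "":
        return document
    current = document
    for step in path.split("."):
        if isinstance(current, dict):
            if step not in current:
                return None
            current = current[step]
        elif isinstance(current, list):
            # Assumption: array steps are 0-based decimal positions.
            if not step.isdigit() or int(step) >= len(current):
                return None
            current = current[int(step)]
        else:
            # A scalar has no children, so any further step yields None.
            return None
    return current


field = {"id": 123, "card": {"name": "vasya"}, "tags": ["a", "b"]}
name = get_path(field, "card.name")
second_tag = get_path(field, "tags.1")
missing = get_path(field, "foo.bar")
```

Here `get_path(field, "card.name")` plays the role of field['card.name'], and a path through a missing key resolves to None rather than an error.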
PROFITS
Compare:
SQL(a['foo']['bar']) vs SQL(a['foo.bar'])
The first example must throw an error if a['foo'] doesn't exist, or it should be possible to write NULL['bar']. It's a shame. The second syntax encapsulates the problem inside. Its accessor will return the expected NULL if 'foo' doesn't exist.
Lua can provide the library, too. Example: