orchestracities / ngsi-timeseries-api

QuantumLeap: a FIWARE Generic Enabler to support the usage of NGSIv2 (and NGSI-LD experimentally) data in time-series databases
https://quantumleap.rtfd.io/
MIT License
38 stars 49 forks source link

Typed Arrays #147

Open c0c0n3 opened 5 years ago

c0c0n3 commented 5 years ago

With #142 we introduced better support for saving structured values as objects in Crate. We should probably extend that line of work to include support for arrays of structured values, which should be saved as arrays of objects in Crate.

But as we are at it, why not consider typed arrays? Currently we support conversion from the following NGSI attribute types: Boolean, Number, Integer, Text, DateTime, ISO8601, geo:json, geo:point, StructuredValue. But when it comes to arrays, we convert array elements to strings so we only support storing arrays of strings in Crate. It'd be nice if we could support arrays of all the above types---i.e. array[bool], array[float], array[object], etc.

See also: #36, #24.

c0c0n3 commented 5 years ago

As of PR #214, we switched to untyped arrays in the API spec for values returned in query responses. While this works, it can be improved: we may want to consider using typed arrays again. The array type in the spec will probably have to be the sum type S of all types catered for by the NGSI spec, i.e. array[S] where S = bool | number | ....

github-actions[bot] commented 3 years ago

Stale issue message

Ajaysamdyan commented 3 years ago

Hi @chicco785, I would like to contribute on this issue. Please assign it to me.

chicco785 commented 3 years ago

hi @Ajaysamdyan, as by @c0c0n3 analysis this is an issue specific to crate db, so basically you need to looking into crate translator: https://github.com/orchestracities/ngsi-timeseries-api/blob/master/src/translators/crate.py

the thing may be a bit tricky to handle without messing up also sql_translator, but not impacting it would be the best option I think.

Ajaysamdyan commented 3 years ago

Hi @c0c0n3 , thanks for sharing the details and guidance. As per my primary investigation on cratedb, i have tested and verify that cratedb support the insertion of array object of different type like: text, Boolean, number etc. I have executed the below mentioned some queries for verification:

CREATE TABLE my_table_arrays ( tags array(text), objects array(object as (age integer, name text))); INSERT INTO my_table_arrays (tags, objects) values (null, [{age = 27, name = 'ABCD'}]);

Here, data is successfully stored into the database. So kindly let me know, is my investigation going in right direction ?

c0c0n3 commented 3 years ago

Hi @Ajaysamdyan, that's great, you're definitely heading in the right direction. One thing to consider is that my comments are ancient and probably only applicable to Crate 3 which we don't support anymore. Do new versions of Crate support inserting and then querying an arbitrary JSON array? If so then, we don't need the typed approach anymore. In other words, would this work with a recent version of Crate

CREATE  TABLE t (x array)
INSERT INTO t (x) VALUES ( [ 1, "yo!", true, { k = 2, h = 'foo' } ] )

and if does, then how can you query column x? can you e.g. drill into the JSON object x[3]?

The reason why I'm mentioning this is that it'd simplify the Python implementation....

Ajaysamdyan commented 3 years ago

Hi @c0c0n3 , Please find the major blue points supported by latest crateDB version;

  1. All array elements must have the same data type, which determines the inner type of the array
  2. CrateDB currently does only support JSON-String / properties embedded in curly brackets { }

So, currently inserting and casting of a json object array of different type is not supported by crateDB. Kindly find the reference link for the same. https://crate.io/docs/crate/reference/en/4.6/general/ddl/data-types.html#data-type-object-json https://github.com/crate/crate/issues/11714

I had tested and verified it. kindly suggest me a way to proceed.

c0c0n3 commented 3 years ago

Hi @Ajaysamdyan, thanks for sharing your findings. I think I might've found a workaround that could let us store JSON arrays in Crate---tested with version 4.5.1. Here's a sum up.

To recap, here's a pseudo JSON (sum) type definition

   JSON := null | bool | string | number | [JSON] | {JSON}

Notice the recursive definition of array []. In JSON, an array has type array of JSON so array members are allowed to be any JSON value in the above sum type. For example, all of these are valid JSON arrays:

[1, 2]             -- a homogeneous-type array: all numbers
["yo!", true]      -- a mixed-type array: strings and booleans
[{"k": 1}, 2]      -- a mixed-type array: objects and numbers

-- a more complex mixed-type array: string, array with a string and a bool, number
["wada", ["wada", false], 3] 

The problem is that is seems Crate doesn't support mixed-type arrays. So while Crate will swallow the first array without a hiccup, it'll moan about the others---to be honest, I think for good reasons, but my opinion is irrelevant here.

What to do then? We know Crate supports JSON-like objects, so we might be able to encode a JSON array into an object. The mapping could be like this (pseudo-code)

NGSI attribute:
an_array_attr {
    type: array
    value: [...]
}

Crate column:
    an_array_attr OBJECT { value: array }

Like I said earlier, I've given this a try with Crate 4.5.1 and it seems to work for the most part but there are some corner cases we need to look into and some design issues to consider. Here's my test.

CREATE TABLE t (an_array_attr OBJECT(ignored));
INSERT INTO t (an_array_attr) VALUES ( '{"value": [1, 2]}'::object );
INSERT INTO t (an_array_attr) VALUES ( '{"value": ["yo!", true]}'::object );
INSERT INTO t (an_array_attr) VALUES ( '{"value": [{"k": 1}, 2]}'::object );
INSERT INTO t (an_array_attr) VALUES ( '{"value": ["wada", ["wada", false], 3]}'::object );

While all these inserts worked without a glitch, there still are things to consider which might make this solution unfeasible.

  1. The object constraint is ignored which means we lose indexing and hence it'll be inefficient to query the values inside the array. At the moment QL queries don't drill into arrays or objects so maybe this isn't too much of an issue, but I'm wondering if we could use dynamic instead of ignore and what the trade-offs would be.
  2. Is this encoding actually supported by Crate? Or did it work by accident?
  3. When querying the column, the value retrieved for the last insert is {value: [wada, 3]}. That is, the second element of the original array ([wada, false]) seems to have disappeared. Assuming the answers to (2) is yes, then is this a Crate bug?
Ajaysamdyan commented 3 years ago

Hi @c0c0n3, Thanks for sharing the test cases, I have also performed this test and observed the same behavior of cratedb.

Please find the below mentioned test case of dynamic insertion:

CREATE TABLE t1 (an_array_attr OBJECT(DYNAMIC));  Pass

Homogeneous Type:

INSERT INTO t1 (an_array_attr) VALUES ( '{"value": [1, 2]}'::object );  Pass

Mixed_Array Type:

INSERT INTO t1 (an_array_attr) VALUES ( '{"value": ["yo!", true]}'::object );  Failed, Error: Validation failed for an_array_attr: Cannot cast {"value"=['yo!', true]} to type object]
INSERT INTO t1 (an_array_attr) VALUES ( '{"value": [{"k": 1}, 2]}'::object ); Failed, Error:  Validation failed for an_array_attr: Cannot cast {"value"=[{"k"=1}, 2]} to type object]
INSERT INTO t1 (an_array_attr) VALUES ( '{"value": ["wada", ["wada", false], 3]}'::object ); Failed, Error; Validation failed for an_array_attr: Cannot cast {"value"=['wada', ['wada', false], 3]} to type object]
INSERT INTO t3 (an_array_attr) VALUES ( '{"value": ["yo!", true]}'::object ); Validation failed for an_array_attr: Mixed dataTypes inside a list are not supported. Found text and boolean]

Here, a new table t3 created using same dynamic insertion.

It seems that cratedb does not support the above mixed array data types. Please correct my understanding, if i am wrong.

Moreover, I have also asked the cratedb community regarding the ignored case (comment 2and 3). Kindly find the attached link for the same. https://community.crate.io/t/is-cratedb-support-mixed-array-type-insertion-and-querying-of-the-column/809

c0c0n3 commented 3 years ago

Hi @Ajaysamdyan, that's brilliant, thanks for this and thanks for asking the Crate community about mixed-type support. Let's wait to see what the answer is and then we'll take it from there.

Ajaysamdyan commented 3 years ago

Hi @c0c0n3 , As per the discussion with crate community today, it has been found that it is a cratedb Admin UI bug. And thay are working on to fix this issue.

Chat link ; https://community.crate.io/t/is-cratedb-support-mixed-array-type-insertion-and-querying-of-the-column/809/

c0c0n3 commented 3 years ago

Hi @Ajaysamdyan

it is a cratedb Admin UI bug

Ha! I actually tested the queries through the admin UI too and just assumed the data wasn't there. Glad it's just a display issue and the data doesn't actually get zapped :-)

Unless @chicco785 has anything on the contrary, we could go ahead with the implementation I suppose?

amotl commented 3 years ago

Hi there,

the CrateDB 4.6.4 hotfix release will include Admin UI 1.19.2, which includes the fix https://github.com/crate/crate-admin/pull/745 by @proddata - thanks a stack!

The Docker image is already available through crate/crate:4.6.4.

With kind regards, Andreas.

tstorek commented 3 days ago

@c0c0n3 Is this still under active development? According to Crate-Docs: https://cratedb.com/docs/crate/reference/en/4.8/general/ddl/data-types.html#type-object it should easily work to use Arrays of Objects. Is this already supported in QL and only the docs are outdated?

c0c0n3 commented 1 day ago

@tstorek nope, it hasn't been implemented yet...