heavyai / heavydb

HeavyDB (formerly OmniSciDB)
https://heavy.ai
Apache License 2.0
2.93k stars 445 forks source link

Map type support (hstore postgresql equivalent) #744

Open igor-suhorukov opened 2 years ago

igor-suhorukov commented 2 years ago

Typical analytical data de-normalized and contains map of primitive types. PostgreSQL hstore extension/type solve this task, Clickhouse also support Map, Apache Spark and SparkSQL also allows user manipulate with de-normalized key-value data

Please support map type of key-values of strings, long, int, short, float, double and dictionary encoded values and skeys, svals, [] postgresql equivalent operations on Map data and size() function

JSON type and functions as alternative. But in most analytical use case JSON is overkill feature. Another option implement sql function that allow operate on two column of array type simultaneously to behave like key->value filter

mflaxman10 commented 1 year ago

Thanks for the request.

JSON_VALUE is now supported and pretty performant., so I'd try that first depending on the use case.

Or could do today in full SQL, using UNNEST with a subquery to maintain the ORDER, as here: https://stackoverflow.com/questions/53635020/bigquery-argmax-is-array-order-maintained-when-doing-cross-join-unnest

As you mention, more generically two arrays can do the trick with an extra function. That doesn't currently exist and would currently require a C++ extension to generate. But at 7.0 should be possible to generate such compiled for you from NUMBA Python (just needs to array support in RBC)

igor-suhorukov commented 1 year ago

Hi @mflaxman10 Thank you for info. Is there any JSON type in HeavyAI or it just regular string? How to extract all keys from json on first nested level by using JSON_VALUE ?