paradedb / pg_analytics

DuckDB-powered analytics for Postgres
https://paradedb.com
PostgreSQL License
384 stars 15 forks source link

Support un-nesting JSONB arrays #90

Open georgelza opened 3 months ago

georgelza commented 3 months ago

What feature are you requesting?

I have a dataset/document as follows, which is pushed to iceberg/s3 into parquet format. i've configured paradedb to be able to query the data.

i need to be able to flatten/select from the documents in the basket items array. i need to be able to filter based on values in the individual documents in the basket items array.

{ "invoiceNumber": "1341243123341232", "saleDateTime_Ltz": "2023-12-23T16:53:39.911+02:00", "salesTimetamp_Epoc": "1718117619911", "store" : { "id": "1033", "name": "Derry" }, "clerk": { "id": "231", "name": "Martin", "surname": "Smith" }, "terminalPoint": "14", "basketItems":[ { "id": "234123412", "name": "Minty Frsh", "brand": "Colgate", "category": "Healthcare", "price":12412.00, "quantity":3 }, { "id": "234123421", "name": "All Bran", "brand": "Kellog's", "category": "Cereal", "price":12.00, "quantity":3 }, { "id": "534123412", "name": "Sugar Free", "brand": "Coke", "category": "Cool drinks", "price":112.00, "quantity":2 }, { "id": "224123412", "name": "Auto Wash", "brand": "OMO", "category": "Cleaning", "price":22.00, "quantity":4 } ], "nett": 442.23, "vat":10.00, "total":452.23 }

Why are you requesting this feature?

paradedb is to be a front end for out analytics. our source data is going via stream via flink into a iceberg table format on S3 into parquet file format. source data is multi multi level json structured, far t complex to flatten into old shape column/rows.

What is your proposed implementation for this feature?

.

Full Name:

George Leonard

Affiliation:

none

rebasedming commented 2 months ago

As of #103 there is a workaround for this. You need to unnest at CREATE FOREIGN TABLE time but the following is now possible:

-- without unnest
CREATE FOREIGN TABLE nested ()
SERVER parquet_server
OPTIONS (files '~/Downloads/test_duckdb_types.parquet', select 'struct_col');

select * from nested;
        struct_col
--------------------------
 {"a": "abc", "b": "def"}

-- with unnest
CREATE FOREIGN TABLE unnested ()
SERVER parquet_server
OPTIONS (files '~/Downloads/test_duckdb_types.parquet', select 'unnest(struct_col)');

select * from unnested;
  a  |  b
-----+-----
 abc | def
(1 row)

In the above example, test_duckdb_types.parquet has a JSON field called struct_col.