Open georgelza opened 3 months ago
As of #103 there is a workaround for this. You need to unnest
at CREATE FOREIGN TABLE
time but the following is now possible:
-- without unnest
CREATE FOREIGN TABLE nested ()
SERVER parquet_server
OPTIONS (files '~/Downloads/test_duckdb_types.parquet', select 'struct_col');
select * from nested;
struct_col
--------------------------
{"a": "abc", "b": "def"}
-- with unnest
CREATE FOREIGN TABLE unnested ()
SERVER parquet_server
OPTIONS (files '~/Downloads/test_duckdb_types.parquet', select 'unnest(struct_col)');
select * from unnested;
a | b
-----+-----
abc | def
(1 row)
In the above example, test_duckdb_types.parquet
has a JSON field called struct_col
.
What feature are you requesting?
I have a dataset/document as follows, which is pushed to iceberg/s3 into parquet format. i've configured paradedb to be able to query the data.
i need to be able to flatten/select from the documents in the basket items array. i need to be able to filter based on values in the individual documents in the basket items array.
{ "invoiceNumber": "1341243123341232", "saleDateTime_Ltz": "2023-12-23T16:53:39.911+02:00", "salesTimetamp_Epoc": "1718117619911", "store" : { "id": "1033", "name": "Derry" }, "clerk": { "id": "231", "name": "Martin", "surname": "Smith" }, "terminalPoint": "14", "basketItems":[ { "id": "234123412", "name": "Minty Frsh", "brand": "Colgate", "category": "Healthcare", "price":12412.00, "quantity":3 }, { "id": "234123421", "name": "All Bran", "brand": "Kellog's", "category": "Cereal", "price":12.00, "quantity":3 }, { "id": "534123412", "name": "Sugar Free", "brand": "Coke", "category": "Cool drinks", "price":112.00, "quantity":2 }, { "id": "224123412", "name": "Auto Wash", "brand": "OMO", "category": "Cleaning", "price":22.00, "quantity":4 } ], "nett": 442.23, "vat":10.00, "total":452.23 }
Why are you requesting this feature?
paradedb is to be a front end for out analytics. our source data is going via stream via flink into a iceberg table format on S3 into parquet file format. source data is multi multi level json structured, far t complex to flatten into old shape column/rows.
What is your proposed implementation for this feature?
.
Full Name:
George Leonard
Affiliation:
none