Open pdpark opened 1 month ago
~It seems that we could introduce hive_partitioning setting to fix this ticket.~
@shamb0 has made a PR to document Hive partitioning; we just need to review and merge it. As for a custom partitioning scheme that is not Hive, ~I'm not convinced we want to expose that, as it is probably an edge case, unless you have an idea that we haven't considered.~
EDIT: We're still open to considering this, but are waiting for more user requests.
I see, it makes sense to me.
Btw, we should add an example for Hive partitioning in https://docs.paradedb.com/ingest/import/parquet#parquet-options
Agreed
What feature are you requesting?
The ability to specify a custom partitioning scheme through the use of a pattern in the `files` option when creating foreign tables, like this:
Why are you requesting this feature?
To support an existing custom partitioning scheme.
What is your proposed implementation for this feature?
Foreign tables could be created like this:
...or this:
The values in brackets must correspond to column names defined in the referenced Parquet files, or the statement will fail.
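To make the validation rule concrete, here is a rough Python sketch of how the bracketed names could be checked against the table's columns. It is only an illustration: the curly-brace placeholder syntax, the example pattern, the column set, and the helper names `placeholder_names` and `validate_pattern` are all assumptions, not anything that exists in ParadeDB today.

```python
from string import Formatter


def placeholder_names(pattern: str) -> list[str]:
    """Return the placeholder names found in a files pattern, in order."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None when a chunk is literal text only.
    return [field for _, field, _, _ in Formatter().parse(pattern) if field]


def validate_pattern(pattern: str, columns: set[str]) -> None:
    """Raise ValueError if a placeholder does not name a known column."""
    unknown = [name for name in placeholder_names(pattern) if name not in columns]
    if unknown:
        raise ValueError(f"pattern references unknown columns: {unknown}")


# Hypothetical pattern and column set, for illustration only.
PATTERN = "s3://bucket/data_{id_1}_{id_2}.parquet"
validate_pattern(PATTERN, {"id_1", "id_2", "amount"})  # passes silently
```

Failing at table creation time, rather than at query time, would surface a mistyped column name as early as possible.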
When running a query like this on the first table defined above:
...the `id_1` and `id_2` column values from the SQL `WHERE` clause will be substituted into the `files` pattern, producing a string that must correspond to an actual Parquet file at the specified S3 location: `s3://bucket/data_1234_0987.parquet`
A query on the second table defined above:
...will produce a `files` pattern after substitution that looks like this: `s3://bucket/data_1234_*.parquet`
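As a sketch of how such a wildcard pattern might then be resolved, the Python below substitutes `id_1` and matches the resulting glob against a list of object keys. The pattern syntax, the `matching_files` helper, and the listed keys are hypothetical, and a real implementation would presumably delegate the glob to the underlying reader rather than filter a list itself.

```python
from fnmatch import fnmatchcase


def matching_files(pattern: str, predicates: dict[str, str], keys: list[str]) -> list[str]:
    """Substitute predicates into the pattern, then glob-match candidate keys."""
    # The literal "*" in the pattern passes through str.format unchanged.
    glob = pattern.format(**predicates)
    return [key for key in keys if fnmatchcase(key, glob)]


# Hypothetical pattern and bucket listing, for illustration only.
pattern = "s3://bucket/data_{id_1}_*.parquet"
keys = [
    "s3://bucket/data_1234_0001.parquet",
    "s3://bucket/data_1234_0987.parquet",
    "s3://bucket/data_5678_0987.parquet",
]
matched = matching_files(pattern, {"id_1": "1234"}, keys)
```

With these inputs, only the two `data_1234_*` keys are selected, so files for other `id_1` values would never be read.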
Full Name:
Patrick Park
Affiliation:
Payzer