Open pdpark opened 1 month ago
Hi @philippemnoel , I am interested to work on this feature, could you please assign it to me
Hi @philippemnoel , I am interested to work on this feature, could you please assign it to me
Absolutely, it is yours! Thank you for your work :)
What feature are you requesting?
Support for foreign tables in a multi-level partitioned table setup (see example).
Why are you requesting this feature?
When you have a lot of large files in one object store key/directory, scanning all of the files can be prohibitively time consuming. Changing the storage structure to support hive keys may not always be an option.
What is your proposed implementation for this feature?
This is an example of a multi-level partition table, but in many cases only the first partition level would be needed.
Example:
A “root” table that will have two partition “levels”:
The table for the first partition level:
Many partition tables may be created at this level - one for each
id_1
value in the root table.Note: If these first two tables have to be created as foreign tables in order for the last partition level table to be a foreign table, they shouldn’t require a files option.
The table for the next partition level looks like this:
Many partition tables may be created at this level as well - one for each
id_2
value in the first partition level tables.Note: when creating this final “leaf” partition table as a foreign table, a
files
option should be required.So, when querying the root table with a where clause like this:
...only one parquet file should be scanned - the one specified by the files option of the “leaf” partition table. Or, if a glob pattern is used in the
files
option of the leaf partition table, only the (ideally) few number of matching files will be scanned.Full Name:
Patrick Park
Affiliation:
Payzer