Open matthewmturner opened 4 days ago
I've made a quick PR to expose the table provider. `TableProviderFactory` is a little interesting. I might need some help making sure I understand the various inputs.
Is the intention to open up an existing table from a location? Or is the intention to create a brand new empty table? Or both?
`x, y` and the dataset schema is `x, z, y`, should we hide `z` in the created dataset? Or should we just ignore this if the dataset already exists? Does this sound correct?
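To make the `x, y` versus `x, z, y` question concrete, here is a minimal sketch of the "hide `z`" option only. It is an illustration, not how Lance or DataFusion behaves: `visible_columns` is a hypothetical helper, and columns are modelled as plain strings rather than a real Arrow schema.

```rust
/// Hypothetical helper (not part of Lance or DataFusion): given the columns
/// declared in the DDL and the columns stored in an existing dataset, return
/// the columns the resulting table would expose.
fn visible_columns(declared: &[&str], dataset: &[&str]) -> Result<Vec<String>, String> {
    // No declared columns: expose the dataset schema unchanged.
    if declared.is_empty() {
        return Ok(dataset.iter().map(|c| c.to_string()).collect());
    }
    // Every declared column has to exist in the dataset.
    for col in declared {
        if !dataset.contains(col) {
            return Err(format!("column `{col}` is not in the dataset"));
        }
    }
    // Keep the declared order and drop everything else (this is what hides `z`).
    Ok(declared.iter().map(|c| c.to_string()).collect())
}
```

Under this option, declaring `x, y` against a dataset with `x, z, y` yields a table with only `x` and `y`. The alternative raised above, ignoring the declared schema when the dataset already exists, would simply return the dataset columns in every case.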
@westonpace appreciate your quick and thoughtful response.
The intent here is to be able to write DDL like the following so that I can start reading the Lance format (I believe the `TableProviderFactory` may also enable writing to the format, but I think that would only be the case if writing was implemented by the `TableProvider`; don't quote me on this though):
CREATE EXTERNAL TABLE my_table STORED AS LANCE LOCATION '/path/to/lance';
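For readers less familiar with the mechanism, the rough idea is that the `STORED AS` type in the statement selects a registered factory, which then builds the table from the `LOCATION`. The sketch below models that dispatch with the standard library only. All names in it (`TableFactory`, `LanceFactory`, `create_table`, and the simplified `CreateExternalTable` struct) are hypothetical stand-ins, not DataFusion's or Lance's real types, and the "table" it returns is just a descriptive string.

```rust
use std::collections::HashMap;

/// The parts of `CREATE EXTERNAL TABLE` this sketch cares about (simplified).
struct CreateExternalTable {
    name: String,
    file_type: String,
    location: String,
}

/// Hypothetical stand-in for a table factory; returns a description
/// of the table instead of a real table provider.
trait TableFactory {
    fn create(&self, cmd: &CreateExternalTable) -> Result<String, String>;
}

struct LanceFactory;

impl TableFactory for LanceFactory {
    fn create(&self, cmd: &CreateExternalTable) -> Result<String, String> {
        // A real factory would open the dataset at `cmd.location` here.
        Ok(format!("{} -> lance dataset at {}", cmd.name, cmd.location))
    }
}

/// Look up the factory registered for the statement's `STORED AS` type
/// (compared in upper case) and let it build the table.
fn create_table(
    factories: &HashMap<String, Box<dyn TableFactory>>,
    cmd: &CreateExternalTable,
) -> Result<String, String> {
    let factory = factories
        .get(&cmd.file_type.to_uppercase())
        .ok_or_else(|| format!("no table factory registered for {}", cmd.file_type))?;
    factory.create(cmd)
}
```

With a `LanceFactory` registered under the key `"LANCE"`, the statement above would resolve to that factory, while an unregistered type would produce an error rather than a table.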
Here is an example of how we use the `DeltaTableFactory` for this purpose.
Unfortunately, I'm not familiar enough with Lance semantics to answer the specifics on how that maps to `TableProviderFactory` (but I'm hoping to start learning more about it - hence this issue ;) ). Here is some documentation on how it works though, which can hopefully help.
To the extent it's reasonable on your side, I would think a v1 that only exposes the simplest functionality would be fine.
I would like to add Lance as a supported file type in dft, similar to how we currently have deltalake and are working on hudi / Iceberg support. All of these formats are accessed via DataFusion's `TableProviderFactory`. I see that `TableProvider` is already implemented, so I am hoping that can be extended.