marsupialtail / quokka

Making data lake work for time series
https://marsupialtail.github.io/quokka/
Apache License 2.0
1.14k stars, 60 forks

Input Source - Delta Lake #30

Open Chase-Edwards opened 1 year ago

Chase-Edwards commented 1 year ago

Delta Lake is a common OSS table format that would be useful to support with Quokka.

marsupialtail commented 1 year ago

It is in progress. In fact, if you look at setup.py, I have already included the optional dependencies.

marsupialtail commented 1 year ago

Contributions are very welcome. It shouldn't be that different from reading a regular list of parquet inputs.

https://marsupialtail.github.io/quokka/tutorial/
https://github.com/marsupialtail/quokka/blob/master/pyquokka/dataset.py#L29
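
To illustrate why a Delta table is close to "a list of parquet inputs": the table's current state is the set of parquet files left after replaying the add/remove actions in its `_delta_log` directory. The sketch below is not Quokka code. The function name `active_parquet_files` is hypothetical, and it assumes a local table with JSON commits only (no checkpoints, no deletion vectors, no URL-encoded paths).

```python
import json
from pathlib import Path


def active_parquet_files(table_path):
    """Replay the JSON commits in _delta_log and return the live data files.

    Simplifying assumptions: local filesystem, no checkpoint files,
    no deletion vectors, and relative paths that need no URL-decoding.
    """
    log_dir = Path(table_path) / "_delta_log"
    live = set()
    # Commit files are named with zero-padded version numbers,
    # so sorting by name replays them in version order.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)  # one action per line
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    # Resolve the relative paths against the table root.
    return sorted(str(Path(table_path) / p) for p in live)
```

The returned list could then be fed to whatever code path already handles a plain list of parquet files. A production reader would also need to handle checkpoints and the protocol features skipped here.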

SemyonSinchenko commented 1 year ago

Hello! Why not just use the delta-rs library? Of course, it is possible to implement a reader from scratch, but that would make maintenance harder. Using delta-rs does require having the dependency installed on all the nodes, but I see that for Iceberg you used an external dependency instead of writing a reader from scratch.
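
For reference, a minimal sketch of what the delta-rs route might look like through its Python binding, the `deltalake` package. The helper name `delta_file_uris` and the `table_cls` parameter are hypothetical and exist only so the function can be exercised without the package installed. How the result would be wired into Quokka is not shown here.

```python
def delta_file_uris(table_uri, version=None, table_cls=None):
    """Return the parquet file URIs of a Delta table snapshot.

    table_cls is a hypothetical injection point for testing; by default
    the real DeltaTable class from the deltalake package is used.
    """
    if table_cls is None:
        # Imported lazily so deltalake stays an optional dependency.
        from deltalake import DeltaTable
        table_cls = DeltaTable
    # version=None loads the latest snapshot; an integer pins a version.
    table = table_cls(table_uri, version=version)
    return table.file_uris()
```

The library takes care of log replay, checkpoints and object-store access, which is the maintenance burden referred to above. The cost is that `deltalake` has to be installed on every node that opens the table.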