innobi / pantab

Read/Write pandas DataFrames with Tableau Hyper Extracts
BSD 3-Clause "New" or "Revised" License

Allow sinking and scanning of lazyframes #346

Open skyth540 opened 3 hours ago

skyth540 commented 3 hours ago

Is your feature request related to a problem? Please describe.

LazyFrames allow larger-than-memory dataframes to be handled, but pantab does not support them.

Describe the solution you'd like

A sink_to_hyper function similar to sink_parquet, and a scan_from_hyper function similar to scan_parquet, would be very useful for projects involving LazyFrames.

Describe alternatives you've considered

I've tried using sink_parquet and then converting the result with this library. However, that doesn't work with my file, and it adds extra time and overhead anyway.
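For reference, the Parquet round trip mentioned above might look roughly like the sketch below. It assumes recent versions of polars and pantab are installed; the helper name `lazyframe_to_hyper`, the scratch file name, and the table name are hypothetical and chosen only for illustration. Note that the eager read in step 2 loads the whole file into memory, which is the overhead this request is trying to avoid.

```python
from pathlib import Path


def lazyframe_to_hyper(lf, hyper_path, table, scratch_dir):
    """Write a polars LazyFrame to a Hyper file via an intermediate Parquet file.

    Hypothetical helper: not part of pantab or polars.
    """
    # Imported lazily so the sketch can be loaded without the optional packages.
    import polars as pl
    import pantab

    parquet_path = Path(scratch_dir) / "intermediate.parquet"

    # Step 1: stream the lazy query result to disk.
    lf.sink_parquet(parquet_path)

    # Step 2: read the Parquet file eagerly (this materializes it in memory).
    df = pl.read_parquet(parquet_path)

    # Step 3: hand the eager DataFrame to pantab.
    pantab.frame_to_hyper(df, str(hyper_path), table=table)
    return df.height
```

A call such as `lazyframe_to_hyper(pl.scan_parquet("big.parquet"), "out.hyper", "my_table", "/tmp")` would exercise all three steps; the paths and table name here are placeholders.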

WillAyd commented 3 hours ago

In pantab we don't want to tie ourselves too strongly to any one library. We provide a single interface, and it is up to each library to implement the Arrow C Data specification.

I think this is more of a question for polars. You may want to ask about it upstream in an issue like https://github.com/pola-rs/polars/issues/12530 and see what plans they have for implementing that for LazyFrames.
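As a rough illustration of the "one interface" point: in Python, the Arrow C Data specification is exposed through the Arrow PyCapsule protocol, where an object exports a stream of record batches via a `__arrow_c_stream__` method. The sketch below shows how a consumer could detect that support by duck typing. The function `supports_arrow_stream` and the two `Fake*` classes are hypothetical stand-ins, not pantab or polars code, and this is not necessarily how pantab performs its own check.

```python
def supports_arrow_stream(obj) -> bool:
    """Return True if obj advertises the Arrow PyCapsule stream protocol."""
    return callable(getattr(obj, "__arrow_c_stream__", None))


class FakeEagerFrame:
    """Hypothetical stand-in for a frame type that can export Arrow data."""

    def __arrow_c_stream__(self, requested_schema=None):
        # A real implementation would return a PyCapsule wrapping an
        # ArrowArrayStream; this stub only marks the method as present.
        raise NotImplementedError("illustration only")


class FakeLazyFrame:
    """Hypothetical stand-in for a lazy frame with no Arrow export method."""


# The consumer needs no library-specific code: it only checks the protocol.
eager_ok = supports_arrow_stream(FakeEagerFrame())
lazy_ok = supports_arrow_stream(FakeLazyFrame())
```

Under this design, support for a new frame type comes from the producing library adding the export method, not from the consumer adding a special case.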