innobi / pantab

Read/Write pandas DataFrames with Tableau Hyper Extracts
BSD 3-Clause "New" or "Revised" License

Make PyArrow Dependency Optional #354

Open WillAyd opened 1 month ago

WillAyd commented 1 month ago

This needs some investigation, but I think we have a feasible path to replacing pyarrow with arro3 internally. The only thing we use pyarrow for is to create a RecordBatchReader and convert it into the appropriate end dataframe library.

If we can replace that with arro3, it should reduce the installation size by a good deal.

WillAyd commented 1 month ago

Actually, we don't need arro3 or pyarrow at all in cases where users opt for the capsule return type implemented in https://github.com/innobi/pantab/pull/378

If we dropped pyarrow as a dependency, we might just have to add a check in the reader like:


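if return_type != "stream":
    try:
        import pyarrow as pa
    except ImportError as err:
        # pyarrow is only needed for the non-stream return types
        raise ImportError(f"return_type={return_type!r} requires pyarrow to be installed") from err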
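
For illustration, the same check could be pulled into a small helper so that the import only happens when the requested return type needs it. This is a rough sketch, not pantab's actual code: the function name `load_backend`, its `module_name` parameter, and the error text are all hypothetical, and it assumes `"stream"` is the return type that hands back the capsule object untouched.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_backend(return_type: str, module_name: str = "pyarrow") -> Optional[ModuleType]:
    """Import the optional backend only when the return type needs it.

    Hypothetical helper: name, signature, and message are illustrative only.
    """
    if return_type == "stream":
        # Assumption: stream/capsule results are returned as-is,
        # so no Arrow library has to be imported here.
        return None
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Re-raise with a message that tells the user how to proceed.
        raise ImportError(
            f"return_type={return_type!r} requires the optional dependency "
            f"{module_name!r}; install it or use return_type='stream'"
        ) from err
```

With something like this, a reader function would call `load_backend(return_type)` once up front and only touch the returned module on the non-stream code paths.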