otsaloma / dataiter

Python classes for data manipulation
https://dataiter.readthedocs.io/
MIT License

Replace Pandas with Arrow #22

Open otsaloma opened 1 year ago

otsaloma commented 1 year ago

The main place we use Pandas is DataFrame.read_csv. That could probably be replaced with pyarrow.csv.read_csv, which would let us remove Pandas from the list of required dependencies. It would remain an optional dependency, needed only for the from_pandas and to_pandas methods, with Pandas imported inside the method body.

Arrow seems to be a lot faster at reading CSV files, and we need it anyway for reading and writing Parquet files, so this would probably let us drop a dependency we've never liked and have long sought to replace.