rstudio / pins-python

https://rstudio.github.io/pins-python/
MIT License

Should `pandas` be an optional dependency? #261

Closed: nathanjmcdougall closed this 3 months ago

nathanjmcdougall commented 4 months ago

This is very similar to the discussion in #233 about making pins DF-library agnostic.

I'm in two minds about this.

On the one hand, the vast majority of people who want to use pins will be using pandas. On the other hand, that means they would already have it installed, so making pandas optional is unlikely to cause them major friction.

Making it optional would enable polars users etc. to use pins without needing to install pandas (see #153).

pandas is considering adding pyarrow as a required dependency, which would increase the installation size by ~120MB. https://github.com/pandas-dev/pandas/issues/54466

The costs to this project would be additional code complexity to protect import statements with try-except, as well as potentially some internal refactoring (e.g. as_df options).
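To make the try-except cost concrete, here is a minimal sketch of what a guarded import could look like. It is not taken from the pins codebase: `import_optional` and `load_csv_as_df` are hypothetical names used only for illustration, and the only real calls are `importlib.import_module` and `pandas.read_csv`.

```python
import importlib
from types import ModuleType


def import_optional(name: str, purpose: str) -> ModuleType:
    """Import an optional dependency, or raise an ImportError that says why it is needed.

    Hypothetical helper for illustration; not part of the pins API.
    """
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"'{name}' is required for {purpose} but is not installed. "
            f"Install it with: pip install {name}"
        ) from exc


def load_csv_as_df(path: str):
    """Hypothetical reader: pandas is imported lazily, only when a data frame is requested."""
    pd = import_optional("pandas", "reading a pin as a data frame")
    return pd.read_csv(path)
```

With this pattern the package itself imports cleanly without pandas, and the failure only appears at the call that actually needs a data frame. The trade-off is that every such call site has to go through the guard.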

isabelizimm commented 3 months ago

For pins in the short-to-medium term, I would prefer to keep pandas built in and expand the ability to use polars or any other dataframe library as desired. I do believe pins should include at least one reasonable library so that users can read data (perhaps pinned by a colleague or from R) as a data frame without having to decide which kind of data frame to use, or seeing errors because no dataframe library is installed.
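A rough sketch of that idea, a default library with an opt-in alternative, might look like the following. This is only an assumption about how such a selection could work, not how pins does it: `pick_backend`, `PREFERRED_BACKENDS`, and the `is_installed` parameter are all hypothetical, and the only real library call is `importlib.util.find_spec`.

```python
from importlib.util import find_spec

# Hypothetical preference order: pandas first, matching the default proposed here.
PREFERRED_BACKENDS = ("pandas", "polars")


def pick_backend(requested=None, candidates=PREFERRED_BACKENDS, is_installed=None):
    """Return the name of the data frame library to use.

    An explicit request wins; otherwise the first installed candidate is used.
    `is_installed` can be overridden (e.g. in tests) and defaults to a find_spec check.
    """
    if is_installed is None:
        # find_spec returns None when a top-level package cannot be found
        is_installed = lambda name: find_spec(name) is not None

    if requested is not None:
        if not is_installed(requested):
            raise ImportError(f"Requested data frame library '{requested}' is not installed.")
        return requested

    for name in candidates:
        if is_installed(name):
            return name

    raise ImportError(
        "No data frame library is installed; tried: " + ", ".join(candidates)
    )
```

If pandas stays a required dependency, the final error branch is never reached for the default path, which is the "no errors when nothing is installed" property described above.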

I could see a world where the default library is polars instead of pandas, but I do think pandas still has the masses for now.

nathanjmcdougall commented 3 months ago

Sounds good to me.