ibis-project / ibis

the portable Python dataframe library
https://ibis-project.org
Apache License 2.0

feat: make pandas/pyarrow/numpy optional dependencies #8430

Closed jcrist closed 1 month ago

jcrist commented 7 months ago

Currently pip install ibis-framework brings in dependencies on several common in-memory formats (pandas, pyarrow, and numpy). Historically this was due to technical limitations in the code base. We're now at a point where these could be made optional and would only be required if the user wants to output results in a specific format (e.g. expr.to_pyarrow() would naturally always require a pyarrow dependency).

Pros

Cons

jcrist commented 7 months ago

Two possible ideas:

1. Slim down ibis-framework

With this option we'd move the in-memory formats to be optional dependencies. For to_pandas() to work the user would have to request the pandas extra on install (or have pandas already installed through other means).

Install story:

pip install ibis-framework  # minimal install with no backends or memory formats
pip install 'ibis-framework[duckdb,pandas]'  # install duckdb backend & pandas memory format support
pip install 'ibis-framework[bigquery,pyarrow]'  # install bigquery backend & pyarrow memory format support

# We _might_ also want something like
pip install 'ibis-framework[basic]'  # or `default`? idk. Some minimal common "recommended" set of extras for getting started. Probably duckdb & pandas & pyarrow.

Pros: one release artifact, simple to manage, sane for software devs

Cons: can't do much with a stock pip install ibis-framework, need extras for both backends and memory formats. Possibly/probably confusing for new python users. Also a breaking change.

2. Create a new minimal ibis-core package

Alternatively we could create a new minimal ibis-core package (or some better name). While we could split the code into separate artifacts, we could also just move it all into ibis-core and have ibis-framework be a metadata-only package that depends on ibis-core + a few common dependencies. Metapackages like this are far easier to do in conda, but there are ways to do them with wheels as well.

The install story here would look like:

pip install ibis-core  # minimal install with no backends or memory formats
pip install 'ibis-core[duckdb,pandas]'  # install duckdb backend & pandas memory format support
pip install 'ibis-framework[bigquery]'  # metapackage that adds pandas & pyarrow deps

Pros: not a breaking change, pip install ibis-framework can still do some compute out-of-the-box

Cons: metapackages in wheels are hacky (but doable), splitting our release artifacts is also not ideal


Having written that all out, I'm leaning towards option 1 as the best solution. It's the simplest to implement and also the simplest to explain. Ensuring new users know what extras they need to specify can be handled through documentation and generating good error messages when a feature requires some missing dependency.
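For illustration, here is a minimal sketch of what "good error messages" for a missing optional dependency could look like. The helper name `load_optional` and the `EXTRA_FOR_MODULE` mapping are hypothetical and not part of ibis; the extra names simply mirror the ones proposed above.

```python
import importlib

# Hypothetical mapping from an importable module name to the extra that
# would provide it (names mirror the extras proposed in this thread).
EXTRA_FOR_MODULE = {"pandas": "pandas", "pyarrow": "pyarrow", "numpy": "numpy"}


def load_optional(module_name: str, feature: str):
    """Import an optional dependency lazily.

    If the module itself is missing, raise an ImportError that names the
    feature being used and the extra to install.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Only rewrite the error when the requested module is the one that
        # is missing; a failure inside its own imports is re-raised as-is.
        if exc.name != module_name:
            raise
        extra = EXTRA_FOR_MODULE.get(module_name, module_name)
        raise ImportError(
            f"{feature} requires the optional dependency {module_name!r}. "
            f"Install it with: pip install 'ibis-framework[{extra}]'"
        ) from exc
```

A method such as to_pandas() could then call `load_optional("pandas", "to_pandas()")` at the top of its body, so the import cost and the dependency are only incurred when that output format is actually requested.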

gforsyth commented 7 months ago

I'm in favor of option 1, but not strictly opposed to option 2. Depending on if others are on board, do we want to try to get this in for 9.0?

jcrist commented 7 months ago

IMO if we decide this is something we want to do and it's easy enough to do then yeah the 9.0 release would be a good time to do it. I don't think I'd consider this a release blocker though, so if this turns into a 🐰 🕳️ then punting it to the next major release would be fine. </opinion>.

cpcloud commented 7 months ago

In favor of option 1.

We'll need to have some very prominent documentation about this on the landing page IMO, to help make it clear where to start.

Testing may get a bit hairy, but I think we can work it out.
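One way the testing side might be handled is to skip tests whose optional dependency is not installed, so the same suite can run against both a minimal and a full environment. The sketch below uses only the standard library; `has_module` and the test class are hypothetical names, and the test body is a stand-in rather than a real ibis test.

```python
import importlib.util
import unittest


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


class TestPandasOutput(unittest.TestCase):
    # Skipped (not failed) when pandas is absent from the environment.
    @unittest.skipUnless(has_module("pandas"), "pandas is not installed")
    def test_to_pandas(self):
        import pandas as pd  # imported here so collection works without pandas

        df = pd.DataFrame({"a": [1, 2]})
        self.assertEqual(len(df), 2)
```

In a pytest-based suite, `pytest.importorskip("pandas")` plays a similar role.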

lostmygithubaccount commented 7 months ago

I (strongly) prefer [default] over [basic]. should probably also include the examples install. if someone complains about it installing too many things -> instructions on installing the minimal dependencies they need for their use case

and -1 on meta packages, this caused a lot of issues at both of my previous companies

MarcoGorelli commented 6 months ago

Would really appreciate it if this were possible 🙏

Use-case: I'd like to be able to write Ibis syntax and translate it to substrait. Currently Ibis is a dependency of https://github.com/ibis-project/ibis-substrait, although as far as I can tell, pandas/pyarrow/numpy aren't required for ibis-substrait? A lightweight ibis-substrait would be fantastic!