Closed jcrist closed 1 month ago
Two possible ideas:
Option 1: extras in ibis-framework
With this option we'd move the in-memory formats to be optional dependencies. For to_pandas() to work the user would have to request the pandas extra on install (or have pandas already installed through other means).
Install story:
pip install ibis-framework # minimal install with no backends or memory formats
pip install 'ibis-framework[duckdb,pandas]' # install duckdb backend & pandas memory format support
pip install 'ibis-framework[bigquery,pyarrow]' # install bigquery backend & pyarrow memory format support
# We _might_ also want something like
pip install 'ibis-framework[basic]' # or `default`? idk. Some minimal common "recommended" set of extras for getting started. Probably duckdb & pandas & pyarrow.
Pros: one release artifact, simple to manage, sane for software devs
Cons: can't do much with a stock pip install ibis-framework, need extras for both backends and memory formats. Possibly/probably confusing for new Python users. Also a breaking change.
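A rough sketch of how the extras for this option could be organised, including a bundled "recommended" extra built from the others. The table contents, the `combine` helper, and the choice of `google-cloud-bigquery` as the BigQuery requirement are assumptions for illustration, not the project's actual packaging metadata.

```python
# Hypothetical extras table: names and groupings are illustrative only.
EXTRAS = {
    "duckdb": ["duckdb"],
    "bigquery": ["google-cloud-bigquery"],
    "pandas": ["pandas"],
    "pyarrow": ["pyarrow"],
}


def combine(*names):
    """Return the sorted, de-duplicated requirements for the named extras."""
    reqs = set()
    for name in names:
        reqs.update(EXTRAS[name])
    return sorted(reqs)


# A "recommended" bundle defined in terms of the other extras, so it
# cannot drift out of sync with them.
EXTRAS["default"] = combine("duckdb", "pandas", "pyarrow")
```

A dict like this could be passed as `extras_require` to setuptools, or mirrored by hand in whatever build tool the project uses.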
Option 2: new ibis-core package
Alternatively we could create a new minimal ibis-core package (or some better name). While we could split the code into separate artifacts, we could also just move it all into ibis-core and have ibis-framework be a metadata-only package that depends on ibis-core + a few common dependencies. Metapackages like this are far easier to do in conda, but there are ways to do them with wheels as well.
The install story here would look like:
pip install ibis-core # minimal install with no backends or memory formats
pip install 'ibis-core[duckdb,pandas]' # install duckdb backend & pandas memory format support
pip install 'ibis-framework[bigquery]' # metapackage that adds pandas & pyarrow deps
Pros: not a breaking change, pip install ibis-framework can still do some compute out-of-the-box
Cons: metapackages in wheels are hacky (but doable), splitting our release artifacts is also not ideal
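For a sense of what a metadata-only wheel might carry, here is a minimal sketch of the arguments such a package could hand to `setuptools.setup`. The `metapackage_metadata` function, the exact pin on `ibis-core`, and the forwarded `bigquery` extra are all assumptions; none of this reflects an actual release layout.

```python
def metapackage_metadata(version):
    """Build setup() keyword arguments for a hypothetical metadata-only wheel.

    The wheel ships no Python code of its own; it only pins the core
    package to the same version and adds a few common dependencies.
    """
    return {
        "name": "ibis-framework",
        "version": version,
        "packages": [],  # nothing to import from this distribution
        "install_requires": [
            f"ibis-core=={version}",  # assumed: lockstep versioning
            "pandas",
            "pyarrow",
        ],
        # Extras are forwarded to the core package.
        "extras_require": {"bigquery": [f"ibis-core[bigquery]=={version}"]},
    }
```

In a `setup.py` this would be used as `setup(**metapackage_metadata("9.0.0"))`. The lockstep pin is one design choice among several; a looser range would also work but makes mismatched installs possible.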
Having written that all out, I'm leaning towards option 1 as the best solution. It's the simplest to implement and also the simplest to explain. Ensuring new users know what extras they need to specify can be handled through documentation and generating good error messages when a feature requires some missing dependency.
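To make the "good error messages" idea concrete, here is a minimal sketch of a lazy import helper that names the extra to install when a dependency is missing. `import_optional` and the message wording are hypothetical, not an existing ibis function.

```python
import importlib


def import_optional(module, extra=None):
    """Import `module`, or raise an ImportError that names the pip extra.

    `extra` defaults to the module name, which fits the common case where
    the extra and the package share a name (e.g. pandas, pyarrow).
    """
    extra = extra or module
    try:
        return importlib.import_module(module)
    except ModuleNotFoundError as exc:
        raise ImportError(
            f"'{module}' is required for this feature but is not installed. "
            f"Install it with: pip install 'ibis-framework[{extra}]'"
        ) from exc
```

A method like to_pandas() could call `import_optional("pandas")` at the top of its body, so the import cost and the failure both occur only when the feature is actually used.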
I'm in favor of option 1, but not strictly opposed to option 2. Depending on if others are on board, do we want to try to get this in for 9.0?
IMO if we decide this is something we want to do and it's easy enough to do then yeah the 9.0 release would be a good time to do it. I don't think I'd consider this a release blocker though, so if this turns into a :rabbit: :hole: then punting it to the next major release would be fine. </opinion>
In favor of option 1.
We'll need to have some very prominent documentation about this on the landing page IMO, to help make it clear where to start.
Testing may get a bit hairy, but I think we can work it out.
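One way the testing side might be handled is to skip format-specific tests when the optional dependency is absent, so the same suite runs under both a minimal install and a full one. This sketch uses only the standard library; `has_module` and the test class are illustrative names, not part of the existing test suite.

```python
import importlib.util
import unittest


def has_module(name):
    """True if the top-level module `name` is importable in this environment."""
    return importlib.util.find_spec(name) is not None


class TestPandasOutput(unittest.TestCase):
    # Skipped (not failed) on a minimal install without pandas.
    @unittest.skipUnless(has_module("pandas"), "pandas is not installed")
    def test_dataframe_sum(self):
        import pandas as pd

        df = pd.DataFrame({"a": [1, 2, 3]})
        self.assertEqual(df["a"].sum(), 6)
```

With pytest, `pytest.importorskip("pandas")` does the same job in one line.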
I (strongly) prefer [default] over [basic]. It should probably also include the examples install. If someone complains about it installing too many things, we can point them to instructions on installing the minimal dependencies they need for their use case.
and -1 on meta packages, this caused a lot of issues at both of my previous companies
Would really appreciate it if this were possible 🙏
Use-case: I'd like to be able to write Ibis syntax and translate it to substrait. Currently Ibis is a dependency of https://github.com/ibis-project/ibis-substrait, although as far as I can tell, pandas/pyarrow/numpy aren't required for ibis-substrait? A lightweight ibis-substrait would be fantastic!
Currently pip install ibis-framework brings in dependencies on several common in-memory formats (pandas, pyarrow, and numpy). Historically this was due to technical limitations in the code base. We're now at a point where these could be made optional and would only be required if the user wants to output results in a specific format (e.g. expr.to_pyarrow() would naturally always require a pyarrow dependency).
Pros
polars where the engine has its own output format (and wouldn't necessarily need pandas/numpy/pyarrow).
Cons
pip install ibis-framework wouldn't be able to execute against any backend or return any format, since no backends/formats would be included in the basic install. This may confuse some new users who are unfamiliar with how pip's extras work.
For users relying on pip install ibis-framework[bigquery] to pull in pandas/pyarrow, dropping these as required dependencies would be a breaking change.