pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

package `xarray` and `xarray-core` in conda-forge #9149

Open dcherian opened 1 week ago

dcherian commented 1 week ago

What is your issue?

The current set of Xarray dependencies is very minimal. https://github.com/pydata/xarray/blob/3fd162e42bb309cfab03c2c18b037d1ad3cd3193/pyproject.toml#L25-L29

This is pretty unfriendly to a new user, and not a great out-of-the-box experience. You can't read any files (except npz, csv, parquet I guess), you can't access any tutorial datasets, you can't make plots, and you're missing a bunch of effectively free performance optimizations.

I think the current set of minimal dependencies is more appropriate to an xarray-core package. Here are our optional dependencies for example: https://github.com/pydata/xarray/blob/3fd162e42bb309cfab03c2c18b037d1ad3cd3193/pyproject.toml#L31-L48

Proposal

I suggest that we migrate to separate `xarray-core` and `xarray` packages on conda-forge:

  1. xarray-core will have the current set of minimal dependencies.
  2. For xarray I propose the following dependencies:
    1. flox, opt_einsum, numbagg for accelerated computations
    2. fsspec, netcdf, zarr for reading common datasets & "cloud"
    3. matplotlib for plotting.
    4. pooch to read tutorial datasets
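To make the proposed split concrete, here is a minimal sketch that encodes the extra packages the full `xarray` package would add on top of `xarray-core` and reports which ones are missing from the current environment. The `PROPOSED_EXTRAS` table and `missing_extras` helper are hypothetical, not part of xarray. The import names are assumptions; in particular, "netcdf" is assumed to mean the `netCDF4` Python package.

```python
import importlib.util

# Hypothetical encoding of the proposal: extras the full `xarray` package would
# pull in on top of `xarray-core`, grouped by purpose.
# Keys are package names as written in the proposal; values are the assumed
# top-level import names used to detect them.
PROPOSED_EXTRAS = {
    "accelerated computations": {"flox": "flox", "opt_einsum": "opt_einsum", "numbagg": "numbagg"},
    "reading common datasets & cloud": {"fsspec": "fsspec", "netcdf": "netCDF4", "zarr": "zarr"},
    "plotting": {"matplotlib": "matplotlib"},
    "tutorial datasets": {"pooch": "pooch"},
}


def missing_extras(extras=PROPOSED_EXTRAS):
    """Return {purpose: [package, ...]} for extras that cannot be imported here."""
    missing = {}
    for purpose, packages in extras.items():
        # find_spec returns None when a top-level module is not installed.
        absent = [pkg for pkg, module in packages.items()
                  if importlib.util.find_spec(module) is None]
        if absent:
            missing[purpose] = absent
    return missing


if __name__ == "__main__":
    for purpose, packages in missing_extras().items():
        print(f"{purpose}: missing {', '.join(packages)}")
```

On a minimal install this would list most of the groups, which is the out-of-the-box gap the proposal is aimed at.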

Related: dask ships a slim `dask-core` package, and I think matplotlib does the same with `matplotlib-base`.

dcherian commented 1 week ago

Note that many comments in the user survey ask for performance improvements.

And then this counter-example 🤷🏾‍♂️ : "lightweight version without heavy dependencies"

max-sixty commented 1 week ago

(Yes, I thought I asked something similar a while ago around pooch but can't find it)

Ideally we would allow something like Cargo's {name="xarray", default-features=false} for the minority of users who want the slim version. But IIUC Python doesn't have any notion of "default but not required" dependencies.
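Without "default but not required" dependencies, the usual Python workaround is to treat an accelerator as optional at runtime: use it when it is importable, otherwise fall back to NumPy. A minimal sketch, assuming a hypothetical `optional_import` helper and assuming numbagg exposes a NumPy-compatible `nanmean(values, axis=...)`; this is an illustration, not xarray's actual code.

```python
import importlib

import numpy as np


def optional_import(name):
    """Hypothetical helper: return the named module if it is installed, else None."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# numbagg acts as a "default but not required" dependency: used only when present.
numbagg = optional_import("numbagg")


def nanmean(values, axis=None):
    """Mean ignoring NaNs, preferring numbagg when it is installed.

    Assumes numbagg exposes a NumPy-compatible ``nanmean(values, axis=...)``.
    """
    # getattr on None also yields None, so this covers the "not installed" case.
    accelerated = getattr(numbagg, "nanmean", None)
    if accelerated is not None:
        return accelerated(values, axis=axis)
    # Slim install: plain NumPy gives the same result, possibly more slowly.
    return np.nanmean(values, axis=axis)
```

The slim install keeps working, but users only get the speedup if they know to install numbagg themselves, which is exactly why the issue proposes making these packages part of the default `xarray` install.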