package `xarray` and `xarray-core` in conda-forge

dcherian commented 1 week ago

What is your issue?

The current set of Xarray dependencies is very minimal. https://github.com/pydata/xarray/blob/3fd162e42bb309cfab03c2c18b037d1ad3cd3193/pyproject.toml#L25-L29

This is pretty unfriendly to a new user, and not a great out-of-the-box experience. You can't read any files (except npz, csv, parquet I guess), you can't access any tutorial datasets, you can't make plots, and you're missing a bunch of effectively free performance optimizations.

I think the current set of minimal dependencies is more appropriate to an xarray-core package. Here are our optional dependencies for example: https://github.com/pydata/xarray/blob/3fd162e42bb309cfab03c2c18b037d1ad3cd3193/pyproject.toml#L31-L48

Proposal

I suggest that we migrate to xarray-core and xarray packages in conda-forge:

xarray-core will have the current set of minimal dependencies.
For xarray I propose the following dependencies:
1. flox, opt_einsum, numbagg for accelerated computations
2. fsspec, netcdf, zarr for reading common datasets & "cloud"
3. matplotlib for plotting
4. pooch to read tutorial datasets
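
To make the gap concrete, here is a minimal sketch (not part of xarray) that reports which of the proposed packages are importable in the current environment. The grouping mirrors the list above; the import names are assumptions, in particular `netCDF4` for "netcdf", and `missing_by_group` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Import names are assumptions based on the proposal above;
# "netcdf" is taken to mean the module imported as `netCDF4`.
PROPOSED_EXTRAS = {
    "accelerated computations": ["flox", "opt_einsum", "numbagg"],
    "reading common datasets & cloud": ["fsspec", "netCDF4", "zarr"],
    "plotting": ["matplotlib"],
    "tutorial datasets": ["pooch"],
}


def missing_by_group(groups=PROPOSED_EXTRAS):
    """Map each purpose to the modules that cannot be found (hypothetical helper)."""
    missing = {}
    for purpose, modules in groups.items():
        # find_spec returns None when a top-level module is not installed.
        absent = [name for name in modules if find_spec(name) is None]
        if absent:
            missing[purpose] = absent
    return missing


if __name__ == "__main__":
    for purpose, absent in missing_by_group().items():
        print(f"{purpose}: missing {', '.join(absent)}")
```

In an environment with only the minimal dependencies installed, every group would be expected to show up as missing, which is roughly the out-of-the-box experience described above.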

Related: dask ships a dask-core package, and I think matplotlib ships matplotlib-base

dcherian commented 1 week ago

Note that there are many user survey comments asking for performance improvements:

"whatever can speed up computations would be welcomed"
Optimizations, especially for "resample" and "rolling".
faster computation
Faster/smaller dask graph parallelizations?
faster, less overhead
"also, can we fix the fact that groupby() on a dimension with only one chunk returns something with a chunk size of one on that dimension? It produces huge graph sizes."

And then this counter-example 🤷🏾‍♂️ : "lightweight version without heavy dependencies"

max-sixty commented 1 week ago

(Yes, I thought I asked something similar a while ago around pooch but can't find it)

Ideally we would allow {name="xarray", default-features=false} for the minority of users that want the slim version. But IIUC Python doesn't have any notion of "default but not required dependencies".

So +1 on xarray-core vs xarray in that case
Another option would be encouraging xarray[standard], but that doesn't seem like a common thing in Python either
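
For context on the extras route: Python extras are opt-in, and an installed distribution lists the ones it declares in its metadata. A minimal sketch for inspecting them follows; `declared_extras` is a hypothetical helper, and `standard` is only the name suggested above, not something assumed to exist.

```python
from importlib.metadata import PackageNotFoundError, metadata


def declared_extras(dist_name: str) -> list[str]:
    """Return the extras an installed distribution declares, or [] if it is absent."""
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        return []
    # "Provides-Extra" appears once per declared extra; get_all returns None if there are none.
    return sorted(meta.get_all("Provides-Extra") or [])


if __name__ == "__main__":
    # Prints whatever extras the installed xarray declares, if xarray is installed.
    print(declared_extras("xarray"))
```

Nothing in this list is installed unless the user asks for it by name, which is the "default but not required" gap noted above.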

pydata / xarray

package `xarray` and `xarray-core` in conda-forge #9149
