pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Unexpected Dataset aggregation behavior when weighting #8583

Open duncanwp opened 8 months ago

duncanwp commented 8 months ago

What happened?

When a weighting is applied to a Dataset aggregation, variables that do not have the aggregation dimensions are nevertheless aggregated, presumably because the weights get broadcast across those variables.

What did you expect to happen?

When aggregating a dataset over specified dimensions, I don't expect variables that lack those dimensions to be aggregated. Perhaps the current behavior is intended, but it seems surprising to me and should at least be documented.
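The suspected broadcasting effect can be illustrated with plain NumPy, independently of xarray. This is only a sketch of the presumed mechanism, not xarray's actual implementation; the array names and the axis order (x, y, time) are assumptions chosen to mirror the example in this report.

```python
import numpy as np

# A variable that only has a "time" dimension (length 3).
precipitation = np.ones(3)

# Weights defined on (x, y, time), shaped like the 3-D variable.
weights = np.ones((2, 2, 3))

# Multiplying broadcasts the 1-D variable up to the (x, y, time) shape ...
broadcast_product = precipitation * weights

# ... so summing over the x and y axes adds 2 * 2 = 4 copies per time step.
weighted_sum = broadcast_product.sum(axis=(0, 1))

print(broadcast_product.shape)
print(weighted_sum)
```

If the weighted reduction multiplies each variable by the weights before summing, a variable without x and y would pick up those dimensions from the weights in exactly this way.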

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np

var1 = np.ones((2, 2, 3))

var2 = np.ones((3))

lon = np.arange(4).reshape(2, 2)
lat = np.arange(4).reshape(2, 2)

ds = xr.Dataset(
    {
        "temperature": (["x", "y", "time"], var1),
        "precipitation": (["time"], var2),
    },
    coords={
        "lon": (["x", "y"], lon),
        "lat": (["x", "y"], lat),
        "time": np.arange(3),
    },
)

print(ds.sum(['x', 'y']))
# Precipitation (with no x or y dimension) is not summed over, leading to values [1. 1. 1.]

print(ds.weighted(xr.ones_like(ds['temperature'])).sum(['x', 'y']))
# Precipitation is now summed over, leading to values [4. 4. 4.]
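One possible workaround, until the behavior is changed or documented, is to apply the weighting only to variables that actually have the aggregation dimensions. The helper below is a hypothetical sketch (the name `weighted_sum_matching_dims` is not part of xarray). It assumes that a variable should be reduced only if it has all of the requested dimensions, and it passes every other variable through unchanged.

```python
import numpy as np
import xarray as xr


def weighted_sum_matching_dims(ds, weights, dims):
    """Weighted sum over `dims`, skipping variables that lack any of them.

    Hypothetical helper, not an xarray API. Dataset attributes are not kept.
    """
    dims = list(dims)
    reduced = {}
    for name, da in ds.data_vars.items():
        if all(d in da.dims for d in dims):
            # The variable has every aggregation dimension: weight and sum it.
            reduced[name] = da.weighted(weights).sum(dims)
        else:
            # Otherwise leave the variable untouched, as ds.sum(dims) does.
            reduced[name] = da
    return xr.Dataset(reduced)


# Same data as in the example above, without the 2-D lon/lat coordinates.
ds = xr.Dataset(
    {
        "temperature": (["x", "y", "time"], np.ones((2, 2, 3))),
        "precipitation": (["time"], np.ones(3)),
    },
    coords={"time": np.arange(3)},
)

weights = xr.ones_like(ds["temperature"])
result = weighted_sum_matching_dims(ds, weights, ["x", "y"])
print(result)
```

With this helper, precipitation keeps its original values while temperature is summed over x and y, which matches what the unweighted `ds.sum(['x', 'y'])` does for variables without those dimensions.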

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:38:11) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.1.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1
xarray: 2023.3.0
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.10.1
netCDF4: 1.6.3
pydap: None
h5netcdf: None
h5py: 3.8.0
Nio: None
zarr: 2.14.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.6
cfgrib: None
iris: 3.4.1
bottleneck: None
dask: 2023.3.2
distributed: 2023.3.2.1
matplotlib: 3.7.1
cartopy: 0.21.1
seaborn: 0.12.2
numbagg: None
fsspec: 2023.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.6.1
pip: 23.0.1
conda: None
pytest: None
mypy: None
IPython: 8.12.0
sphinx: None

mathause commented 8 months ago

Thanks for your report! Related: #6952 and #7027