pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

xarray.DataArray.weighted performs unweighted mean if dimension names differ without any warning #6952

Open xabipedru opened 1 year ago

xabipedru commented 1 year ago

What happened?

dataarray.weighted(weights).mean() performs an unweighted mean if the dimensions of the weights and of the array being weighted differ. That behavior is acceptable in itself, but a warning would be very helpful: currently the mean is simply computed, and the user may assume that a weighted mean happened.
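Why the weights have no effect can be seen with a little arithmetic. The following plain-Python sketch assumes (it does not reproduce xarray's code) that the weighted mean is computed as a weighted sum divided by the sum of weights, with both reduced over every dimension after broadcasting. If the values lie along dim1 and the weights along dim2, each value is paired with each weight. The numerator becomes sum(x) * sum(w) and the denominator n * sum(w), so sum(w) cancels and only the plain mean remains. The helper names are hypothetical.

```python
def weighted_mean(values, weights):
    """Ordinary weighted mean: values and weights share one dimension."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


def broadcast_weighted_mean(values, weights):
    """Hypothetical model of the mismatched case: values along dim1, weights along dim2.

    Broadcasting pairs every value with every weight (an outer product),
    and the reduction runs over both dimensions.
    """
    weighted_sum = sum(x * w for x in values for w in weights)
    # The mask of valid values is all ones here, so each weight is counted once per value.
    sum_of_weights = sum(w for _ in values for w in weights)
    return weighted_sum / sum_of_weights


values = [0, 1, 2]
w = [1, 50, 100]
print(weighted_mean(values, w))            # 250 / 151, roughly 1.6556
print(broadcast_weighted_mean(values, w))  # 1.0, the unweighted mean of [0, 1, 2]
```

Under this model, the weighted result depends on the weights only when they share at least one dimension with the values.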

What did you expect to happen?

I would expect a warning when computing the mean, telling me that the weights it was given are not actually being used.

Minimal Complete Verifiable Example

import numpy as np
import xarray as xr

test_array = xr.DataArray(np.array([0, 1, 2]), dims="dim1")
# weights along the same dimension as test_array
weights = xr.DataArray(np.array([1, 50, 100]), dims="dim1")
# identical weight values, but along a differently named dimension
weights2 = xr.DataArray(np.array([1, 50, 100]), dims="dim2")

# apply weights with a matching dimension name
test_array_weighted = test_array.weighted(weights)
# now apply weights with a different dimension name
test_array_weighted2 = test_array.weighted(weights2)

# compute both weighted means; the second silently ignores the weights
if test_array_weighted.mean() != test_array_weighted2.mean():
    print("Different means")
    print("test_array_weighted.mean =", test_array_weighted.mean())
    print("test_array_weighted2.mean =", test_array_weighted2.mean())

MVCE confirmation

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:03) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-305.25.1.el8_4.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.3.0
pandas: 1.4.2
numpy: 1.20.3
scipy: 1.8.1
netCDF4: 1.6.0
pydap: None
h5netcdf: 1.0.0
h5py: 3.7.0
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: 1.4.1
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: 0.9.8.5
iris: 3.2.1
bottleneck: 1.3.4
dask: 2022.6.0
distributed: 2022.6.0
matplotlib: 3.5.2
cartopy: 0.20.2
seaborn: 0.11.2
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: 0.19.2
sparse: 0.13.0
setuptools: 62.3.4
pip: 22.1.2
conda: 4.13.0
pytest: 7.1.2
IPython: 8.4.0
sphinx: 5.0.1
mathause commented 1 year ago

Good point, that can be surprising. I think a warning could be added.

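import warnings

# inside the Weighted class: self.obj is the weighted object, self.weights the weights
if not set(self.obj.dims) & set(self.weights.dims):
    # no shared dimension, so the weights cancel out of the mean
    warnings.warn("weights share no dimension with the object; the reduction is effectively unweighted")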
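The proposed check can be tried outside xarray with a small standalone sketch. The function name warn_if_no_shared_dims and the warning text are hypothetical, not part of xarray's API. The function receives plain dimension-name tuples in place of self.obj.dims and self.weights.dims, and returns whether it warned so the behavior is easy to test.

```python
import warnings


def warn_if_no_shared_dims(obj_dims, weight_dims):
    """Hypothetical helper: warn when the weights share no dimension with the object.

    Returns True if a warning was emitted, False otherwise.
    """
    if not set(obj_dims) & set(weight_dims):
        warnings.warn(
            f"weights with dims {tuple(weight_dims)} share no dimension with "
            f"{tuple(obj_dims)}; the weighted reduction is effectively unweighted",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


# Usage, mirroring the example from the issue:
warn_if_no_shared_dims(("dim1",), ("dim2",))  # emits a UserWarning
warn_if_no_shared_dims(("dim1",), ("dim1",))  # silent
```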

However, I wonder if that would be too noisy, e.g. if the Dataset contains a scalar variable:

xr.Dataset({"test_array": test_array, "scalar": 1}).weighted(weights2).mean()

(could potentially skip scalars in the check).


Edit:

https://github.com/pydata/xarray/blob/790a444b11c244fd2d33e2d2484a590f8fc000ff/xarray/core/weighted.py#L192

may not be the right place for this check, as it operates at the Dataset level rather than on each DataArray.
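A per-variable version of the check that also skips scalars, as suggested above, could look like the following minimal sketch. It assumes the Dataset's variables are described by a plain mapping from variable name to dimension tuple, and the function name is hypothetical. It reports only the non-scalar variables that share no dimension with the weights, so a scalar such as "scalar" in the Dataset example does not trigger a warning.

```python
def variables_ignoring_weights(variable_dims, weight_dims):
    """Hypothetical helper: names of non-scalar variables that share no dim with the weights.

    variable_dims maps variable name -> tuple of dimension names.
    Scalars (empty dims) are skipped because weights cannot apply to them anyway.
    """
    weight_set = set(weight_dims)
    return [
        name
        for name, dims in variable_dims.items()
        if dims and not set(dims) & weight_set
    ]


# Mirrors xr.Dataset({"test_array": test_array, "scalar": 1}):
dataset_dims = {"test_array": ("dim1",), "scalar": ()}
print(variables_ignoring_weights(dataset_dims, ("dim2",)))  # ['test_array']
print(variables_ignoring_weights(dataset_dims, ("dim1",)))  # []
```

Checking each variable separately, rather than the Dataset's combined dimensions, would also catch a variable that ignores the weights even when another variable in the same Dataset uses them.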