pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Rolling mean with bool performs sum #8864

Open chandley564 opened 8 months ago

chandley564 commented 8 months ago

What happened?

Taking a rolling mean of a DataArray with dtype=bool doesn't behave as I would expect. Rather than converting to int and taking the rolling mean, the result is equivalent to converting to int and then taking a rolling sum.

What did you expect to happen?

No response

Minimal Complete Verifiable Example

import numpy as np
from xarray import DataArray

int_raster = DataArray(
    data=[0, 1, 1, 0, 1, 0],
    dims=("x"),
)

expected_rolling_mean = DataArray(
    data=[np.nan, 2 / 3, 2 / 3, 2 / 3, 1 / 3, np.nan],
    dims=("x"),
)

bool_raster = int_raster.astype(bool)

int_rolling_mean = int_raster.rolling(x=3, center=True).mean()
bool_rolling_mean = bool_raster.rolling(x=3, center=True).mean()
rolling_sum = int_raster.rolling(x=3, center=True).sum()

print("Expected: \n", expected_rolling_mean, "\n")
print("With int dtype: \n", int_rolling_mean, "\n")
print("With bool dtype: \n", bool_rolling_mean, "\n")
print("Rolling sum: \n", rolling_sum)

Relevant log output

Expected: 
 <xarray.DataArray (x: 6)> Size: 48B
array([       nan, 0.66666667, 0.66666667, 0.66666667, 0.33333333,
              nan])
Dimensions without coordinates: x 

With int dtype: 
 <xarray.DataArray (x: 6)> Size: 48B
array([       nan, 0.66666667, 0.66666667, 0.66666667, 0.33333333,
              nan])
Dimensions without coordinates: x 

With bool dtype: 
 <xarray.DataArray (x: 6)> Size: 48B
array([nan,  2.,  2.,  2.,  1., nan])
Dimensions without coordinates: x 

Rolling sum: 
 <xarray.DataArray (x: 6)> Size: 48B
array([nan,  2.,  2.,  2.,  1., nan])
Dimensions without coordinates: x
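
The per-window numbers in the log above can be cross-checked without xarray. The following is only a sketch in plain NumPy, not how xarray computes rolling reductions; the names `values`, `windows`, `window_sums` and `window_means` are made up for illustration. It covers only the four complete windows, so the NaN edges are not reproduced.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Same data as the example above, already cast to bool.
values = np.array([0, 1, 1, 0, 1, 0], dtype=bool)

# Every complete window of length 3 along the array: shape (4, 3).
windows = sliding_window_view(values, window_shape=3)

# Summing booleans counts the True values in each window.
window_sums = windows.sum(axis=-1)

# The mean is that count divided by the window length.
window_means = windows.mean(axis=-1)

print("Window sums: ", window_sums)
print("Window means:", window_means)
```

The sums match the "With bool dtype" output and the means match the "Expected" output, which is consistent with the bool path returning the count without dividing by the window length.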

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.5 (tags/v3.11.5:cce6ba9, Aug 24 2023, 14:38:34) [MSC v.1936 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 154 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_New Zealand', '1252')
libhdf5: None
libnetcdf: None

xarray: 2024.2.0
pandas: 2.1.4
numpy: 1.26.2
scipy: 1.12.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.3.1
distributed: None
matplotlib: 3.8.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.3.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.5.0
pip: 23.2.1
conda: None
pytest: 7.4.3
mypy: None
IPython: 8.18.1
sphinx: 6.2.1

welcome[bot] commented 8 months ago

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!

max-sixty commented 8 months ago

FWIW, this seems to be correct under numbagg or bottleneck, so it's an issue with the naive xarray routines that are used when neither is installed. We could just raise an error there. With one of them installed, the same example prints:

Expected:
 <xarray.DataArray (x: 6)> Size: 48B
array([       nan, 0.66666667, 0.66666667, 0.66666667, 0.33333333,
              nan])
Dimensions without coordinates: x

With int dtype:
 <xarray.DataArray (x: 6)> Size: 48B
array([       nan, 0.66666667, 0.66666667, 0.66666667, 0.33333333,
              nan])
Dimensions without coordinates: x

With bool dtype:
 <xarray.DataArray (x: 6)> Size: 48B
array([       nan, 0.66666667, 0.66666667, 0.66666667, 0.33333333,
              nan])
Dimensions without coordinates: x

Rolling sum:
 <xarray.DataArray (x: 6)> Size: 48B
array([nan,  2.,  2.,  2.,  1., nan])
Dimensions without coordinates: x
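
Until the naive path is fixed or raises an error, one possible workaround is to cast the boolean array to a numeric dtype before rolling. This is a sketch based only on the outputs in this thread, where the int version of the same data already gives the expected mean; it is not a fix proposed by the maintainers, and `workaround_mean` and `expected` are names made up for illustration.

```python
import numpy as np
from xarray import DataArray

# Boolean version of the raster from the original example.
bool_raster = DataArray(
    data=[False, True, True, False, True, False],
    dims=("x",),
)

# Cast to float first so the rolling mean runs on numeric data
# instead of taking the bool code path reported in this issue.
workaround_mean = bool_raster.astype(float).rolling(x=3, center=True).mean()

# Expected values copied from the issue's "Expected" output.
expected = np.array([np.nan, 2 / 3, 2 / 3, 2 / 3, 1 / 3, np.nan])

print(workaround_mean)
print("Matches expected:", np.allclose(workaround_mean.values, expected, equal_nan=True))
```

Casting with `astype(int)` should behave the same way, as the "With int dtype" output in the original report shows.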