bradyrx opened this issue 3 years ago
Maybe the padding is breaking down for length-1 arrays? I would look at padded in this function:
https://github.com/pydata/xarray/blob/c9b9eec73a033e275b6012e7d391dd42591ccf52/xarray/core/rolling.py#L461-L463
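For reference, a minimal sketch of what I think this reduces to (assumption: the padded array is ultimately handed to bottleneck.move_mean, which checks the window against the axis length before min_count is considered):

import numpy as np
import bottleneck as bn

# Bottleneck validates window <= axis length up front, so min_count
# never gets a chance to relax the requirement.
arr = np.array([1.0, 2.0, 3.0])
bn.move_mean(arr, window=6, min_count=1, axis=-1)
# ValueError: Moving window (=6) must between 1 and 3, inclusive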
@dcherian, to add to the complexity here, it's even weirder than originally reported. See my test cases below. This might alter how this bug is approached.
import xarray as xr
def _rolling(ds):
    return ds.rolling(time=6, center=False, min_periods=1).mean()

# Length-3 array to test that min_periods kicks in, despite asking
# for 6 time steps of smoothing.
ds = xr.DataArray([1, 2, 3], dims='time')
ds['time'] = xr.cftime_range(start='2021-01-01', freq='D', periods=3)
With bottleneck installed, min_periods is ignored as a kwarg with in-memory arrays.
# Just apply rolling to the base array.
ds.rolling(time=6, center=False, min_periods=1).mean()
>>> ValueError: Moving window (=6) must between 1 and 3, inclusive
# Group into single day climatology groups and apply
ds.groupby('time.dayofyear').map(_rolling)
>>> ValueError: Moving window (=6) must between 1 and 1, inclusive
With bottleneck uninstalled, min_periods works with in-memory arrays.
# Just apply rolling to the base array.
ds.rolling(time=6, center=False, min_periods=1).mean()
>>> <xarray.DataArray (time: 3)>
>>> array([1. , 1.5, 2. ])
>>> Coordinates:
>>> * time (time) object 2021-01-01 00:00:00 ... 2021-01-03 00:00:00
# Group into single day climatology groups and apply
ds.groupby('time.dayofyear').map(_rolling)
>>> <xarray.DataArray (time: 3)>
>>> array([1., 2., 3.])
>>> Coordinates:
>>> * time (time) object 2021-01-01 00:00:00 ... 2021-01-03 00:00:00
With dask objects, min_periods is ignored when mapped over a groupby object. This specifically seems like an issue with .map() (independent of bottleneck installation).
# Just apply rolling to the base array.
ds.chunk().rolling(time=6, center=False, min_periods=1).mean().compute()
>>> <xarray.DataArray (time: 3)>
>>> array([1. , 1.5, 2. ])
>>> Coordinates:
>>> * time (time) object 2021-01-01 00:00:00 ... 2021-01-03 00:00:00
# Group into single day climatology groups and apply
ds.chunk().groupby('time.dayofyear').map(_rolling)
>>> ValueError: For window size 6, every chunk should be larger than 3, but the smallest chunk size is 1.
>>> Rechunk your array
>>> with a larger chunk size or a chunk size that
>>> more evenly divides the shape of your array.
# Just apply rolling to the base array.
ds.rolling(time=6, center=False, min_periods=1).mean()
I feel like this should not work, i.e. rolling window length (6) > size along axis (3). So the bottleneck error seems right.
The chunk size error in the last example should go away with #4977
I feel like this should not work, i.e. rolling window length (6) > size along axis (3). So the bottleneck error seems right.
This is normally the case, but with min_periods=1 it should just return the given value so long as there's at least one observation (as in case #2, where the boundaries return as normal and the middle number is smoothed).
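For comparison, pandas honours min_periods in this situation; a minimal sketch, assuming the desired semantics match pandas:

import pandas as pd

# A window larger than the series still returns values because
# min_periods=1 only requires one observation per window.
s = pd.Series([1.0, 2.0, 3.0])
print(s.rolling(window=6, min_periods=1).mean())
# 0    1.0
# 1    1.5
# 2    2.0
# dtype: float64

This matches the output xarray gives without bottleneck in case #2 above.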
Thanks for the pointer on #4977!
I encountered the same problem with bottleneck.move_rank(); I have to check the length of the dataframe in advance:
from bottleneck import move_rank

vol_list = [3, 5, 10, 20, 30, 60, 90]
for i in vol_list:
    # Only compute windows that fit within the dataframe.
    if len(last_df) >= i:
        last_df['vol_big_than_today_count_' + str(i)] = move_rank(last_df['vol'], i)
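A quick illustration of why the length check is needed; a sketch, assuming move_rank validates the window the same way move_mean does, regardless of min_count:

import numpy as np
from bottleneck import move_rank

vol = np.random.rand(5)
# Raises because the window (10) exceeds the array length (5),
# even though min_count=1 is passed.
move_rank(vol, window=10, min_count=1)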
What happened:
When bottleneck is installed in an environment, it seems to ignore the min_periods kwarg on ds.rolling(...).

What you expected to happen:
When using ds.rolling(..., min_periods=1), it should be able to handle an array of length 1. Without bottleneck installed, it returns the original value of a length-1 array. With bottleneck installed, the error is:

Minimal Complete Verifiable Example:
With bottleneck installed to environment:
Without bottleneck installed to environment:

Anything else we need to know?:
In an applied case, this came up while working on .groupby('time.dayofyear').map(_rolling), where we map a rolling mean function over a defined N days with min_periods=1. Some climatological days (like leap days) will not meet the N-day requirement, so the min_periods catch handles that, but with bottleneck installed it breaks due to the above issue.

Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Jan 25 2021, 23:22:12) [Clang 11.0.1]
python-bits: 64
OS: Darwin
OS-release: 19.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.2
pandas: 1.2.1
numpy: 1.19.5
scipy: 1.6.0
netCDF4: 1.5.5.1
pydap: None
h5netcdf: 0.8.1
h5py: 3.1.0
Nio: None
zarr: 2.6.1
cftime: 1.3.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.1.8
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.01.1
distributed: 2021.01.1
matplotlib: 3.3.3
cartopy: 0.18.0
seaborn: 0.11.1
numbagg: None
pint: 0.16.1
setuptools: 49.6.0.post20210108
pip: 21.0
conda: None
pytest: 6.2.2
IPython: 7.18.1
sphinx: 3.4.3