Open mathause opened 6 years ago
Very useful suggestion.
Also I cannot really think of an easy way to append the first elements to the end of the dataset and then calculate rolling.
We already support a different type of "rolling" with periodicity http://xarray.pydata.org/en/latest/generated/xarray.DataArray.roll.html?highlight=roll and it is straightforward to apply roll operations at the variable level: https://github.com/pydata/xarray/blob/master/xarray/core/variable.py#L1007-L1026
I suspect this would not be too hard to implement.
ds.rolling(center=True, dayofyear=31).mean()
What do you mean by rolling
here? As a reduce operation over a rolling window, like a moving average (rolling
in xarray)? Or rolling around the end of a dataarray when shifting (roll
in xarray)? Or a mix of both?
Specifically, what's the dayofyear=31
doing?
Probably a mix of both - I want to compute a moving average, but with periodic boundaries. rolling
sets window_length - 1
elements to nan. However I want to calculate these like so:
running_mean_0 = xr.concat([ds[-15:], ds[:16]]).mean()
running_mean_1 = xr.concat([ds[-14:], ds[:17]]).mean()
and so on...
I think what I want is like the filter
function in R with circular=True
.
I found two possibilities but they are quite "hand made" and certainly not very efficient
Solution with slicing:
# take the last and first elements and append/ prepend them
first = ds[:15]
last = ds[-15:]
extended = xr.concat([last, ds, first], 'dayofyear')
# do the rolling on the extended ds and get rid of NaNs
sol1 = extended.rolling(dayofyear=31, center=True).mean().dropna('dayofyear')
Solution with roll
:
roll1 = ds.roll(dayofyear=150).rolling(dayofyear=31, center=True).mean()
roll2 = ds.rolling(dayofyear=31, center=True).mean()
sol2 = xr.concat([roll1, roll2], dim='r').mean('r')
Call rolling
on original and roll
ed dataset, and put them together again.
I think the implementation would be not so difficult by supporting more flexible fill_value
option in xr.DataArrayRolling.construct
method.
Maybe fill_value='periodic'
would be a possible API,
da.rolling(dayofyear=31).construct('window', fill_value='periodic').mean('window')
@mathause, any interest in contributing?
Though I'm not sure you need the construct
machinery.
IIUC you need to copy a window-sized amount of data from the front of the array onto the back.
You could do that with construct-like machinery, which would save a copy - though not a large copy
@maxim-lian , do you agree to add this feature?
Although the same behavior can be realized by adding head/tail values to the original array and truncate them after the computation, the periodic
option would significantly simplify this.
@fujiisoup Yes for sure - I think it would be good. I think there are two salient questions:
construct
? I don't think the kwarg should be called fill_value
- that traditionally has a specific meaning of "the value to replace NaN
with". I don't understand the periodic
reference, but likely I'm missing something. Could be wrap=True
, or roll=True
?construct
machinery? Or we make a new array and run rolling
over it?I don't think the kwarg should be called fill_value - that traditionally has a specific meaning of "the value to replace NaN with".
Agreed. dask.ghost
has boundaries
keyword, for which we can choose between periodic, reflect, and any constant.
I think this would be a good reference.
Maybe we can deprecate fill_value
keyword and replace it by boundaries
?
(I slightly regret that I choose fill_value
keyword in construct
).
How it's implemented - do you have a view here?
Only a slight modification of construct
machinery realizes this (see #2011).
I think this option should be available only in construct
method (not in the traditional rolling constructor) for the sake of simplicity (according to this comment).
pad
Agree with your other comments. Thanks as ever @fujiisoup
I was going to suggest this feature so glad others are interested.
In my use case I would like to smooth a daily climatology. My colleague uses matlab and uses https://www.mathworks.com/matlabcentral/fileexchange/52688-nan-tolerant-fast-smooth
Using the slice
solution as @mathause showed above, it would look something like (using code from http://xarray.pydata.org/en/stable/examples/weather-data.html#toy-weather-data)
import numpy as np
import pandas as pd
import xarray as xr
times = pd.date_range('2000-01-01', '2010-12-31', name='time')
annual_cycle = np.sin(2 * np.pi * (times.dayofyear.values / 366 - 0.28))
noise = 15 * np.random.rand(annual_cycle.size)
data = 10 + (15 * annual_cycle) + noise
da = xr.DataArray(data, coords=[times], dims='time')
#da.plot()
#Check variability at one day
#da.groupby('time.dayofyear').std('time')[0]
da_clim = da.groupby('time.dayofyear').mean('time')
_da_clim = xr.concat([da_clim[-15:], da_clim, da_clim[:15]], 'dayofyear')
da_clim_smooth = _da_clim.rolling(dayofyear=31, center=True).mean().dropna('dayofyear')
#da_clim_smooth.plot()
I am coming back to @shoyer suggestion in #2011 - your idea would be to do first a pad
and then a rolling
operation as e.g.:
import numpy as np
import xarray as xr
x = np.arange(1, 366)
y = np.random.randn(365)
ds = xr.DataArray(y, dims=dict(dayofyear=x))
ds.pad(dayofyear=15, mode='wrap').rolling(center=True, dayofyear=31).mean()
I think you may need to do cropping afterwards, too, before taking the mean.
I just need to find the three warmest consecutive months from a temperature dataset for my work, so I thought I add a complete example. First, create an example dataset with monthly temperature:
import xarray as xr
import numpy as np
import pandas as pd
time = pd.date_range("2000", periods=12 * 30, freq="M")
temp = np.sin((time.month - 5) / 6 * np.pi) + np.random.randn(*time.shape) * 0.3
da = xr.DataArray(temp, dims=["time"], coords=dict(time=time))
print(da)
<xarray.DataArray (time: 360)>
array([-0.676731, -0.812742, -1.367547, ..., 0.186731, 0.237676, -0.343879])
Coordinates:
* time (time) datetime64[ns] 2000-01-31 2000-02-29 ... 2029-12-31
Currently we can achieve this like:
n_months = 3
monthly = da.groupby("time.month").mean()
padded = monthly.pad(month=n_months, mode="wrap")
rolled = padded.rolling(center=True, month=n_months).mean(skipna=False)
sliced = rolled.isel(month=slice(3, -3))
central_month = sliced.idxmax()
Implementing pad_mode
in rolling
would allow to do:
monthly = da.groupby("time.month").mean()
rolled = monthly.rolling(center=True, month=n_months, pad_mode="wrap").mean(skipna=False)
central_month = rolled.idxmax()
I think we should do pad_kwargs
so that monthly.rolling(center-True, month=n_months, pad_kwargs=dict(mode="constant", constant_values=5)}
is possible.
Hello! First of all, thanks so much for those of you who contribute to xarray, I've found it super useful as an n-dimensional extension of pandas!
I was just wondering what the current state of this issue is? I'm running into exactly the issue described in https://github.com/pydata/xarray/issues/4743 which seems like a bug; that issue was closed as a dupe of this. Are we just waiting for someone to implement something here, or are there other blockers?
This should be easy now so we just need someone to try it out.
This is where the padding happens so the kwarg needs to be passed all the way down to Variable.rolling_window
https://github.com/pydata/xarray/blob/67d1955cc1d6941edd12a325090da7e4d029b84c/xarray/core/variable.py#L2132
What should the API be? .rolling(..., pad_kwargs={...})
or just have .pad(...).rolling(..., pad=False)
i.e. have the user pad
beforehand? The only advantage to the first one AFAICT is that the user doesn't have to specify the window length (or padding length) twice.
.pad(...).rolling(..., pad=False)
For this API, it seems that the only thing that would need to be implemented would be adding a pad
keyword argument to rolling, defaulting to True
. Is that correct?
Yes. I think we might want this anyway; for rolling
without any padding at all.
I added support for .rolling(..., pad=False)
in #5603. The basic implementation was simple, but getting it working for bottleneck/dask took a little more work.
That fixes, e.g., #4743, but I don't think it's a complete fix for this issue.
Code Sample, a copy-pastable example if possible
Problem description
rolling
cannot directly handle periodic boundary conditions (lon, dayofyear, ...), but could be very helpful to e.g. calculate climate indices. Also I cannot really think of an easy way to append the first elements to the end of the dataset and then calculate rolling.Is there a way to do this? Should xarray support this feature?
This might also belong to SO...