Open KevinLourd opened 9 years ago
I don't think this is a bug per se, rather a convention / api issue.
IIRC (I'll have to look further), it is actually reindexing here. That's why the timestamps that coincide with your original index have values and the others don't.
Doesn't seem very useful though.
In [1]: rng = pd.date_range('20130101',periods=10,freq='T')
In [2]: ts=pd.Series(np.arange(len(rng)), index=rng)
In [8]: ts.resample('54s',how='mean')
Out[8]:
2013-01-01 00:00:00 0
2013-01-01 00:00:54 1
2013-01-01 00:01:48 2
2013-01-01 00:02:42 3
2013-01-01 00:03:36 4
2013-01-01 00:04:30 5
2013-01-01 00:05:24 6
2013-01-01 00:06:18 7
2013-01-01 00:07:12 8
2013-01-01 00:08:06 NaN
2013-01-01 00:09:00 9
Freq: 54S, dtype: float64
In [9]: ts.resample('54s')
Out[9]:
2013-01-01 00:00:00 0
2013-01-01 00:00:54 NaN
2013-01-01 00:01:48 NaN
2013-01-01 00:02:42 NaN
2013-01-01 00:03:36 NaN
2013-01-01 00:04:30 NaN
2013-01-01 00:05:24 NaN
2013-01-01 00:06:18 NaN
2013-01-01 00:07:12 NaN
2013-01-01 00:08:06 NaN
2013-01-01 00:09:00 9
Freq: 54S, dtype: float64
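For anyone reading this on a recent pandas, the reindexing described above can be checked directly. This is only a sketch: it assumes a pandas version with the method-chaining resample API (0.18 or later), where `.asfreq()` is the closest counterpart to calling `resample` without `how`, and it uses the `'min'` alias in place of `'T'`. The names `upsampled`, `grid` and `manual` are just for illustration.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('20130101', periods=10, freq='min')
ts = pd.Series(np.arange(len(rng)), index=rng)

# Upsample onto a 54-second grid without aggregating or filling.
upsampled = ts.resample('54s').asfreq()

# The same grid built by hand, then a plain reindex onto it.
grid = pd.date_range(ts.index[0], ts.index[-1], freq='54s')
manual = ts.reindex(grid)

# Only grid points that coincide with an original timestamp keep a value;
# every other slot is NaN, exactly as a reindex would produce.
same = upsampled.equals(manual)
n_missing = int(upsampled.isna().sum())
```

Since 54 s and 60 s only line up at 00:00:00 and 00:09:00, those are the only two stamps that carry a value.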
What would your expectation be for the result, using the input of np.arange(len(ts))?
I would expect the output[8] that you printed (thank you for the how="mean" tip). However, that is not working, as explained below:
Taking for instance a smaller input set:
rng = pd.date_range('20130101',periods=3,freq='T')
ts=pd.Series(np.arange(len(rng)), index=rng)
print(ts)
2013-01-01 00:00:00 0
2013-01-01 00:01:00 1
2013-01-01 00:02:00 2
Freq: T, dtype: int64
When trying to divide it into 5 parts, we get only 4:
from datetime import timedelta
length = 5
timeSpan = (ts.index[-1]-ts.index[0]+timedelta(minutes=1))
rule = int(timeSpan.total_seconds()/length)
tsNew=ts.resample(str(rule)+"S").mean()
print(tsNew)
2013-01-01 00:00:00 0
2013-01-01 00:00:36 1
2013-01-01 00:01:12 NaN
2013-01-01 00:01:48 2
Freq: 36S, dtype: float64
I would expect an extra line with a 2 or a NaN like this:
2013-01-01 00:02:24 NaN
The example taken by jreback is a special case: the 54-second grid lands exactly on the last timestamp, 00:09:00, which is why the correct number of rows appears.
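A possible workaround, not something `resample` does on its own: the bins are generated only as far as needed to cover the last timestamp (00:02:00), so a slot starting at 00:02:24 is never created. If the goal is exactly `length` equal-width slots over the span, counting the last sample's own minute, one option is to build the target index explicitly and reindex onto it. This sketch assumes forward filling is the wanted behavior; `step` and `target` are made-up names.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('20130101', periods=3, freq='min')
ts = pd.Series(np.arange(len(rng)), index=rng)

length = 5
# Span covered by the series, counting the last sample's own minute.
time_span = ts.index[-1] - ts.index[0] + pd.Timedelta(minutes=1)
step = time_span / length  # 36 seconds for this input

# Build exactly `length` slot starts, then carry the last known sample forward.
target = pd.date_range(ts.index[0], periods=length, freq=step)
ts_new = ts.reindex(target, method='ffill')
print(ts_new)
```

Because the index is built with `periods=length`, the row count no longer depends on whether the step divides the original span evenly.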
So the fill_method argument applies to the filling when upsampling (which is odd because it's not consistent with other methods).
That said, there are a LOT of options for resample.
In [17]: ts.resample('36s',fill_method='pad',closed='right')
Out[17]:
2013-01-01 00:00:00 0
2013-01-01 00:00:36 0
2013-01-01 00:01:12 1
2013-01-01 00:01:48 1
2013-01-01 00:02:24 2
Freq: 36S, dtype: int64
Just remembered: the first example requires upsampling, so fill_method applies.
In [21]: ts.resample('54s',fill_method='pad')
Out[21]:
2013-01-01 00:00:00 0
2013-01-01 00:00:54 0
2013-01-01 00:01:48 1
2013-01-01 00:02:42 2
2013-01-01 00:03:36 3
2013-01-01 00:04:30 4
2013-01-01 00:05:24 5
2013-01-01 00:06:18 6
2013-01-01 00:07:12 7
2013-01-01 00:08:06 8
2013-01-01 00:09:00 9
Freq: 54S, dtype: int64
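The `how=` and `fill_method=` keywords used in this thread were deprecated in pandas 0.18 in favor of method chaining on the resampler. Assuming such a version, the two calls on the 10-point series would be spelled roughly as follows (the variable names are illustrative, and `'min'` stands in for the `'T'` alias).

```python
import numpy as np
import pandas as pd

rng = pd.date_range('20130101', periods=10, freq='min')
ts = pd.Series(np.arange(len(rng)), index=rng)

# Rough counterpart of resample('54s', how='mean'): aggregate per 54-second bin.
binned = ts.resample('54s').mean()

# Rough counterpart of resample('54s', fill_method='pad'): upsample, then forward-fill.
padded = ts.resample('54s').ffill()

print(binned)
print(padded)
```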
ts.resample('36s',fill_method='pad',closed='right') works fine, although there is no rational reason to be obliged to pass closed='right', since what is expected here is closed='left'
...
Pandas resample misbehaves when upsampling a time series into equal-size splits:
For instance, I have a time series of size 10:
print(ts)
When trying to resample it into N > 10 parts, it doesn't work:
print(tsNew)
Note: here are my versions:
pd.show_versions()
Thank you for your help