Open trendelkampschroer opened 5 years ago
If I get it correctly all non-fixed freq DateOffsets can be represented as Timedelta instances?
I don't think so. Timedeltas always represent an absolute, fixed duration. A non-fixed offset like BusinessDay doesn't have a fixed number of nanoseconds.
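A minimal sketch of that distinction, assuming a reasonably recent pandas; the dates and variable names are chosen only for illustration. It measures how far a single BusinessDay moves a timestamp from two different starting days and compares that with a Timedelta, whose length never changes.

```python
import pandas as pd

# A Timedelta is an absolute duration: one hour is always the same
# number of nanoseconds, regardless of where it is applied.
one_hour = pd.Timedelta(hours=1)
one_hour_ns = one_hour // pd.Timedelta(1, unit="ns")

# A BusinessDay depends on the date it is added to.
friday = pd.Timestamp("2019-01-25")  # a Friday
monday = pd.Timestamp("2019-01-28")  # a Monday
bday = pd.offsets.BusinessDay()

step_from_friday = (friday + bday) - friday  # jumps over the weekend
step_from_monday = (monday + bday) - monday  # moves to the next day

print(one_hour_ns, step_from_friday, step_from_monday)
```

The two steps differ in length even though the same offset was added, which is why such an offset has no single Timedelta equivalent.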
Yes of course, sorry for the confusion. What I meant to ask was the opposite: Can all fixed freq DateOffsets be represented as Timedelta instances?
Updated my comment above.
I'm not sure what the issue is then. Timedeltas are accepted in DataFrame.rolling.
Thanks for the quick reply. I should have been more precise: I meant that rolling should not accept DateOffsets, but int and Timedelta instances (or str which can be cast to Timedelta).
My intent is twofold:
i) I want to understand the difference between a fixed frequency DateOffset and a Timedelta.
ii) If they are equivalent (for purposes of rolling operations), then I want to stimulate a discussion that settles whether one is to be preferred over the other for rolling operations. The ideal outcome would be (at least) a comment in the docstring or the examples section of pandas.DataFrame.rolling giving a clear indication of the preferred usage.
The docstring for pandas.DataFrame.rolling says:
window : int, or offset
Size of the moving window. This is the number of observations used for calculating the statistic. Each window will be a fixed size.
If its an offset then this will be the time period of each window. Each window will be a variable sized based on the observations included in the time-period. This is only valid for datetimelike indexes. This is new in 0.19.0
This suggests that you can use arbitrary DateOffsets, but in fact only those with a fixed frequency are admissible. But if the only admissible offsets can just as well be represented as a Timedelta, then this should be made clear in the docstring or somewhere in the examples.
This also means that 'offset' might not be the best word to use here, as arbitrary offsets are not permitted.
If there is a rolling operation that can only be performed via DateOffsets and not via Timedeltas, then I'd be eager to learn about it as well.
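The behaviour described above can be checked with a short sketch. The series, its minute-level index and the window sizes are assumptions made up for this example; the exact error message may vary between pandas versions, so only the exception type is relied on.

```python
import pandas as pd

# Toy series on a DatetimeIndex with one observation per minute.
idx = pd.date_range("2019-01-24 07:00", periods=6, freq="min")
s = pd.Series(range(6), index=idx, dtype="float64")

# Three spellings of the same fixed three-minute window.
by_timedelta = s.rolling(pd.Timedelta(minutes=3)).sum()
by_fixed_offset = s.rolling(pd.offsets.Minute(3)).sum()
by_str = s.rolling("3min").sum()

all_equal = by_timedelta.equals(by_fixed_offset) and by_timedelta.equals(by_str)

# A non-fixed offset is rejected as a rolling window.
try:
    s.rolling(pd.offsets.BusinessDay(2)).sum()
    non_fixed_raised = False
except ValueError:
    non_fixed_raised = True

print(all_equal, non_fixed_raised)
```

In this sketch the Timedelta, the fixed-frequency offset and the string produce identical results, while the BusinessDay window raises.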
Agreed that the rolling docstring could use clarification.
To answer your questions:
i) Essentially, there is very little difference between fixed frequency offsets (called Ticks internally, though they have not really been exposed in the documentation) and Timedeltas, e.g. pd.offsets.Hour() behaves the same as Timedelta(hours=1) arithmetically. Fixed frequencies exist so that they behave consistently within the frequency system of pandas.
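As a rough illustration of that equivalence (the timestamp is arbitrary and the variable names are made up for this sketch), adding an Hour offset and adding a one-hour Timedelta give the same result, and each can be converted into the other:

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

ts = pd.Timestamp("2019-01-24 07:01")
tick = pd.offsets.Hour()         # fixed-frequency offset
delta = pd.Timedelta(hours=1)    # absolute duration

# Both shift the timestamp by exactly one hour.
same_shift = (ts + tick) == (ts + delta)

# Conversions in both directions.
tick_as_timedelta = pd.Timedelta(tick)
timedelta_as_offset = to_offset(delta)

print(same_shift, tick_as_timedelta, timedelta_as_offset)
```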
ii) There is no preference between the two when using the rolling operation.
Overall, we should specify in the docstring that the DateOffset must be fixed-frequency.
Thank you for your answer. Furthermore, I'd encourage using Timedelta instead of DateOffset in the docstring.
As far as I understand, a valid Timedelta will always work with rolling operations (for any DatetimeIndex), while a DateOffset may raise if it is not fixed frequency.
I am emphasizing this as it took me some time to realise it. With that understanding I now find it easier to design code that internally uses rolling operations.
This behaviour of rolling is also in stark contrast to resample, for which a non-fixed freq DateOffset is a valid argument.
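A small sketch of that contrast, assuming a daily series covering the first quarter of 2019 (the data and names are illustrative only): resample accepts a MonthEnd offset, whereas passing the same offset to rolling raises.

```python
import pandas as pd

# One observation per day, each with value 1.0, so monthly sums
# equal the number of days in each month.
idx = pd.date_range("2019-01-01", "2019-03-31", freq="D")
s = pd.Series(1.0, index=idx)

# resample works with a non-fixed frequency offset.
monthly = s.resample(pd.offsets.MonthEnd()).sum()

# rolling rejects the same offset.
try:
    s.rolling(pd.offsets.MonthEnd()).sum()
    rolling_raised = False
except ValueError:
    rolling_raised = True

print(monthly)
print(rolling_raised)
```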
Code Sample, a copy-pastable example if possible
Problem description
The documentation states that rolling can be used with DateOffset. In fact it can only be used with fixed freq DateOffsets; usage with non-fixed freq DateOffsets will raise. If I get it correctly, all fixed freq DateOffsets can be represented as Timedelta instances? Wouldn't it make sense to allow only Timedelta instead of DateOffset for rolling operations?
Apologies if I am missing a case where a fixed freq DateOffset cannot be expressed as a Timedelta. In any case the documentation should be more explicit about the admissible DateOffsets.
Output of pd.show_versions()