Open epigramx opened 2 years ago
Thanks @epigramx for the report.
Note that on main this now raises ValueError: MultiWindowIndexer does not implement the correct signature for get_window_bounds
@simonjayhawkins - on main, adding the (unused) argument step at the end of get_window_bounds gives the same behavior as reported in OP.
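For anyone following along, a bounds computation that accepts the extra step argument could look roughly like the sketch below. This is not the MultiWindowIndexer from the OP (its body is not shown in this thread): it is a free function rather than a BaseIndexer method, the window parameter holding the per-row window sizes is a hypothetical name, and plain lists stand in for the integer arrays pandas expects.

```python
def get_window_bounds(window, num_values=0, min_periods=None, center=None,
                      closed=None, step=None):
    """Return (start, end) so that window i covers positions start[i]..end[i]-1.

    `window` (hypothetical) is a sequence of per-row window sizes.
    `min_periods`, `center`, `closed` and `step` are accepted only to mirror
    the expected signature; this sketch ignores them.
    """
    # Each window ends just after its own row (right-aligned window).
    end = list(range(1, num_values + 1))
    # Each window reaches back `w` rows, clipped at the start of the data.
    start = [max(e - int(w), 0) for e, w in zip(end, window)]
    return start, end


start, end = get_window_bounds([1, 2, 3, 2], num_values=4)
print(start, end)
```

Note that start is not monotonic here (it drops back when the window grows), which is exactly the situation this issue is about.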
I think this bug will be specific to max & min since they don't use the traditional sliding window algorithm that almost all the other aggregation functions use: https://stackoverflow.com/a/12239580
@mroeschke - I haven't looked at whether the algorithm used can be adapted for arbitrary windows; if not, does it make sense to have two different algorithms (fastpath/slowpath)?
I am a little doubtful it can be sharable for other aggregations because IIUC the min/max window algorithm uses value comparisons since it's just looking for min/max
> does it make sense to have two different algorithms (fastpath/slowpath)?
I suppose so, but I'm not too thrilled about maintaining heuristics for when to use fast vs slow in addition to maintaining both algorithms.
We've had precedent for collapsing two different algorithms before, trading off performance for the sake of correctness & maintainability, so if going back to the more correct algorithm doesn't incur that much of a performance hit, I think that would be worthwhile.
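To make the fastpath/slowpath idea concrete, the heuristic could be as small as a monotonicity check on the window bounds. The sketch below is only an illustration of that dispatch, not how pandas is structured internally; bounds_are_monotonic, rolling_max, fast_path and slow_path are all hypothetical names, and the two implementations are passed in by the caller.

```python
def bounds_are_monotonic(start, end):
    """True if neither the start nor the end bounds ever decrease."""
    pairs = list(zip(start, end))
    return all(
        s1 >= s0 and e1 >= e0
        for (s0, e0), (s1, e1) in zip(pairs, pairs[1:])
    )


def rolling_max(values, start, end, fast_path, slow_path):
    """Use the fast algorithm only when its assumption about bounds holds."""
    impl = fast_path if bounds_are_monotonic(start, end) else slow_path
    return impl(values, start, end)


def naive_max(values, start, end):
    """Slow path: recompute each window from scratch (assumes non-empty windows)."""
    return [max(values[s:e]) for s, e in zip(start, end)]


# Non-monotonic start bounds force the slow path here; the fast path is
# deliberately set to the same function to keep the example short.
print(rolling_max([3, 1, 2], [0, 1, 0], [1, 2, 3], naive_max, naive_max))
```

The check is O(n) over the bounds, so it adds little next to the aggregation itself, but it is one more code path to test and maintain.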
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
This method basically tries to use a rolling operation where the window is an arbitrary series of integers instead of an integer or an offset. It is related to question/feature request #46716 and it was originally authored as an answer for a StackOverflow question here. There the author of the method notes on the bug: "The cython implementation seems to remember the largest starting index encountered so far and 'clips' smaller starting indices to the stored value. More technically correct: only stores the range of the largest start and largest end indices encountered so far in a queue, discarding smaller start indices and making them unavailable."
Expected Behavior
The result printed for index 18 should be -1.487828 instead of -1.932612, because at that point the window is 3 and it looks for the max between -1.932612, -2.539703 and -1.487828.
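The clipping effect and the expected result can be illustrated with a simplified pure-Python model. This is not the cython code in pandas, just a generic monotonic-deque rolling max that assumes start bounds never decrease, compared against a naive per-window max. The three values are the ones quoted above; their ordering and the window sizes (1, 1, 3) are assumptions chosen so that the last window has to reach back to an element that was already discarded.

```python
from collections import deque


def deque_rolling_max(values, start, end):
    """Monotonic-deque rolling max; only correct if start/end never decrease."""
    out = []
    q = deque()  # candidate indices, values strictly decreasing left to right
    added = 0    # next position not yet pushed into the deque
    for s, e in zip(start, end):
        # Push newly covered positions, dropping candidates they dominate.
        while added < e:
            while q and values[q[-1]] <= values[added]:
                q.pop()
            q.append(added)
            added += 1
        # Discard candidates left of the window; they can never come back.
        while q and q[0] < s:
            q.popleft()
        out.append(values[q[0]] if q else float("nan"))
    return out


def naive_rolling_max(values, start, end):
    """Reference: recompute the max of every window from scratch."""
    return [max(values[s:e]) if e > s else float("nan")
            for s, e in zip(start, end)]


values = [-1.487828, -2.539703, -1.932612]  # assumed ordering
start, end = [0, 1, 0], [1, 2, 3]           # window sizes 1, 1, 3

print(deque_rolling_max(values, start, end))  # last entry: -1.932612
print(naive_rolling_max(values, start, end))  # last entry: -1.487828
```

In the model, position 0 is dropped when the second window starts at 1, so the third window (which starts at 0 again) only sees positions 1 and 2 and reports -1.932612, while the naive version returns the expected -1.487828.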
Installed Versions