Open dcherian opened 3 months ago
@dcherian I feel like you're practically the only person who would have realized that this is expressible as (2) 😅
I like the idea of adding some kind of cumsum syntactic sugar, especially if the underlying implementation can be in terms of groupby so it doesn't add much maintenance burden.
Brief reminder that we have `.cumulative`, so we could use that to add in some complications if needed!
Is your feature request related to a problem?
It is pretty common to want to run `cumsum` and have the sum reset when a boolean flag array is `1`. This is so common it has its own Wikipedia page and is discussed in Blelloch (1993) (Section 1.5). Here's a real example of someone trying to implement it in a fairly roundabout way.
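To make the operation concrete, here is a minimal pure-Python sketch of a segmented cumulative sum. The function name `segmented_cumsum` is made up for illustration, and the sketch assumes that a flag of `1` marks the first element of a new segment (the running sum restarts at that element's value). It is a reference for the semantics only, not a proposed xarray implementation.

```python
def segmented_cumsum(values, flags):
    """Running sum of `values` that restarts wherever `flags` is truthy.

    Assumption: a truthy flag marks the first element of a new segment.
    """
    out = []
    total = 0
    for value, flag in zip(values, flags):
        # Restart the running total at a flagged position, otherwise keep adding.
        total = value if flag else total + value
        out.append(total)
    return out


values = [1, 2, 3, 4, 5]
flags = [0, 0, 1, 0, 1]
print(segmented_cumsum(values, flags))  # [1, 3, 3, 7, 5]
```

A plain `cumsum` over the same values would give `[1, 3, 6, 10, 15]`; the flags at positions 2 and 4 are what reset the sum.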
We have a few options to implement it:

1. We could introduce a new method `DataArray.segmented_scan(flags, op="sum")` or a new class `DataArray.segment.cumsum()`? A dask/cubed friendly version that does all of this in a single scan should be fairly straightforward to write (and similar to our `ffill`, `bfill` wrappers).
2. In a way this generalizes `resample`, and it just struck me that the example above could be written as a `groupby` over `group_idx = (cube == 0).cumsum('time')`, which should be OK once flox adds scans.
   1. We could add `Grouper` functionality to expose a "flag" grouper that hides the `group_idx = (cube == 0).cumsum('time')` line.

My concern with (2) and (2.i) is that they are not at all obvious for most of our userbase.
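The group-index idea behind option (2) can be sketched in plain NumPy, without xarray or flox. This is an illustration under stated assumptions, not the proposed implementation: the function name `segmented_cumsum_by_group` is hypothetical, and the sample `cube` is a made-up 1-D array standing in for the time series in the linked example. The cumulative sum of the flags gives every element a segment id, and subtracting the running total accumulated before each segment starts yields the per-segment cumulative sum.

```python
import numpy as np


def segmented_cumsum_by_group(values, flags):
    """Segmented cumsum via a group index (illustrative sketch only)."""
    values = np.asarray(values)
    flags = np.asarray(flags, dtype=bool)

    # NumPy analogue of `group_idx = (cube == 0).cumsum('time')`:
    # the id increases by one at every flagged position.
    group_idx = np.cumsum(flags)

    # Ordinary cumulative sum over the whole array.
    total = np.cumsum(values)

    # Running total accumulated *before* each flagged segment starts.
    starts = np.flatnonzero(flags)
    offsets = total[starts] - values[starts]

    # Elements before the first flag (group id 0) have nothing to subtract.
    offsets = np.concatenate(([0], offsets))

    return total - offsets[group_idx]


cube = np.array([1, 2, 0, 3, 4, 0, 5])
print(segmented_cumsum_by_group(cube, cube == 0))  # [1 3 0 3 7 0 5]
```

For floating-point data, subtracting large offsets can lose precision compared with a true single-pass segmented scan, which is one reason a dedicated scan implementation may still be preferable.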