I work with time series data in Pandas and would love to move to Polars, but I need a way to replicate my use of pd.RangeIndex. I know that Polars doesn't have indices, and I support that decision; however, I would need a column type with equivalent features.
How I currently use pd.RangeIndex:
My code supports regularly and irregularly sampled time series
The index (in Polars it would be just a column) contains the values of the time axis in milliseconds
Irregularly sampled time series have an index of type pd.Index (in Polars this would be e.g. a column of integers)
Regularly sampled time series have an index of type pd.RangeIndex, where the step member depends on the sampling rate (e.g. step == 5 corresponds to 200 Hz, step == 2 corresponds to 500 Hz)
Taking slices of such data frames just works, because RangeIndex has a start parameter: y = x[5:] will just result in another RangeIndex with y.start == x.start + 5 * x.step
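For reference, the slicing behaviour described above can be reproduced with a small example. This is only a sketch: the column name `value` and the sample count are made up for illustration, and `iloc` is used so that the slice is unambiguously positional.

```python
import pandas as pd

# 200 Hz signal: one sample every 5 ms, time axis in milliseconds.
# The "value" column and the 10 samples are illustrative only.
x = pd.DataFrame(
    {"value": range(10)},
    index=pd.RangeIndex(start=0, stop=50, step=5),
)

# Drop the first 5 samples by position.
y = x.iloc[5:]

# The index is still a RangeIndex; only its start has moved.
print(type(y.index).__name__, y.index.start, y.index.step)
```

No index values are materialized by the slice; the new start is computed from the old start and step.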
Proposed behaviour:
Add a new column type e.g. AffineColumn<dtype> with members start: dtype and step: dtype
The n'th element of such a column is start + n * step
The column is always sorted
Many functions can be optimized:
Some return another AffineColumn: e.g. head, tail, gather, gather_every, take, take_every, slice, ...
Some are a no-op: e.g. sort, unique, ...
Adding / subtracting AffineColumns: their starts and steps are added / subtracted
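To make the proposal more concrete, here is a rough pure-Python sketch of the semantics I have in mind. Everything in it is hypothetical: the `AffineColumn` class, its explicit `length` field (needed to know where the column ends) and the method signatures are illustrations, not an existing Polars API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AffineColumn:
    """Hypothetical column whose n'th element is start + n * step."""

    start: int
    step: int
    length: int  # assumption: the column also needs to know its length

    def __getitem__(self, n: int) -> int:
        if not 0 <= n < self.length:
            raise IndexError(n)
        return self.start + n * self.step

    def slice(self, offset: int, length: Optional[int] = None) -> "AffineColumn":
        # Slicing only shifts the start and shortens the column.
        offset = min(max(offset, 0), self.length)
        remaining = self.length - offset
        new_length = remaining if length is None else min(max(length, 0), remaining)
        return AffineColumn(self.start + offset * self.step, self.step, new_length)

    def gather_every(self, n: int) -> "AffineColumn":
        # Taking every n'th element multiplies the step; length is ceil(length / n).
        if n <= 0:
            raise ValueError("n must be positive")
        return AffineColumn(self.start, self.step * n, -(-self.length // n))

    def __add__(self, other: "AffineColumn") -> "AffineColumn":
        # Element-wise addition: starts and steps are added.
        if self.length != other.length:
            raise ValueError("length mismatch")
        return AffineColumn(self.start + other.start, self.step + other.step, self.length)

    def to_list(self) -> List[int]:
        # Materialize the values, e.g. for interop with ordinary columns.
        return [self[i] for i in range(self.length)]


t = AffineColumn(start=0, step=5, length=10)  # 200 Hz, 10 samples
print(t.slice(5))         # start moves to 0 + 5 * 5
print(t.gather_every(2))  # step doubles
print((t + t).to_list())
```

None of these operations touch per-element data, which is where the optimization potential comes from.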
I think it's fairly common to have such columns in many different applications; storing them as start and step saves a lot of memory and clearly documents intent.
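The memory argument can be illustrated with what Pandas already does today. This is just a sketch comparing a pd.RangeIndex with the same values stored as a materialized integer index; the sample count is arbitrary and the exact byte counts depend on the Pandas and Python versions.

```python
import numpy as np
import pandas as pd

n = 1_000_000  # arbitrary number of samples at 200 Hz (step of 5 ms)

# Same time axis, stored as start/stop/step vs. as one int64 per sample.
affine = pd.RangeIndex(start=0, stop=5 * n, step=5)
materialized = pd.Index(np.arange(0, 5 * n, 5, dtype=np.int64))

print("RangeIndex bytes:  ", affine.memory_usage())
print("materialized bytes:", materialized.memory_usage())
```

The materialized index grows linearly with the number of samples, while the RangeIndex stays at a small constant size.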