dominiklohmann closed 7 months ago
Could also make this `begin:end:step`, while we are at it (e.g., `::2` or `0:-1:2` for every second event).
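For reference, a small sketch of how the two example expressions behave under Python's slicing rules, which the suggestion mirrors. The `events` list is a hypothetical stand-in for an event stream, not part of any existing API. Note that `0:-1:2` drops the last event because the end index is exclusive.

```python
# Hypothetical stand-in for a stream of nine events.
events = list(range(9))

# `::2` keeps every second event, starting with the first one.
every_second = events[::2]

# `0:-1:2` also takes every second event, but stops before the
# last event because the end index is exclusive.
every_second_but_last = events[0:-1:2]
```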
`step` is interesting, as it allows "strided" sampling. But that could also be a dedicated `sample` operator applied downstream, where striding with a step size is just one of many ways to sample.
I think adding stride with `start:end:stride` à la Python is a natural evolution for this operator because it's a syntax that is familiar to most users already. We also already have an aggregation function named `sample` (choose one element of a series).
However, please note that there exist multiple ways to implement strides. The naïve implementation creates batches of size one. The less naïve implementation keeps the stride in the series implementation without actually modifying the data, and applies the stride lazily when the data is serialized. We should be mindful of that.
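To make the difference between the two approaches concrete, here is a minimal Python sketch. It is not how the series implementation actually works; `StridedView` and `naive_stride` are hypothetical names, and only positive steps are handled.

```python
from dataclasses import dataclass
from typing import Iterator, Sequence


@dataclass(frozen=True)
class StridedView:
    """Hypothetical lazy view: stores the stride instead of copying rows."""

    data: Sequence
    start: int
    stop: int
    step: int = 1  # assumed positive in this sketch

    def __len__(self) -> int:
        return len(range(self.start, self.stop, self.step))

    def __iter__(self) -> Iterator:
        # The stride is applied only here, when the rows are read out
        # (for example, during serialization).
        for i in range(self.start, self.stop, self.step):
            yield self.data[i]


def naive_stride(batch: Sequence, step: int) -> list[list]:
    """Naive variant: emits one single-row batch per selected row."""
    return [[batch[i]] for i in range(0, len(batch), step)]


batch = ["a", "b", "c", "d", "e"]
view = StridedView(batch, 0, len(batch), 2)
```

The view keeps a reference to the original batch and defers all work to iteration time, whereas the naive variant allocates a new batch of size one per selected row.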
For the scope of this roadmap item we should consider ignoring the stride for now. I've added a task to the roadmap item to consider adding strides.
We want to implement a `slice begin:end` syntax that cuts events or bytes in an interval $[begin, end)$. A negative index counts from the end rather than from the beginning, which is a syntax familiar to users with experience in Python, jq, and many other languages. Both `begin` and `end` can be omitted, and default to 0 and the size of the input, respectively.
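The index rules described here could be sketched roughly as follows in Python. `resolve_bounds` is a hypothetical helper, and clamping out-of-range indices (as Python does) is an assumption that the description does not spell out.

```python
from typing import Optional


def resolve_bounds(
    begin: Optional[int], end: Optional[int], size: int
) -> tuple[int, int]:
    """Map omitted or negative indices onto a half-open range within [0, size]."""
    # Omitted bounds default to 0 and the size of the input.
    if begin is None:
        begin = 0
    if end is None:
        end = size
    # Negative indices count from the end.
    if begin < 0:
        begin += size
    if end < 0:
        end += size
    # Assumption: out-of-range indices are clamped rather than rejected.
    begin = min(max(begin, 0), size)
    end = min(max(end, 0), size)
    # An end before the beginning yields an empty interval.
    return begin, max(begin, end)
```

For example, `resolve_bounds(2, -1, 10)` yields `(2, 9)`, i.e. everything from the third element up to but excluding the last. In a streaming setting the input size is only known once the input ends, so negative indices would presumably require some buffering.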