jleibs closed this 4 weeks ago.
Awesome writeup!
The big challenge here is that sane-looking APIs are ambiguous without knowledge of the timeline.
One other possible direction to go here: what if you expressed the range or latest-at query as an operation on the TimeColumnDescriptor? At that point you know what type it has in the store and can make it ergonomic. It's also symmetric in some way with how we handle other columns. It also avoids mistakes like doing rr.dataframe.Range.seconds("time", min=10.0, max=17) when "time" was in fact a poorly named sequence timeline.
(This is kind of a half-baked idea, so it's likely mega annoying in some obvious way.)
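To make the idea concrete, here is a minimal sketch. TimeColumnDescriptor below is a stand-in, not the real rerun class: the kind field, range_seconds, range_sequence and RangeQuery are all hypothetical, and it assumes time timelines are stored as integer nanoseconds. Because the descriptor already knows its timeline kind, the Range.seconds("time", ...) mistake becomes an error instead of a silently wrong query.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical: the kind of timeline as recorded in the store.
TimelineKind = Literal["sequence", "time"]


@dataclass(frozen=True)
class RangeQuery:
    """Hypothetical query: an inclusive range on one timeline, in store-native units."""
    timeline: str
    start: int
    end: int


@dataclass(frozen=True)
class TimeColumnDescriptor:
    """Hypothetical descriptor that knows its timeline kind from the store schema."""
    name: str
    kind: TimelineKind

    def range_sequence(self, start: int, end: int) -> RangeQuery:
        # Only valid on sequence timelines; values are plain integers.
        if self.kind != "sequence":
            raise TypeError(f"{self.name!r} is a {self.kind} timeline, not a sequence timeline")
        return RangeQuery(self.name, start, end)

    def range_seconds(self, start: float, end: float) -> RangeQuery:
        # Only valid on time timelines; assumes the store uses integer nanoseconds.
        if self.kind != "time":
            raise TypeError(f"{self.name!r} is a {self.kind} timeline, not a time timeline")
        return RangeQuery(self.name, round(start * 1e9), round(end * 1e9))
```

With a descriptor for the poorly named "time" sequence timeline, range_seconds(10.0, 17) raises TypeError instead of querying frames 10 to 17.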
"At that point you know what type it has in the store and can make it ergonomic."
I think it half-solves the problem, but it still doesn't handle seconds vs. nanoseconds on a time-typed column, which is its own problem. To handle that, I think we would need to introduce some kind of "natural units" metadata on the column, but that's also awkward and error-prone.
I'm hesitant to pull in something like https://pypi.org/project/custom-literals/, but that's of course the kind of behavior that would really be nice.
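Short of custom literals, one way to make callers state their units is with explicit wrapper values. The sketch below is only an illustration: TimeValue and to_store_value are hypothetical names, and it assumes the store represents time timelines as integer nanoseconds. A bare number on a time timeline is rejected rather than guessed at.

```python
NANOS_PER_SECOND = 1_000_000_000
NANOS_PER_MILLI = 1_000_000


class TimeValue:
    """Hypothetical unit-carrying value, always stored as integer nanoseconds."""

    def __init__(self, nanos: int) -> None:
        self.nanos = nanos

    @classmethod
    def seconds(cls, value: float) -> "TimeValue":
        return cls(round(value * NANOS_PER_SECOND))

    @classmethod
    def millis(cls, value: float) -> "TimeValue":
        return cls(round(value * NANOS_PER_MILLI))

    @classmethod
    def nanoseconds(cls, value: int) -> "TimeValue":
        return cls(int(value))


def to_store_value(value: "TimeValue | int", timeline_is_time: bool) -> int:
    """Convert a query bound to the store's (assumed) native integer representation."""
    if timeline_is_time:
        if not isinstance(value, TimeValue):
            # A bare number here is ambiguous: seconds or nanoseconds?
            raise TypeError("time timelines need an explicit unit, e.g. TimeValue.seconds(10)")
        return value.nanos
    if isinstance(value, TimeValue):
        raise TypeError("sequence timelines take plain integers, not time values")
    return value
```

The trade-off is verbosity at the call site, in exchange for never silently interpreting 10 as 10 ns.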
Something I'm wondering about is how we handle multiple recording IDs here.
Multiple .rrd files could all have the same recording ID, so something like this makes sense:
recording = rr.data.load_recording("first.rrd", "second.rrd")
However, we can't know up front whether those files contain one or two recording IDs. How do we handle that?
The same goes for application IDs and any future user-defined IDs.
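One possible policy, sketched with hypothetical names (LoadedChunk, group_by_recording, resolve_single_recording are not rerun APIs): load everything first, group by (application ID, recording ID), and only then decide whether to error out or return several recordings. The grouping key is where future user-defined IDs would slot in.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadedChunk:
    """Hypothetical stand-in for a piece of data read from an .rrd file."""
    source_file: str
    application_id: str
    recording_id: str


def group_by_recording(chunks: list[LoadedChunk]) -> dict[tuple[str, str], list[str]]:
    """Map (application_id, recording_id) -> source files; only knowable after loading."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for chunk in chunks:
        key = (chunk.application_id, chunk.recording_id)
        if chunk.source_file not in groups[key]:
            groups[key].append(chunk.source_file)
    return dict(groups)


def resolve_single_recording(chunks: list[LoadedChunk]) -> tuple[str, str]:
    """Return the one (application_id, recording_id), or fail loudly if there are several."""
    groups = group_by_recording(chunks)
    if len(groups) != 1:
        raise ValueError(f"expected exactly one recording, found {len(groups)}: {sorted(groups)}")
    return next(iter(groups))
```

A load_recording-style call could use the strict resolver, while a separate load_recordings-style call could return the whole grouping.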
Reminder: we still need an API for filtering out all-empty columns in the select_all() context, e.g. unused transform components, indicator components, etc.
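The filtering rule itself is simple; the open question is the API. A minimal sketch of the rule, assuming a hypothetical plain column-name-to-values mapping standing in for a select_all() result, with None marking a row where the component has no data:

```python
from typing import Any, Dict, List, Optional


def drop_all_empty_columns(
    columns: Dict[str, List[Optional[Any]]],
) -> Dict[str, List[Optional[Any]]]:
    """Keep only columns that contain at least one non-null value.

    Columns with zero rows also count as empty, since any() over an empty list is False.
    """
    return {
        name: values
        for name, values in columns.items()
        if any(v is not None for v in values)
    }
```

In a real implementation this would presumably be an option on the query or view rather than a post-processing step, so the empty columns never get materialized.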
Updated Proposal:
Improved concept definitions
.filter(TimeRange(start=..., end=...))
or maybe .filter_range(start=..., end=...)
view.using_index_values(self, values: ArrayLike)
.select(Timeline(), "Translation3D")
Python APIs
Original Proposal (archive):