**Famok** opened 1 month ago
If I understand correctly, your proposed API would result in the following data being logged: `mydataframe/x` with index `timestamps` and a component with `df["x"]` as content, and `mydataframe/y` with index `timestamps` and a component with `df["y"]` as content, both on the `mytimeline` timeline. Is that correct?
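That reading can be sketched in plain Python. This is illustration only, not Rerun's API; the names `mydataframe`, `mytimeline`, and the columns are taken from the example above, and `fan_out` is a made-up helper:

```python
# Hypothetical illustration (not a Rerun API) of how one dataframe could
# fan out into per-column "sub-entities" that share a single time index.

def fan_out(base_path, timeline, timestamps, columns):
    """Split a columnar table into one record per column, keyed by entity path."""
    return {
        f"{base_path}/{name}": {"timeline": timeline, "index": timestamps, "values": values}
        for name, values in columns.items()
    }

records = fan_out(
    "mydataframe",
    "mytimeline",
    timestamps=[0.0, 0.1, 0.2],
    columns={"x": [1, 2, 3], "y": [4, 5, 6]},
)
# One stream per column: mydataframe/x and mydataframe/y, both indexed
# by the same timestamps on the mytimeline timeline.
```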
In general, having a dataframe-based API is a very good fit for our new columnar stuff. I see at least two points here:

- If a `send_dataframe` API ends up logging to multiple "sub-entities" (as I think you suggest here), there would be little performance gain w.r.t. separate `send_columns` calls. Chunks (our new fundamental data structure) always apply to a single entity, so multiple chunks would need to be emitted here in any case. (This is not to say that a convenience API wouldn't be useful.)
- If a `send_dataframe` API logs columns to a single entity but to different components, then we'd need to figure out a mapping from the Python-side column dtype/label to a component type (with the restriction that each component of a single entity must have a unique type). In particular, your example seems ambiguous as to what component type should be used.

Creating sub-entities seems to be the easiest way.
I can't see how the second option would work; I don't know enough about the inner workings of Rerun.

But maybe there is a third option if there was a dataframe entity type? Or is that against the design principles?
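For what it's worth, the mapping problem in the second option can be sketched like this. The dtype table and the component names below are made-up placeholders, not Rerun's actual component types:

```python
# Hypothetical dtype -> component-type mapping; placeholders only.
DTYPE_TO_COMPONENT = {
    "float64": "Scalar",
    "int64": "Scalar",
    "object": "Text",
}

def map_columns(dtypes):
    """Assign one component type per column, enforcing the one-type-per-entity rule."""
    assigned = {}
    for column, dtype in dtypes.items():
        component = DTYPE_TO_COMPONENT[dtype]
        if component in assigned:
            # Two float columns on one entity both map to "Scalar":
            # this is exactly the ambiguity described in the second option.
            raise ValueError(
                f"columns {assigned[component]!r} and {column!r} both map to "
                f"component {component!r}, but an entity may hold only one"
            )
        assigned[component] = column
    return assigned
```

With a single entity, `{"x": "float64", "y": "float64"}` collides immediately, which is why sub-entities (one entity per column) sidestep the problem.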
**Describe the solution you'd like**
I'd like to send dataframes (e.g. pandas and/or Arrow) at once. They have the same timeline but multiple columns (e.g. time, x, y, z), where most often the index is the time, either in µs, seconds, or a `pd.TimedeltaIndex`. Great would be something like:
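The snippet from the original issue is not reproduced here. A hypothetical shape for such a call might look like the following, where `send_dataframe`, its parameters, and the dict-of-lists standing in for the dataframe are all assumptions for illustration, not an existing Rerun API:

```python
# Hypothetical convenience API; nothing here is a real Rerun call.
# A dict of lists stands in for the dataframe, `index` for df.index.

def send_dataframe(entity_path, timeline, index, columns):
    """Log every column under `entity_path`, all sharing the same time index."""
    return [
        (f"{entity_path}/{name}", timeline, list(index), list(values))
        for name, values in columns.items()
    ]

logged = send_dataframe(
    "mydataframe",
    timeline="mytimeline",
    index=[0.0, 0.5, 1.0],  # seconds; could equally be µs or a pd.TimedeltaIndex
    columns={"x": [1, 2, 3], "y": [4, 5, 6], "z": [7, 8, 9]},
)
```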
**Describe alternatives you've considered**
Sending each column in separate calls. This works but might generate more overhead than necessary.