rerun-io / rerun

Visualize streams of multimodal data. Fast, easy to use, and simple to integrate. Built in Rust using egui.
https://rerun.io/
Apache License 2.0

Send dataframe API #7204

Open Famok opened 1 month ago

Famok commented 1 month ago

Describe the solution you'd like

I'd like to send whole dataframes (e.g. pandas and/or Arrow) in a single call. All columns share the same timeline, but there are several of them (e.g. time, x, y, z), and most often the index holds the time, either in microseconds, in seconds, or as a pd.TimedeltaIndex. Something like this would be great:

send_dataframe(
    base_entity_path="mydataframe",
    timeline="mytimeline",
    data=df,
    time_column="index",  # Optional[str]; None would always select the index
    columns=["x", "y"],   # Optional[List[str]]; None would select all columns
)

Describe alternatives you've considered

Sending each column in a separate call. This works but might generate more overhead than necessary.
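To make the request concrete, here is a minimal sketch of the per-column workaround, written against plain Python rather than the Rerun SDK. Everything in it is an assumption made for illustration: `ColumnBatch` and `split_dataframe` are hypothetical names, the table is a plain mapping of column name to values, and the `base_entity_path/column` sub-entity layout is only one possible mapping. It shows only how one table with a shared time column could be split into one batch per data column.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional, Sequence


@dataclass
class ColumnBatch:
    """One data column plus the shared time values (hypothetical type)."""
    entity_path: str
    timeline: str
    times: List[float]
    values: List[float]


def split_dataframe(
    base_entity_path: str,
    timeline: str,
    data: Mapping[str, Sequence[float]],
    time_column: str = "index",
    columns: Optional[List[str]] = None,
) -> List[ColumnBatch]:
    """Split a column-oriented table into one batch per selected column.

    Assumption: each column becomes a sub-entity of `base_entity_path`.
    """
    if time_column not in data:
        raise KeyError(f"time column {time_column!r} not found")
    times = list(data[time_column])

    # None selects every column except the time column itself.
    if columns is None:
        columns = [name for name in data if name != time_column]

    batches = []
    for name in columns:
        values = list(data[name])
        if len(values) != len(times):
            raise ValueError(f"column {name!r} does not match the time column length")
        batches.append(
            ColumnBatch(f"{base_entity_path}/{name}", timeline, times, values)
        )
    return batches


# Example table: the "index" column holds the time in seconds.
table = {
    "index": [0.0, 0.1, 0.2],
    "x": [1.0, 2.0, 3.0],
    "y": [4.0, 5.0, 6.0],
    "z": [7.0, 8.0, 9.0],
}

batches = split_dataframe("mydataframe", "mytimeline", table, columns=["x", "y"])
for batch in batches:
    print(batch.entity_path, batch.times, batch.values)
```

A pandas dataframe with an unnamed index could be passed in as `df.reset_index().to_dict(orient="list")`, which turns the index into a column called `index`. Each batch would then go to a single logging call, so the overhead mentioned above is one call per column instead of one call for the whole frame.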

abey79 commented 1 month ago

If I understand correctly, your proposed API would result in the following data being logged:

both on the mytimeline timeline.

Is that correct?

In general, a dataframe-based API is a very good fit for our new columnar stuff. I see at least two points here:

Famok commented 4 weeks ago

Creating subentities seems to be the easiest way.

I can't see how the second option would work, but I don't know enough about the inner workings of rerun.

But maybe there is a third option, if there were a dataframe entity type? Or is that against the design principles?