Introducing `ScalarChart` in place of `BarChart`?

teh-cmc commented 5 months ago

Context

We have the Scalar archetype, which makes it possible to log and visualize scalar timeseries (in the most literal sense: the X axis is the recording's clock):

import math
import rerun as rr

for step in range(0, 64):
    rr.set_time_sequence("step", step)
    rr.log("scalar", rr.Scalar(math.sin(step / 10.0)))

This is good in that this allows users to just log their scalar data as it comes without having to manually keep track of any kind of state.

This is bad because it means each scalar has to be its own DataRow (1 row == 1 timepoint), which leads to performance issues if you just want to log a huge timeseries for which you already have all the data needed in one place.
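A toy model makes the row-count asymmetry concrete. The sketch below is not Rerun's datastore: `Row`, `log_per_scalar` and `log_batched` are hypothetical names, and the code only counts how many rows each logging strategy produces for the same 64-sample sine wave used above.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Row:
    """Hypothetical stand-in for a DataRow: one timepoint plus its cell values."""
    timepoint: Optional[int]  # None models a static/timeless row
    values: List[float] = field(default_factory=list)


def log_per_scalar(samples: List[float]) -> List[Row]:
    # One row per timepoint, as with the Scalar archetype today.
    return [Row(timepoint=step, values=[v]) for step, v in enumerate(samples)]


def log_batched(samples: List[float]) -> List[Row]:
    # A single row carrying the whole series, as with BarChart.
    return [Row(timepoint=None, values=list(samples))]


samples = [math.sin(step / 10.0) for step in range(64)]
print(len(log_per_scalar(samples)))  # 64
print(len(log_batched(samples)))     # 1
```

Each of those 64 rows pays the per-row serialization and indexing costs described below, whereas the batched form pays them once.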

These performance issues come in two forms:

On the client side, logging will be very slow due to the cost of crafting and serializing a DataRow for every scalar. We do have a long-term plan that would allow users to log "temporal batches", i.e. multiple timestamps' worth of data in a single log call. But A) it will be a while before this is implemented, and B) it doesn't solve the second form of performance issue, discussed below.

On the viewer side, ingestion will be very slow due to the cost of indexing many DataRows. Indexing a row is a costly operation: it not only has to run all the datastore logic (indexing all the individual cells, etc.), but it also triggers a chain of events that must propagate to and update all downstream subscribers (datastore views, time panel, heuristics, clear cascades...). AFAICT, batching row ingestion is a much harder problem than batching on the logging side. And even if we make it past ingestion, rendering a time panel with a few million entries is probably still no cheap task (?).

Interestingly, we also have the BarChart archetype, which already has the nice property of accepting a batch of scalars all at once:

rr.log("bar_chart", rr.BarChart([8, 4, 0, 9, 1, 4, 1, 6, 9, 0]))

The one downside is that this doesn't integrate at all with the time cursor, since the bar chart as a whole is its own entity.

Still, in many real-world scenarios this can be a very useful tool, especially when combined with timeless/static logging.

Proposal

Retire the BarChart archetype in favor of a new generic ScalarChart archetype:

table ScalarChart {
  /// The values. Should always be a rank-1 tensor.
  values: rerun.components.TensorData ("attr.rerun.component_required", order: 1000);

  /// The optional indices. Should always be a rank-1 tensor with the same length as `values`.
  ///
  /// Defaults to `range(0, len(values))` if unspecified.
  indices: rerun.components.TensorData ("attr.rerun.component_optional", nullable, order: 3000);
}
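To pin down the intended semantics of the two fields, here is a minimal Python-side model of the proposed archetype. `ScalarChartSketch` is a hypothetical name, not part of any Rerun SDK; it checks the two documented constraints (rank-1, same length) and applies the documented default of `range(0, len(values))` for missing indices.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ScalarChartSketch:
    """Hypothetical model of the proposed ScalarChart archetype."""
    values: Sequence[float]
    indices: Optional[Sequence[float]] = None

    def __post_init__(self) -> None:
        # Crude rank-1 check: reject nested sequences in either field.
        for name, seq in (("values", self.values), ("indices", self.indices)):
            if seq is not None and any(isinstance(x, (list, tuple)) for x in seq):
                raise ValueError(f"{name} must be a rank-1 tensor")
        if self.indices is not None and len(self.indices) != len(self.values):
            raise ValueError("indices must have the same length as values")

    def resolved_indices(self) -> List[float]:
        # Default to range(0, len(values)) when no indices were given.
        if self.indices is None:
            return list(range(len(self.values)))
        return list(self.indices)
```

Read this way, today's `rr.BarChart(values)` is just the special case of a ScalarChart logged without `indices`.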

rr.log("scalar_chart", rr.ScalarChart([1, 2, 3], indices=[30, 20, 10]))
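The example passes its indices in descending order, and the proposal does not say whether `indices` must be sorted. One plausible viewer-side behavior, assumed here rather than specified, is to pair each value with its index and sort by index before drawing a line. `to_plot_points` is a hypothetical helper.

```python
from typing import List, Optional, Sequence, Tuple


def to_plot_points(
    values: Sequence[float], indices: Optional[Sequence[float]] = None
) -> List[Tuple[float, float]]:
    """Pair each value with its index, then sort by index (assumed viewer behavior)."""
    if indices is None:
        indices = range(len(values))
    # Sort on the index only; Python's sort is stable, so duplicate indices keep log order.
    return sorted(zip(indices, values), key=lambda point: point[0])


print(to_plot_points([1, 2, 3], indices=[30, 20, 10]))  # [(10, 3), (20, 2), (30, 1)]
```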

As for styling, ScalarChart would reuse the same styling archetypes as Scalar: SeriesLine to visualize the data as a line, and SeriesPoint to visualize it as a scatter plot.

We would also introduce a new SeriesBar style:

/// Define the style properties for a bar series in a chart.
///
/// This archetype only provides styling information and should be logged as static
/// when possible. The underlying data needs to be logged to the same entity-path using
/// the `ScalarChart` archetype.
table SeriesBar {
    /// Color for the corresponding series.
    color: rerun.components.Color ("attr.rerun.component_optional", nullable, order: 1000);

    /// Bar width for the corresponding series.
    width: rerun.components.StrokeWidth ("attr.rerun.component_optional", nullable, order: 2000);

    /// Display name of the series.
    ///
    /// Used in the legend.
    name: rerun.components.Name ("attr.rerun.component_optional", nullable, order: 3000);
}
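As a sketch of how `SeriesBar.width` might combine with explicit indices, the helper below turns each (index, value) pair into an axis-aligned rectangle centered on its index. The proposal reuses `StrokeWidth` without fixing a unit, so treating `width` as data units here is an assumption; `Bar` and `bars_from_series` are hypothetical names.

```python
from typing import List, NamedTuple, Optional, Sequence


class Bar(NamedTuple):
    """Axis-aligned rectangle in plot (data) coordinates."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float


def bars_from_series(
    values: Sequence[float],
    indices: Optional[Sequence[float]] = None,
    width: float = 0.8,
) -> List[Bar]:
    """One bar per value, centered on its index; `width` is assumed to be in data units."""
    if indices is None:
        indices = range(len(values))  # same default as the proposed ScalarChart
    half = width / 2.0
    bars = []
    for x, y in zip(indices, values):
        # Bars grow from zero, so negative values extend downward.
        bars.append(Bar(x - half, x + half, min(0.0, y), max(0.0, y)))
    return bars
```

With unevenly spaced indices, a single fixed width can leave gaps or make neighboring bars overlap, which is one reason the unit of `width` deserves an explicit decision.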

That new style would also retroactively work with the Scalar archetype.

All in all, this would improve the existing Scalar type by making it possible to visualize the data as a bar chart, and would allow users to work with very large series using the new ScalarChart.

Of course this has the same downside as the original BarChart archetype: it doesn't integrate with the time cursor.

As part of this work, we would also use this opportunity to share a lot more code between Scalar and ScalarChart, so that ScalarChart can benefit from all the recent improvements to the plot view (range caching, subpixel aggregation, etc).
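"Subpixel aggregation" refers to collapsing all samples that land in the same horizontal pixel column into a small summary before drawing, so that a multi-million-point series costs roughly as much to render as the plot is wide. The sketch below shows a generic min/max-per-column reduction; it illustrates the technique and is not the plot view's actual code. `aggregate_min_max` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def aggregate_min_max(
    xs: Sequence[float],
    ys: Sequence[float],
    x_min: float,
    x_max: float,
    n_columns: int,
) -> List[Tuple[int, Tuple[float, float]]]:
    """Reduce (x, y) samples to at most one (min_y, max_y) pair per pixel column.

    Assumes x_max > x_min and n_columns >= 1; samples outside [x_min, x_max] are dropped.
    """
    columns = {}  # column index -> (min_y, max_y)
    scale = n_columns / (x_max - x_min)
    for x, y in zip(xs, ys):
        if x < x_min or x > x_max:
            continue
        # Map x to a column; clamp so that x == x_max lands in the last column.
        col = min(int((x - x_min) * scale), n_columns - 1)
        lo, hi = columns.get(col, (y, y))
        columns[col] = (min(lo, y), max(hi, y))
    return sorted(columns.items())


xs = list(range(100_000))
ys = [float(x % 7) for x in xs]
print(len(aggregate_min_max(xs, ys, 0, 99_999, 800)))  # 800
```

Keeping both the minimum and maximum per column preserves spikes that a plain per-column average would hide.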

Random thoughts

Unrelated to any of the above: maybe we should still allow batches of vanilla Scalars, if only so that people can at least batch their vanilla scalar data when they know they have more than a single value for a given timestamp? Sounds niche, but it is something that happens in e.g. the VRS example :shrug:
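On the client side, batching vanilla Scalars per timestamp would mostly require grouping samples that share a timestamp before logging them. Below is a minimal sketch of that grouping step, assuming samples arrive as `(timestamp, value)` pairs. `group_by_timestamp` is a hypothetical helper; how each group would then be logged as one batched Scalar row is exactly the open question above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_timestamp(samples: Iterable[Tuple[int, float]]) -> Dict[int, List[float]]:
    """Collect all values that share a timestamp, preserving arrival order."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for t, v in samples:
        groups[t].append(v)
    return dict(groups)


samples = [(0, 1.0), (0, 1.5), (1, 2.0), (0, 0.5), (2, 3.0)]
batches = group_by_timestamp(samples)
# 5 samples -> 3 batches, i.e. one row per timestamp instead of one per sample.
print(batches)  # {0: [1.0, 1.5, 0.5], 1: [2.0], 2: [3.0]}
```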

jleibs commented 5 months ago

Related to the last random thought: a generic ScalarBatch is a pretty common way of thinking about and representing large state-space control systems. However, in that case the user would still want to be able to plot a specific index sub-selection in a given plot.

Better yet, providing a mapping of those scalar-indices into the entity tree would provide the performance of single-timestamp batch signal-logging with convenient entity-path names.

For example:

# Signals is a scalar array with signals.shape = (18,)
rr.log("/signals/raw", ScalarBatch(signals))

# Somehow provide an API to remap:
/signals/imu/accel_x := /signals/raw[7]
/signals/imu/accel_y := /signals/raw[8]
/signals/imu/accel_z := /signals/raw[9]
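One way to read this remapping is as a table from alias entity paths to a (source path, element index) pair that the viewer resolves at query time. The sketch below models that with plain dictionaries; `STORE`, `ALIASES` and `resolve` are hypothetical, and nothing like this exists in the Rerun API today.

```python
from typing import Dict, List, Tuple

# Latest batch logged at each source path (a stand-in for the datastore).
STORE: Dict[str, List[float]] = {
    "/signals/raw": [float(i) for i in range(18)],  # signals.shape == (18,)
}

# Alias path -> (source path, index into that batch).
ALIASES: Dict[str, Tuple[str, int]] = {
    "/signals/imu/accel_x": ("/signals/raw", 7),
    "/signals/imu/accel_y": ("/signals/raw", 8),
    "/signals/imu/accel_z": ("/signals/raw", 9),
}


def resolve(path: str) -> float:
    """Return the scalar an alias path points at, read from its source batch."""
    source, index = ALIASES[path]
    return STORE[source][index]


print(resolve("/signals/imu/accel_y"))  # 8.0
```

Because an alias only stores an index, the source signal is still logged as a single row per timestamp, which is where the performance benefit comes from.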

Famok commented 4 months ago

Thanks for pointing me here, @teh-cmc. It would be great to be able to log a time vector and a value vector at the same time. I have unevenly sampled data (dt is not constant), and it seems to me that the bar chart always uses evenly spaced steps (like 0, 1, 2, 3).

rerun-io / rerun