Document e2e logging performance for time series data

nikolausWest commented 5 months ago

We want to benchmark logging scalars, including setting a timeline value for each logged scalar, i.e. something like

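import math
import rerun as rr
rr.init("scalar_benchmark", spawn=True)  # spawn a native viewer to log to
for frame_nr in range(0, 1_000_000):
    # Each logged scalar gets its own value on the "frame" timeline
    rr.set_time_sequence("frame", frame_nr)
    rr.log("scalar", rr.TimeSeriesScalar(math.sin(frame_nr / 1000.0)))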
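One way to turn a loop like this into a scalars-per-second figure is to time it with a monotonic clock. This is a hypothetical sketch: `measure_throughput` and the `log_one` hook are illustrative names, not part of the Rerun SDK. It times only the logging side (each SDK call returning), not the viewer ingesting and rendering the data. Because logging may be buffered and sent in the background, it will usually overstate the end-to-end figure.

```python
import math
import time
from typing import Callable


def measure_throughput(log_one: Callable[[int, float], None], num_scalars: int) -> float:
    """Call log_one(frame_nr, value) num_scalars times and return scalars per second.

    `log_one` is a hypothetical hook; with the Rerun SDK it would wrap
    rr.set_time_sequence(...) followed by rr.log(...).
    """
    start = time.perf_counter()
    for frame_nr in range(num_scalars):
        log_one(frame_nr, math.sin(frame_nr / 1000.0))
    elapsed = time.perf_counter() - start
    return num_scalars / elapsed


if __name__ == "__main__":
    # A no-op stand-in so the sketch runs without a viewer attached.
    rate = measure_throughput(lambda frame_nr, value: None, 1_000_000)
    print(f"{rate:,.0f} scalars/s (logging side only)")
```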

We already have a tool for this:

just rs-plot-dashboard --num-plots 10 --num-series-per-plot 5 --num-points-per-series 5000 --freq 1000
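For reference, the flags in that command multiply out to the total number of scalars logged. The sketch below assumes that each point is one scalar. It also assumes, as a labeled guess, that `--freq` is the per-series sample rate in Hz; check the tool's own help output before relying on the rate and duration figures. `dashboard_load` is a hypothetical helper.

```python
def dashboard_load(num_plots: int, series_per_plot: int, points_per_series: int, freq_hz: float):
    """Return (total scalars, aggregate scalars/s, nominal duration in s).

    ASSUMPTION: freq_hz is the per-series sampling rate; this is a guess
    about the meaning of --freq, not documented behavior.
    """
    num_series = num_plots * series_per_plot
    total_scalars = num_series * points_per_series
    aggregate_rate = num_series * freq_hz
    duration_s = points_per_series / freq_hz
    return total_scalars, aggregate_rate, duration_s


# The invocation above: --num-plots 10 --num-series-per-plot 5
# --num-points-per-series 5000 --freq 1000
print(dashboard_load(10, 5, 5000, 1000))  # (250000, 50000, 5.0)
```

Under that assumption, the example configuration is a fixed-rate load of about 50k scalars/s. Finding the maximum throughput would then mean raising these values until the viewer falls behind.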

For each language (C++, Python, Rust), measure the maximum end-to-end throughput (logging -> visualization) in scalars per second, both for single-threaded, single-plot logging and for multi-threaded logging (3 x 2 = 6 throughput figures).
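One way to measure the multi-threaded column is to split the work across threads and divide the total number of scalars by the wall-clock time. This is a hypothetical harness: `measure_parallel_throughput` and `log_one` are illustrative names, and each thread logs to its own series. In Python, the GIL may limit how much the logging calls actually overlap unless the SDK releases it. Like any SDK-side timing, this does not capture the viewer's end of the pipeline.

```python
import math
import threading
import time
from typing import Callable


def measure_parallel_throughput(
    log_one: Callable[[int, int, float], None], num_threads: int, scalars_per_thread: int
) -> float:
    """Run num_threads threads, each calling log_one(thread_idx, frame_nr, value)
    scalars_per_thread times on its own series. Returns total scalars per second."""
    # +1 party so the main thread can release all workers at once.
    barrier = threading.Barrier(num_threads + 1)

    def worker(thread_idx: int) -> None:
        barrier.wait()
        for frame_nr in range(scalars_per_thread):
            log_one(thread_idx, frame_nr, math.sin(frame_nr / 1000.0))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    start = time.perf_counter()
    barrier.wait()  # release the workers; the clock is already running
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return num_threads * scalars_per_thread / elapsed
```

Giving each thread its own series roughly mirrors the multi-plot dashboard setup. Calling it with `num_threads=1` corresponds to the single-threaded, single-plot case.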

We also want to check the viewer's memory use after logging around 100M scalars, to measure the RAM overhead.
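It helps to compare the measured RAM against the raw payload when turning it into an overhead number. As a baseline assumption, each scalar could be counted as one 64-bit timeline value plus one 64-bit float. That gives 16 bytes per scalar, so 100M scalars come to 1.6 GB before any indexing, caching, or rendering structures. The helper below is hypothetical and only does this arithmetic. The viewer's actual memory use still has to be measured separately.

```python
def ram_overhead(measured_ram_bytes: float, num_scalars: int, raw_bytes_per_scalar: int = 16):
    """Return (bytes per scalar, overhead factor vs. raw payload).

    ASSUMPTION: raw_bytes_per_scalar=16 means one i64 time + one f64 value;
    it is a comparison baseline, not how the viewer actually stores data.
    """
    per_scalar = measured_ram_bytes / num_scalars
    return per_scalar, per_scalar / raw_bytes_per_scalar


# Raw baseline for 100M scalars: 100_000_000 * 16 bytes = 1.6 GB.
print(100_000_000 * 16 / 1e9)  # 1.6
```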

Then manually document the results somewhere in our docs, e.g.:

On a 2023 MacBook M1:

Language   Single-threaded   Multi-threaded
C++        ? scalars/s       ? scalars/s
Python     ? scalars/s       ? scalars/s
Rust       ? scalars/s       ? scalars/s

Viewing 100M scalars uses up ? GB of RAM in the native viewer.

Very rough numbers are fine, e.g. "~10 M scalars / second".
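Since only rough figures are needed, measured rates could be rounded to one significant figure before they go into the table. Below is a minimal, hypothetical formatter (`rough_rate` is an illustrative name) that produces strings in the "~10 M scalars / second" style.

```python
import math


def rough_rate(scalars_per_second: float) -> str:
    """Round a positive rate to one significant figure and add a k/M/G suffix."""
    if scalars_per_second <= 0:
        raise ValueError("rate must be positive")
    exponent = math.floor(math.log10(scalars_per_second))
    rounded = round(scalars_per_second / 10**exponent) * 10**exponent
    for divisor, suffix in ((1e9, "G"), (1e6, "M"), (1e3, "k")):
        if rounded >= divisor:
            return f"~{rounded / divisor:g} {suffix} scalars / second"
    return f"~{rounded:g} scalars / second"


print(rough_rate(12_345_678))  # ~10 M scalars / second
```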

emilk commented 5 months ago

We should link to https://github.com/rerun-io/rerun/issues/4423 too

emilk commented 5 months ago

I know there was a decision to punt on this (and it was moved to Triage), so I'm lowering its priority.

It would be nice to have a short comment explaining why we are punting on this, though.

rerun-io/rerun #4889