rerun-io / rerun

Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.
https://rerun.io/
Apache License 2.0
6.71k stars 341 forks source link

Rethinking stateful time APIs to avoid surprising behaviors #3743

Open teh-cmc opened 1 year ago

teh-cmc commented 1 year ago

TL;DR: Introduce per-recordingstream clocks, as opposed to per-thread clocks.

Status quo

Logging to a recording is controlled by RecordingStreams.

RecordingStream are Send & Sync, i.e. they can be shallow-cloned (refcounted) and sent to other threads where they can then be used concurrently. re_sdk provides facilities to help keep track of per-process and per-thread RecordingStreams. Those facilities are mostly used by the Python bindings, though nothing prevents using them in Rust & C++ too.

Independently of recording streams, we also have per-thread clocks. These clocks keep track of one time per-Timeline per-RecordingStreamId per-ThreadId: i.e. something along the lines of HashMap<(ThreadId, RecordingStreamId, Timeline), Time>.

These times can be set in a stateful manner using the RecordingStream::set_time_* family of methods. When it's time to log something, a recording stream will query the clock in thread-local storage and ask for the time on all timelines for the current (ThreadId, RecordingStreamId) pair.

Problems

This implicit relationship between threads and RecordingStreams can lead to confusion and/or surprising behavior.

Example:

  1. You call rec.set_time_sequence("frame", 42) on your main thread.
  2. You send a bunch of data as well as a copy of rec to another thread where the data gets pre-processed then logged to rec.
  3. You get confused: the data doesn't have a "frame" timestamp associated with it (the thread-local clock on your post-processing thread doesn't know about it!).

You can imagine all kinds of scenarios like the above. The root cause is always the same though: thread-local clocks vs. recording streams vs. the interaction between those.

In Rust & C++, you generally always work directly with RecordingStream handles of some kind, so at least the confusion stops there.

But in Python, it can actually gets worse:

Finally, each RecordingStream keeps track of its internal, process-global tick: i.e. the log_tick timeline is always global no matter what!

Proposal

My proposal would be to get rid of thread-local clocks altogether, and instead put a clock in each and every RecordingStream.

Furthermore, clocks would not be refcounted:

#[derive(Clone)]
pub struct RecordingStream {
    clock: Clock,
    inner: Arc<Option<RecordingStreamInner>>,
}

That way, cloning a RecordingStream would still be relatively cheap (and more importantly, wouldn't clone the underlying batching and I/O pipelines) but would now behave more like a fork: the clone recording-stream just starts where its parent state left off.

jleibs commented 1 year ago

This proposal makes sense to me. Something else worth considering: we eventually want to provide an API in which users can explicitly control the timepoint associated with their data instead of grabbing it from the stream context.

Pretty sure this proposal moves us in a direction that helps with that goal. (Probably a stetch, but you could even imagine implementing it as a special way of creating a RecordingStream with a static clock). Either way, just figured it would be helpful to keep in mind while implementing this.

teh-cmc commented 1 year ago

This proposal makes sense to me. Something else worth considering: we eventually want to provide an API in which users can explicitly control the timepoint associated with their data instead of grabbing it from the stream context.

Pretty sure this proposal moves us in a direction that helps with that goal. (Probably a stetch, but you could even imagine implementing it as a special way of creating a RecordingStream with a static clock). Either way, just figured it would be helpful to keep in mind while implementing this.