
Caching tiers, memory overhead and garbage collection #5974


teh-cmc commented 7 months ago

This is a living document that contains a brain dump regarding the state of caching in the viewer, some of the major issues on that front, some thoughts for the future, etc.

Mostly so I can organize my thoughts and have something to point to rather than writing book-length comments in the code again and again.



State of caching

(I'm going to assume that promises have been implemented already for the remainder of this document)

The viewer's data stack is currently organized as a series of caches sitting on top of each other. Pulling on the top-most caching layer should implicitly trigger a passthrough chain reaction that updates all the layers, if and where required.

Every layer specifies three behaviors (sketched as code after this list):

  1. how it gets filled.
  2. how it gets invalidated, i.e. how does it react to data semantics changing in the layers below it.
  3. how it gets garbage collected, i.e. how does it react to memory pressure.
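
As a rough mental model (this is not the actual viewer API, just a hypothetical sketch), every layer could be thought of as implementing:

```rust
/// Hypothetical sketch -- not the actual viewer API. `StoreEvent` stands in
/// for whatever change notification the layer below emits.
struct StoreEvent;

trait CacheLayer {
    /// 1. How the layer gets filled (streaming ingestion, just-in-time
    ///    queries, ...).
    fn fill(&mut self, event: &StoreEvent);

    /// 2. How the layer gets invalidated when data semantics change in the
    ///    layers below it.
    fn invalidate(&mut self, event: &StoreEvent);

    /// 3. How the layer reacts to memory pressure; returns how many bytes
    ///    were actually reclaimed.
    fn purge(&mut self, bytes_to_reclaim: u64) -> u64;
}
```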

The cache stack, from bottom to top, looks like this:

  0. .rrd file (raw data)
  1. Datastore
  2. Promise cache
  3. Primary cache
  4. Secondary caches
  5. Ad-hoc caches (JPEG, etc)

0. .rrd file (raw data)

The raw data, as it sits on disk. The only thing that's not a cache, at present.

Some entries in that file might actually be promises. These come in two flavors: cheap datatype conversions, and full-blown promises where the raw data is just a reference to the real raw data, which lives somewhere else.
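
As a sketch, with made-up names, the two flavors could be modeled as:

```rust
/// Hypothetical sketch of the two flavors of promises -- names are made up.
enum Promise {
    /// The data is all here, it just needs a cheap datatype conversion first.
    Conversion { raw: Vec<u8> },
    /// The raw data is only a reference; the real data lives somewhere else
    /// (a file, an object store, ...) and must be fetched and resolved.
    Reference { uri: String },
}
```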

1. Datastore

Raw data from an rrd file gets loaded into the Datastore, which can be seen as a cached/denormalized index of an rrd file.

The index adds some fixed overhead for each entry in the rrd file, which can range from irrelevant to extremely problematic depending on the kind of data we're talking about.
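
To get a feel for the range, here's a purely illustrative back-of-the-envelope calculation; the 256-byte figure is an assumption for the sake of the example, not the store's actual per-entry cost:

```rust
/// Purely illustrative: the real per-entry overhead is not 256 bytes, the
/// point is the ratio between overhead and payload.
const FIXED_OVERHEAD_BYTES: usize = 256;

fn overhead_ratio(payload_bytes: usize) -> f64 {
    FIXED_OVERHEAD_BYTES as f64 / payload_bytes as f64
}

fn main() {
    // A 10 MiB point cloud per entry: the overhead is negligible.
    println!("{:.5}", overhead_ratio(10 * 1024 * 1024));
    // An 8-byte scalar per entry: the overhead is 32x the payload itself.
    println!("{:.1}", overhead_ratio(8));
}
```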

Fill

The store gets filled by pushing data into it in a streaming fashion. It gets no say in what should or shouldn't be indexed -- it just blindly ingests everything.

Invalidation

There is no invalidation involved: the store is always up-to-date since the raw rrd data is immutable/append-only.

Garbage collection

Store data gets evicted in RowId order (!) by the garbage collector, triggered based on the resident memory size of the entire app. Using resident memory pressure as a trigger can lead to nasty problems e.g. if another part of the app ends up leaking memory, or simply if another caching layer happens to be hogging memory.

Once data gets evicted from the store, it's gone. The only way to get it back is to re-log it, e.g. by reloading the source rrd file, if possible.

The datastore fights for memory resources against all the layers of caching above it. Those caches can recompute their garbage-collected data in exchange for a compute cost; the datastore cannot.
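
For illustration, RowId-ordered eviction could look something like this minimal sketch, assuming the store's index is a plain BTreeMap keyed by RowId (the real indices are more involved):

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-ins: RowIds are time-ordered, so ascending RowId order
/// roughly means "oldest ingested first".
type RowId = u64;
struct DataCell {
    size_bytes: u64,
}

/// Evict rows in ascending RowId order until enough bytes have been reclaimed.
fn evict(store: &mut BTreeMap<RowId, DataCell>, bytes_to_reclaim: u64) -> u64 {
    let mut reclaimed = 0;
    while reclaimed < bytes_to_reclaim {
        // `pop_first` removes the smallest key, i.e. the oldest RowId.
        let Some((_row_id, cell)) = store.pop_first() else { break };
        reclaimed += cell.size_bytes;
    }
    reclaimed
}
```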

2. Promise cache

The promise cache keeps track of the data that resulted from resolving a promise.

Fill

The cache is filled by resolving promises, which is currently a just-in-time process that happens when a space view (or any other system) decodes the results of a query.

Invalidation

There's nothing to invalidate: promises are identified by their unique, immutable RowId, therefore the resolved data is assumed to always be up-to-date (i.e. the viewer assumes that promise resolution is a pure process: promise resolution is expected to always return the same result in the successful case).
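
Since resolution is assumed pure, the cache can be sketched as a plain memoization table keyed by RowId; all names below are made up for illustration:

```rust
use std::collections::HashMap;

/// Made-up stand-ins for the sake of the sketch.
type RowId = u64;
struct Promise;
struct ResolvedData;

/// Resolution is assumed expensive (network, decoding, ...) but pure.
fn resolve(promise: &Promise) -> ResolvedData {
    let _ = promise;
    ResolvedData
}

struct PromiseCache {
    resolved: HashMap<RowId, ResolvedData>,
}

impl PromiseCache {
    /// Pure resolution means a cached entry never goes stale: a hit can be
    /// returned as-is, and a GC'd entry can always be recomputed later.
    fn resolve_cached(&mut self, row_id: RowId, promise: &Promise) -> &ResolvedData {
        self.resolved.entry(row_id).or_insert_with(|| resolve(promise))
    }
}
```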

Garbage collection

We don't have a GC strategy in place for promises yet, but the promise cache will have to respond to memory pressure, one way or another.

Since we assume that promise resolution is pure, it's always fine to garbage collect resolved data -- we can always resolve it again (as long as the raw promise itself doesn't get GC'd from the datastore!).

Promises are in an interesting spot:

  • on one hand they are likely to be very costly and very slow to resolve, so there's a lot of value in having their resolved data cached around.
  • on the other hand, GCing resolved data does not result in any data loss, and so at the very least the promise cache should never take precedence over the datastore... maybe?

3. Primary cache

The primary cache is composed of two parts: a query cache and a deserialization cache.

The query cache's role is, unsurprisingly, to keep track of the raw results of executing a query. In practice, that means: for a given entity path, components and query time, keep track of the resulting data time and associated data cells. The query cache does not copy any data, it's all pointers (buried in datacells).
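
In sketch form, with hypothetical names, the query cache is roughly a map from (entity path, component, query time) to (data time, shared cells):

```rust
use std::{collections::HashMap, sync::Arc};

/// Hypothetical stand-ins for the sketch.
type EntityPath = String;
type ComponentName = &'static str;
type TimeInt = i64;
struct DataCell;

#[derive(PartialEq, Eq, Hash)]
struct QueryKey {
    entity_path: EntityPath,
    component: ComponentName,
    query_time: TimeInt,
}

struct QueryResult {
    /// The time at which the returned data actually lives.
    data_time: TimeInt,
    /// Shared pointers into the store's cells -- no data is copied.
    cells: Vec<Arc<DataCell>>,
}

type QueryCache = HashMap<QueryKey, QueryResult>;
```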

There are two situations where the query cache comes in handy:

The second part of this, the deserialization cache, is a horrible hack. Its role is to overcome the overhead of unpeeling lots of small data cells, and lay out the data in a way that's optimal for the CPU cache. Somehow we need to get rid of it, but right now it's what makes the time series view's performance somewhat bearable.

The primary cache effectively doubles the memory overhead of the data, since it exists both as arrow data sitting in a cell in the store (or in the promise cache, same difference), as well as native data in the deserialization cache. In practice this generally matters very little, for two reasons:

Still, doubling the memory overhead of large point clouds definitely sucks.

Fill

The query cache is filled by running queries through the cached APIs, which happens both in space views and all over the UI code because of our immediate mode architecture. It's a completely untyped process since it's just shuffling cells around.

The deserialization cache is filled when iterating over the query results, at which point the data is converted to a native type. Note: what gets deserialized and cached is the resolved data, which is why the primary cache lives on top of the promise cache.

Invalidation

The primary cache subscribes to the datastore's changelog for invalidation purposes.

Invalidation of range queries is pretty coarse, and can be quite costly, compute-wise.
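
A hedged sketch of what that coarseness means in practice, with made-up types: a single store event throws away any cached range it merely overlaps, which then has to be recomputed from scratch:

```rust
/// Made-up stand-ins, as in the previous sketch.
type EntityPath = String;
type TimeInt = i64;

/// Whatever change notification the datastore's changelog emits.
struct StoreEvent {
    entity_path: EntityPath,
    time: TimeInt,
}

/// A cached range query result for one entity.
struct CachedRange {
    entity_path: EntityPath,
    from: TimeInt,
    to: TimeInt,
}

struct PrimaryCache {
    ranges: Vec<CachedRange>,
}

impl PrimaryCache {
    /// Called for every store event. Coarse: any cached range that merely
    /// overlaps the event's time is thrown away wholesale and will have to
    /// be recomputed from scratch by the next query -- hence the potentially
    /// high compute cost.
    fn on_store_event(&mut self, event: &StoreEvent) {
        self.ranges.retain(|range| {
            range.entity_path != event.entity_path
                || event.time < range.from
                || event.time > range.to
        });
    }
}
```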

Garbage collection

There's no garbage collection as of yet. The primary cache just grows indefinitely as more and more queries are run.

It tends to not be an issue in practice (yet) because most recordings hit the memory limit from the indexing overhead before the primary caching overhead can ever become an issue.

If the deserialization cache were to go away, the memory overhead of the primary cache would be small, though still not negligible, especially for something like scalars.

Similar to the promise cache, the primary cache should probably never hog memory that could be used to extend the lifetime of the data in the datastore instead.

4. Secondary caches

The general idea behind secondary caches is that they store arbitrary user-data (where the user in this case is a system, whether a space view or otherwise), and automatically benefit from the invalidation logic of the primary cache.

We unfortunately don't have any secondary caches up and running yet, but it's easy to think of scenarios where they would fit nicely:

Fill

Secondary caches are expected to be filled by the end system itself, right at the edge, since it (the system) is the only entity that knows about the transformation the data should go through.

For CPU-based artifacts, the memory is handled by the cache subsystem as usual. For GPU-based artifacts, we're probably going to have to find a way to centralize VRAM management eventually... As a starting point, we can just let the end system deal with it, like we already do today.

Invalidation

Invalidation of secondary caches is driven by the primary cache (itself subscribed to the datastore changelog).

The system doing the filling doesn't need to run any extra invalidation logic of its own, which is nice.
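
Put together, a secondary cache could be sketched as a generic container that the system fills at the edge while evictions are pushed down from the primary cache; all names below are hypothetical:

```rust
use std::collections::HashMap;

/// Hypothetical key: whatever identifies the primary-cache results the
/// derived artifact was computed from.
type CacheKey = u64;

/// A generic secondary cache: `V` is arbitrary user-data, e.g. a tessellated
/// mesh on the CPU or a handle to a texture living in VRAM.
struct SecondaryCache<V> {
    entries: HashMap<CacheKey, V>,
}

impl<V> SecondaryCache<V> {
    /// Filled at the edge by the system itself, which is the only entity
    /// that knows the transformation the data goes through.
    fn get_or_insert_with(&mut self, key: CacheKey, compute: impl FnOnce() -> V) -> &V {
        self.entries.entry(key).or_insert_with(compute)
    }

    /// Driven by the primary cache's invalidation: the system never has to
    /// run any invalidation logic of its own.
    fn on_primary_invalidated(&mut self, key: CacheKey) {
        self.entries.remove(&key);
    }
}
```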

Garbage collection

That one needs to be figured out.

Secondary caches are interesting in that aspect for several reasons:

Secondary caches are expected to hold data that is both very costly to compute (and upload, in the GPU case) and takes a lot of space. A secondary cache hit is a real blessing though -- that pretty much cancels all the work for the current frame.

Although, again, care has to be taken in order to make sure that secondary caches don't hog precious CPU memory that could extend the lifetime of the raw data. GPU memory is uncontested though, which is nice.

5. Ad-hoc caches (JPEG, Encoded Meshes)

These are ad-hoc secondary caches that predate the introduction of generalized caching throughout the app.

They are meant to be replaced by proper secondary caches... at some point.

Fill

I think they get filled just-in-time by the image and mesh views, respectively.

Invalidation & Garbage collection

There is no invalidation logic per se, but the cache keys do include the RowId of the data.

The cache also keeps track of which cache keys were used during the last frame, and clears out everything that hasn't been used for a couple frames if the cache's memory limit has been reached. I think.
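
A sketch of that strategy as described (not the actual implementation):

```rust
use std::collections::HashMap;

/// Hypothetical sketch of "evict what hasn't been used for a couple frames,
/// but only when over the memory limit".
struct Entry<V> {
    value: V,
    size_bytes: u64,
    last_used_frame: u64,
}

struct AdHocCache<K, V> {
    entries: HashMap<K, Entry<V>>,
    current_frame: u64,
    memory_used: u64,
    memory_limit: u64,
}

impl<K: std::hash::Hash + Eq, V> AdHocCache<K, V> {
    /// Call once per frame, after all lookups for the frame are done.
    fn end_frame(&mut self) {
        const KEEP_FRAMES: u64 = 2; // "a couple frames"
        if self.memory_used > self.memory_limit {
            let cutoff = self.current_frame.saturating_sub(KEEP_FRAMES);
            let (entries, used) = (&mut self.entries, &mut self.memory_used);
            entries.retain(|_, entry| {
                let keep = entry.last_used_frame >= cutoff;
                if !keep {
                    *used -= entry.size_bytes;
                }
                keep
            });
        }
        self.current_frame += 1;
    }
}
```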

Garbage collecting the caches

Clearly we will need to come up with a way of garbage collecting the different caching layers, decoupled from the invalidation process.

The difficulty there is to come up with a precedence/priority algorithm that makes sense for most use cases, given the tension caused by the different layers all sharing the same memory resources.

| Layer | Mem. usage | Can be recomputed | Recompute cost |
| --- | --- | --- | --- |
| Datastore | High | No | N/A |
| Promise cache | Medium | Yes | Very high |
| Primary cache | Low | Yes | Low |
| Secondary caches | High | Yes | High |

It also isn't clear that one should always prioritize giving memory to the datastore. E.g. in real-time scenarios where one is only interested in the last few frames of data, the user would prefer dedicating more memory to the secondary caches rather than keeping more, older datastore entries alive. Maybe this can be solved by exposing more configuration knobs for power users... but we know how that generally turns out.

The intermediary promise and primary caches effectively become dead weight once the views have computed their secondary caches accordingly, assuming they have any. There are a lot of gotchas with that train of thought, though.

Still, at first glance it does seem like the caches at the edges, whether the store at the bottom or the secondary caches at the top, should take priority over the intermediate ones in most cases.

What about multi-recording & multi-app scenarios?

Something we haven't really spent any time on until now is the behavior of garbage collection in multi-recording and multi-app scenarios.

I don't even remember what we do today. I think we always start with the oldest recording, irrespective of which app it belongs to.

That seems good enough for a while, although it'd be nice to document this somewhere.

teh-cmc commented 7 months ago

Possible approach:

  1. Always organize a frame in that order: ingest, GC, querying & caching.
  2. Always trigger GC based on resident memory size.
  3. Always GC layers in that order, stopping as soon as the desired amount of memory has been reclaimed:
    • Primary cache
    • Secondary caches
    • Promise cache
    • Datastore (data is gone for good)

One problem is that the querying & caching step of the frame could completely blow the memory budget and OOM the entire viewer before the GC has any chance of running again (which won't happen until the next frame), since each view that runs inflates the size of the different caches. One solution could be to trigger the GC in-between each space view invocation, which would greatly limit the risk of OOMs and make it much more realistic to keep working in the face of very high memory pressure.

Of course, at some point you're still at the mercy of the user: we can't prevent them from logging a promise that resolves to 20GiB of data. That's fine, it's all best effort.
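For illustration, here is step 3 as a sketch, reusing a trimmed-down version of the hypothetical CacheLayer trait from the first comment; calling this both at the top of the frame and in-between space view invocations would implement the mitigation above:

```rust
/// Trimmed-down version of the hypothetical `CacheLayer` trait sketched in
/// the first comment: only the GC behavior matters here.
trait CacheLayer {
    /// Reacts to memory pressure; returns how many bytes were reclaimed.
    fn purge(&mut self, bytes_to_reclaim: u64) -> u64;
}

/// Walk the layers in fixed priority order (primary, secondary, promise,
/// datastore last, since purging the datastore loses data for good),
/// stopping as soon as the desired amount of memory has been reclaimed.
fn gc_pass(layers: &mut [&mut dyn CacheLayer], mut bytes_to_reclaim: u64) {
    for layer in layers.iter_mut() {
        if bytes_to_reclaim == 0 {
            return;
        }
        bytes_to_reclaim = bytes_to_reclaim.saturating_sub(layer.purge(bytes_to_reclaim));
    }
}
```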

Another problem is the order of garbage collection within each cache. For the datastore (and all derived indices), we've already settled on ascending RowId-order. Maybe we need something equivalent for the different caching layers? or maybe in that case some kind of LRU makes more sense?

Any order that's not entirely RowId-based comes with its own biases to solve though: which entity paths should be GC'd first? what about timelines?