influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

feat: last cache implementation #25109

Open: hiltontj opened 3 days ago

hiltontj commented 3 days ago

Closes #25093

Initial cut at the last cache implementation in the write buffer. This adds the LastCacheProvider, which is associated with the write buffer and holds a map containing the last cache for each database/table. Data is loaded into the cache at the same time that batches are written to the segment state, i.e., after the WAL has been flushed.

It only supports one cache per table and is still missing some other key features.
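
A minimal Rust sketch of the structure described above, assuming a simplified row type (column name to string value) in place of the write buffer's real batch format. Only the name LastCacheProvider comes from this PR; LastCache, write_batch, get_last, and Row are hypothetical names for illustration. Each table keeps just its most recent row, matching the one-cache-per-table limitation.

```rust
use std::collections::HashMap;

/// Hypothetical row type: column name -> value rendered as a string.
/// A simplification; not the write buffer's actual batch representation.
type Row = HashMap<String, String>;

/// Hypothetical last cache for a single table: keeps only the most recent row.
#[derive(Debug, Default)]
struct LastCache {
    last_row: Option<Row>,
}

impl LastCache {
    fn push(&mut self, row: Row) {
        self.last_row = Some(row);
    }
}

/// Sketch of a provider holding one last cache per (database, table) pair.
/// The method names here are hypothetical, not the PR's API.
#[derive(Debug, Default)]
struct LastCacheProvider {
    caches: HashMap<(String, String), LastCache>,
}

impl LastCacheProvider {
    /// Assumed to be called when a batch is written to the segment state,
    /// i.e., after the WAL flush.
    fn write_batch(&mut self, db: &str, table: &str, rows: Vec<Row>) {
        let cache = self
            .caches
            .entry((db.to_string(), table.to_string()))
            .or_default();
        for row in rows {
            cache.push(row);
        }
    }

    /// Returns the most recent row cached for the table, if any.
    fn get_last(&self, db: &str, table: &str) -> Option<&Row> {
        self.caches
            .get(&(db.to_string(), table.to_string()))
            .and_then(|c| c.last_row.as_ref())
    }
}

fn main() {
    let mut provider = LastCacheProvider::default();
    let row = |v: &str| Row::from([("usage".to_string(), v.to_string())]);

    // Two rows in one batch: only the later one is retained.
    provider.write_batch("db", "cpu", vec![row("1.0"), row("2.5")]);

    let last = provider.get_last("db", "cpu").unwrap();
    assert_eq!(last["usage"], "2.5");
    assert!(provider.get_last("db", "mem").is_none());
    println!("last cpu usage: {}", last["usage"]);
}
```

Keying the map by (database, table) is what limits this sketch to one cache per table; supporting several caches per table would need an extra component in the key, such as a cache name.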

Still To-Do

pauldix commented 2 days ago

I also realized while writing out the PR feedback that there's another setting we'll need for the last values cache: an age-out. Since the cache will keep last values for each unique key column combination it has seen, ephemeral key column values (i.e., ephemeral series) will keep taking up space in the cache until they're cleared out.

So that timeout duration should be an additional parameter in the settings (like count). A sensible default might be 4 hours. Every so often, the last values cache should be walked, and any key set that hasn't seen a new value within that duration should be cleared from the cache completely.
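
A minimal Rust sketch of the count and age-out settings discussed above, assuming f64 values and key sets represented as the list of key column values. LastValuesCache, KeySetEntry, push, evict_expired, and last_values are hypothetical names; the 4-hour default is passed in explicitly rather than read from any real settings type. Timestamps are passed as arguments so the eviction walk can be exercised without waiting.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Hypothetical: the values of the key columns (e.g. host) identify one series.
type KeySet = Vec<String>;

/// Hypothetical per-key-set entry: the last `count` values plus the time
/// this key set last received a write.
#[derive(Debug)]
struct KeySetEntry {
    values: VecDeque<f64>,
    last_seen: Instant,
}

/// Sketch of a last values cache with two settings: `count` (values kept per
/// key set) and `ttl` (age-out for key sets that stop receiving writes).
#[derive(Debug)]
struct LastValuesCache {
    count: usize,
    ttl: Duration,
    entries: HashMap<KeySet, KeySetEntry>,
}

impl LastValuesCache {
    fn new(count: usize, ttl: Duration) -> Self {
        Self { count, ttl, entries: HashMap::new() }
    }

    /// Record a value for a key set, keeping at most `count` values.
    fn push(&mut self, key: KeySet, value: f64, now: Instant) {
        let count = self.count;
        let entry = self.entries.entry(key).or_insert_with(|| KeySetEntry {
            values: VecDeque::with_capacity(count),
            last_seen: now,
        });
        entry.values.push_back(value);
        while entry.values.len() > count {
            entry.values.pop_front(); // drop the oldest value
        }
        entry.last_seen = now;
    }

    /// Walk the cache and drop every key set with no new value within `ttl`.
    fn evict_expired(&mut self, now: Instant) {
        let ttl = self.ttl;
        self.entries
            .retain(|_, entry| now.saturating_duration_since(entry.last_seen) < ttl);
    }

    fn last_values(&self, key: &[String]) -> Option<&VecDeque<f64>> {
        self.entries.get(key).map(|e| &e.values)
    }
}

fn main() {
    // count = 2, age-out = 4 hours (the default suggested above).
    let mut cache = LastValuesCache::new(2, Duration::from_secs(4 * 60 * 60));
    let t0 = Instant::now();
    let host = |h: &str| vec![h.to_string()];

    cache.push(host("a"), 1.0, t0);
    cache.push(host("a"), 2.0, t0);
    cache.push(host("a"), 3.0, t0); // count = 2, so 1.0 is dropped
    cache.push(host("ephemeral"), 9.0, t0);

    // Five hours later only host "a" is still receiving writes.
    let later = t0 + Duration::from_secs(5 * 60 * 60);
    cache.push(host("a"), 4.0, later);
    cache.evict_expired(later);

    assert_eq!(cache.last_values(&host("a")).unwrap(), &VecDeque::from(vec![3.0, 4.0]));
    assert!(cache.last_values(&host("ephemeral")).is_none());
}
```

In a real system the eviction walk would presumably run periodically (for example, on a background timer). Because retain visits every entry, the cost of each walk grows with the number of key sets held.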

hiltontj commented 2 days ago

@pauldix thank you for the feedback!

> I think there might be a little confusion about the desired behavior of the cache. An example might be helpful here and you can tell me if this structure will handle it, or if you're leaving it for later, or if this is clarifying.

My understanding was definitely off, especially with respect to the key columns (which are literally the keys in the cache), but your example clears that up for me. I can rework the implementation to satisfy that behaviour.

hiltontj commented 2 days ago

> I also realized while writing out the PR feedback that there's another setting we'll need for the last values cache, which is an age out. [...]

Good call, I can add this to the requirements in the related issues.