influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

Populate the parquet cache after server restart #25535

Closed hiltontj closed 1 week ago

hiltontj commented 1 week ago

Currently, the parquet cache is populated only when a parquet file is persisted. As a result, after a server restart, recently persisted parquet files are not in the cache; only parquet files written after the restart are cached.

There should be a way to pre-populate the cache on server start, when loading the recent snapshot files.
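As a rough illustration of what pre-populating on start could look like, here is a minimal sketch. Everything in it is hypothetical: `ParquetFileMeta`, `ParquetCache`, `warm_cache`, and the `max_bytes` budget are stand-ins invented for this example, not the actual InfluxDB types or configuration, and the `fetch` closure stands in for an object store read.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one snapshot entry: a parquet file path and its size.
#[derive(Debug, Clone)]
struct ParquetFileMeta {
    path: String,
    size_bytes: u64,
}

/// Hypothetical in-memory cache keyed by object store path.
#[derive(Default)]
struct ParquetCache {
    entries: HashMap<String, Vec<u8>>,
}

/// Warm the cache from snapshot entries ordered newest first.
/// Stops at the first file that would push the total past `max_bytes`
/// (an assumed limit; the issue does not say which limits apply).
/// Returns the number of bytes accounted for.
fn warm_cache<F>(
    cache: &mut ParquetCache,
    newest_first: &[ParquetFileMeta],
    max_bytes: u64,
    mut fetch: F,
) -> u64
where
    F: FnMut(&str) -> Vec<u8>,
{
    let mut used = 0u64;
    for meta in newest_first {
        if used + meta.size_bytes > max_bytes {
            break;
        }
        // `fetch` stands in for a GET against the object store.
        let bytes = fetch(&meta.path);
        cache.entries.insert(meta.path.clone(), bytes);
        used += meta.size_bytes;
    }
    used
}
```

Walking the files newest first means that whatever budget is chosen is spent on the data most likely to be queried right after the restart.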

There should be some configured limits on this, e.g.,

If this would be too expensive with the object store, then we could consider modifying the cache to only cache files as needed, i.e., when they are requested.
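The on-demand alternative is essentially a read-through cache. A minimal sketch, assuming a hypothetical `OnDemandCache` type and a `fetch` closure standing in for the object store read (neither is taken from the InfluxDB codebase):

```rust
use std::collections::HashMap;

/// Hypothetical read-through cache keyed by object store path.
#[derive(Default)]
struct OnDemandCache {
    entries: HashMap<String, Vec<u8>>,
}

impl OnDemandCache {
    /// Return the cached bytes for `path`, calling `fetch` only on a miss.
    /// `fetch` stands in for a GET against the object store.
    fn get_or_fetch<F>(&mut self, path: &str, fetch: F) -> &[u8]
    where
        F: FnOnce(&str) -> Vec<u8>,
    {
        if !self.entries.contains_key(path) {
            let bytes = fetch(path);
            self.entries.insert(path.to_string(), bytes);
        }
        &self.entries[path]
    }
}
```

With this shape nothing is fetched at startup, so a restart costs no object store requests up front; the trade-off is that the first query to touch each file pays the fetch latency.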