This adds the MetaCacheProvider for managing metadata caches in the influxdb3 instance. This includes APIs to create caches through the WAL as well as from a catalog on initialization, to write data into the managed caches, and to query data out of them.
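To give a feel for the shape of those three operations (create, write, query), here is a minimal std-only sketch. It is not the code in this PR: `ProviderSketch`, `CacheSketch`, and all method names are hypothetical, the WAL and catalog creation paths are collapsed into a single `create_cache` call, and skipping rows that lack a cached column is an assumption made for the example.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for one cache: the distinct value
/// combinations seen for a fixed list of columns.
#[derive(Debug)]
struct CacheSketch {
    columns: Vec<String>,
    rows: BTreeSet<Vec<String>>,
}

/// Hypothetical stand-in for a provider that owns many named caches,
/// keyed by (table name, cache name).
#[derive(Debug, Default)]
pub struct ProviderSketch {
    caches: HashMap<(String, String), CacheSketch>,
}

impl ProviderSketch {
    /// Create a cache; returns false if one with that name already exists.
    pub fn create_cache(&mut self, table: &str, name: &str, columns: &[&str]) -> bool {
        let key = (table.to_string(), name.to_string());
        if self.caches.contains_key(&key) {
            return false;
        }
        let cache = CacheSketch {
            columns: columns.iter().map(|c| c.to_string()).collect(),
            rows: BTreeSet::new(),
        };
        self.caches.insert(key, cache);
        true
    }

    /// Feed one written row (column -> value) to every cache on `table`.
    pub fn write_row(&mut self, table: &str, row: &HashMap<&str, &str>) {
        for ((t, _), cache) in self.caches.iter_mut() {
            if t.as_str() != table {
                continue;
            }
            // Assumption: a row contributes only if it has every cached column.
            let values: Option<Vec<String>> = cache
                .columns
                .iter()
                .map(|c| row.get(c.as_str()).map(|v| v.to_string()))
                .collect();
            if let Some(values) = values {
                cache.rows.insert(values);
            }
        }
    }

    /// Return the distinct value combinations held by a cache, if it exists.
    pub fn query(&self, table: &str, name: &str) -> Option<Vec<Vec<String>>> {
        self.caches
            .get(&(table.to_string(), name.to_string()))
            .map(|c| c.rows.iter().cloned().collect())
    }
}
```

Writing the same row twice leaves a single entry in the cache, which is the property that makes this kind of cache useful for answering "which distinct values exist" queries.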
The query side is fairly involved: it relies on DataFusion's TableFunctionImpl and TableProvider traits so that the cache can be queried through a user-defined table function (UDTF).
The predicate code was modified to support only two kinds of predicates, IN and NOT IN. This simplifies the code and maps nicely onto DataFusion's LiteralGuarantee, which we leverage to derive the predicates from incoming queries.
MetaCacheExec, a custom ExecutionPlan implementation, was added specifically for the metadata cache. It can report the predicates that are pushed down to the cache during query planning/execution.
A big set of tests was added to check that queries work and that predicates are pushed down properly.
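The two predicate kinds described above can be sketched with nothing but the standard library. This is an illustration, not the type from this PR: `Predicate`, its constructors, and `filter_values` are hypothetical names, and values are plain strings for simplicity. It also shows why two variants are enough for equality-style filters: `col = 'a'` is just an IN set of one element, and `col != 'a'` a NOT IN set of one element.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a cache predicate on a single column.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Predicate {
    /// Keep values that appear in the set.
    In(BTreeSet<String>),
    /// Keep values that do not appear in the set.
    NotIn(BTreeSet<String>),
}

impl Predicate {
    /// Build an IN predicate from any collection of string-like values.
    pub fn new_in<I, S>(values: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Predicate::In(values.into_iter().map(Into::into).collect())
    }

    /// Build a NOT IN predicate from any collection of string-like values.
    pub fn new_not_in<I, S>(values: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Predicate::NotIn(values.into_iter().map(Into::into).collect())
    }

    /// Does `value` satisfy this predicate?
    pub fn matches(&self, value: &str) -> bool {
        match self {
            Predicate::In(set) => set.contains(value),
            Predicate::NotIn(set) => !set.contains(value),
        }
    }
}

/// `col = value` expressed as a one-element IN set.
pub fn from_eq(value: &str) -> Predicate {
    Predicate::new_in([value])
}

/// `col != value` expressed as a one-element NOT IN set.
pub fn from_not_eq(value: &str) -> Predicate {
    Predicate::new_not_in([value])
}

/// Keep only the values that satisfy the predicate, preserving order.
pub fn filter_values<'a>(values: &[&'a str], predicate: &Predicate) -> Vec<&'a str> {
    values
        .iter()
        .copied()
        .filter(|v| predicate.matches(v))
        .collect()
}
```

For example, `filter_values(&["us-east", "eu-central"], &from_not_eq("us-east"))` keeps only `"eu-central"`.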
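As a rough picture of what "reporting the pushed-down predicates" can look like, the sketch below renders a plan node and its predicates as a single line of text, similar in spirit to a line of EXPLAIN output. Everything here is an assumption made for illustration: `CacheExecSketch`, its fields, and the output format are hypothetical, and the sketch uses `std::fmt::Display` rather than DataFusion's own plan display machinery.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::fmt;

/// Hypothetical stand-in for a cache predicate on a single column.
#[derive(Debug, Clone)]
pub enum Predicate {
    In(BTreeSet<String>),
    NotIn(BTreeSet<String>),
}

impl fmt::Display for Predicate {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let (op, values) = match self {
            Predicate::In(v) => ("IN", v),
            Predicate::NotIn(v) => ("NOT IN", v),
        };
        // Values come out sorted because BTreeSet iterates in order.
        let joined = values
            .iter()
            .map(String::as_str)
            .collect::<Vec<_>>()
            .join(",");
        write!(f, "{} ({})", op, joined)
    }
}

/// Hypothetical plan node: the table it reads and the predicates
/// that were pushed down to it, keyed by column name.
pub struct CacheExecSketch {
    pub table: String,
    pub predicates: BTreeMap<String, Predicate>,
}

impl fmt::Display for CacheExecSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "CacheExecSketch: table={}", self.table)?;
        // Only mention predicates when at least one was pushed down.
        if !self.predicates.is_empty() {
            let rendered = self
                .predicates
                .iter()
                .map(|(col, p)| format!("[{} {}]", col, p))
                .collect::<Vec<_>>()
                .join(", ");
            write!(f, " predicates=[{}]", rendered)?;
        }
        Ok(())
    }
}
```

Rendering the predicates into the plan text gives tests something concrete to assert on: a test can check the displayed plan to confirm that a filter reached the cache instead of being applied afterwards.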
Additional Notes
This PR reorganized the code in the meta_cache module of the influxdb3_cache crate so that the primary mod.rs contains only tests, alongside the following modules:
cache.rs: (existing with some changes) core cache implementation - I left a couple of comments on where the main changes were made to that code
provider.rs: (new) contains code for the new MetaCacheProvider
table_function.rs: (new) contains code for DataFusion trait implementations
The feature still needs to be accessible through the API, so that caches can be created, deleted, and viewed, but that will be done in follow-on issues as part of https://github.com/influxdata/influxdb/issues/25539.
@praveen-influx - on further thought, I have made some forward progress on subsequent issues so I want to get this merged. I opened issues to address the feedback you gave:
Closes #25543 Closes #25544