influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

feat: metadata cache provider and datafusion trait impls #25566

Closed: hiltontj closed this pull request 5 hours ago.

hiltontj commented 3 days ago

Closes #25543
Closes #25544

This PR adds the MetaCacheProvider, which manages the metadata caches in an influxdb3 instance. It includes APIs to create caches (either through the WAL or from the catalog on initialization), to write data into the managed caches, and to query data out of them.
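To make the three responsibilities above concrete (creating caches, writing into them, and reading from them), here is a minimal std-only Rust sketch. It assumes, for illustration only, that a cache is identified by a (table, cache name) pair and stores the distinct values seen for a fixed set of columns. MetaCache, create_cache, write_row, and distinct_values are hypothetical names, not the influxdb3 API, and real-world concerns such as hierarchy, eviction, and concurrency are omitted.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical key identifying a cache: (table name, cache name).
type CacheKey = (String, String);

/// Hypothetical minimal cache: the distinct values seen for each cached column.
#[derive(Debug, Default)]
pub struct MetaCache {
    columns: Vec<String>,
    values: HashMap<String, BTreeSet<String>>,
}

/// Hypothetical provider owning every cache in the instance.
#[derive(Debug, Default)]
pub struct MetaCacheProvider {
    caches: HashMap<CacheKey, MetaCache>,
}

impl MetaCacheProvider {
    /// Create a cache, e.g. when a create-cache entry is replayed from the WAL
    /// or the catalog is loaded at startup. Returns false if it already exists.
    pub fn create_cache(&mut self, table: &str, name: &str, columns: &[&str]) -> bool {
        let key = (table.to_string(), name.to_string());
        if self.caches.contains_key(&key) {
            return false;
        }
        let cache = MetaCache {
            columns: columns.iter().map(|c| c.to_string()).collect(),
            values: HashMap::new(),
        };
        self.caches.insert(key, cache);
        true
    }

    /// Write one row (column -> value) into every cache defined on `table`.
    pub fn write_row(&mut self, table: &str, row: &HashMap<&str, &str>) {
        for ((t, _), cache) in self.caches.iter_mut() {
            if t != table {
                continue;
            }
            for col in &cache.columns {
                if let Some(v) = row.get(col.as_str()) {
                    cache.values.entry(col.clone()).or_default().insert(v.to_string());
                }
            }
        }
    }

    /// Query the distinct values of one column from a named cache, sorted.
    /// Returns None if the cache does not exist.
    pub fn distinct_values(&self, table: &str, name: &str, column: &str) -> Option<Vec<String>> {
        let cache = self.caches.get(&(table.to_string(), name.to_string()))?;
        Some(
            cache
                .values
                .get(column)
                .map(|s| s.iter().cloned().collect())
                .unwrap_or_default(),
        )
    }
}
```

In this sketch, creation is idempotent (an existing cache is left alone), which is one way to let both the WAL-replay path and the catalog-initialization path call the same method without conflicting.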

The query side is fairly involved: it relies on DataFusion's TableFunctionImpl and TableProvider traits so that the cache can be queried through a user-defined table function (UDTF).
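In DataFusion, a table function implements TableFunctionImpl, whose call method receives the function's arguments as expressions and returns a TableProvider that the planner then scans. The std-only sketch below mimics that shape so it runs without the datafusion crate. Arg, TableSource, TableFunction, CachedValues, and MetaCacheFunction are hypothetical stand-ins, and the meta_cache('<table>', '<cache name>') call syntax is an assumption made for illustration.

```rust
use std::collections::HashMap;

/// Stand-in for a literal SQL argument (DataFusion passes `Expr`s instead).
#[derive(Debug, Clone, PartialEq)]
pub enum Arg {
    Str(String),
    Int(i64),
}

/// Stand-in for a table provider: something the planner can scan for rows.
pub trait TableSource {
    fn rows(&self) -> Vec<Vec<String>>;
}

/// Stand-in for a table function: turns call arguments into a table.
pub trait TableFunction {
    fn call(&self, args: &[Arg]) -> Result<Box<dyn TableSource>, String>;
}

/// A single-column table holding a cache's values.
struct CachedValues(Vec<String>);

impl TableSource for CachedValues {
    fn rows(&self) -> Vec<Vec<String>> {
        self.0.iter().map(|v| vec![v.clone()]).collect()
    }
}

/// Hypothetical `meta_cache(table, cache_name)` function backed by
/// pre-populated caches keyed by (table, cache name).
pub struct MetaCacheFunction {
    pub caches: HashMap<(String, String), Vec<String>>,
}

impl TableFunction for MetaCacheFunction {
    fn call(&self, args: &[Arg]) -> Result<Box<dyn TableSource>, String> {
        // Accept exactly two string literals; reject anything else.
        let (table, cache) = match args {
            [Arg::Str(t), Arg::Str(c)] => (t.clone(), c.clone()),
            _ => return Err("expected meta_cache('<table>', '<cache name>')".to_string()),
        };
        let values = self
            .caches
            .get(&(table.clone(), cache.clone()))
            .ok_or_else(|| format!("no cache named '{cache}' on table '{table}'"))?;
        Ok(Box::new(CachedValues(values.clone())))
    }
}
```

Under these assumptions, a query such as SELECT * FROM meta_cache('cpu', 'host_cache') resolves its two string arguments to a cache and then scans it like any other table.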

The predicate code was modified to support only two kinds of predicates, IN and NOT IN. This simplifies the code and maps directly onto DataFusion's LiteralGuarantee, which we use to derive the predicates from incoming queries.
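DataFusion's LiteralGuarantee records that, for a filter to pass, a column must be IN or NOT IN a set of literals; for example, host = 'a' OR host = 'c' yields an IN guarantee over {a, c}. The plain-Rust sketch below (no datafusion dependency) shows why limiting the cache to those two predicate kinds keeps evaluation simple. GuaranteeKind, Predicate, to_predicate, and filter_values are hypothetical names standing in for the real types.

```rust
use std::collections::BTreeSet;

/// Hypothetical mirror of a guarantee's kind: the column is IN or NOT IN a literal set.
pub enum GuaranteeKind {
    In,
    NotIn,
}

/// The only two predicate kinds the cache needs to evaluate.
#[derive(Debug, Clone, PartialEq)]
pub enum Predicate {
    In(BTreeSet<String>),
    NotIn(BTreeSet<String>),
}

impl Predicate {
    /// A single set lookup decides either kind of predicate.
    pub fn matches(&self, value: &str) -> bool {
        match self {
            Predicate::In(set) => set.contains(value),
            Predicate::NotIn(set) => !set.contains(value),
        }
    }
}

/// Convert a guarantee (kind plus literals) into a cache predicate one-to-one.
pub fn to_predicate(kind: GuaranteeKind, literals: &[&str]) -> Predicate {
    let set: BTreeSet<String> = literals.iter().map(|s| s.to_string()).collect();
    match kind {
        GuaranteeKind::In => Predicate::In(set),
        GuaranteeKind::NotIn => Predicate::NotIn(set),
    }
}

/// Filter cached values; with no predicate, every value passes.
pub fn filter_values<'a>(values: &'a [String], pred: Option<&Predicate>) -> Vec<&'a str> {
    values
        .iter()
        .map(String::as_str)
        .filter(|v| pred.map_or(true, |p| p.matches(v)))
        .collect()
}
```

Because each guarantee converts one-to-one into a predicate, the cache never has to interpret arbitrary filter expressions; anything that cannot be expressed as IN or NOT IN is simply not pushed down in this sketch.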

MetaCacheExec, a custom ExecutionPlan implementation, was added specifically for the metadata cache; it reports the predicates that are pushed down to the cache during query planning/execution.
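DataFusion execution plans describe themselves in EXPLAIN output through the DisplayAs trait, which is one natural place for a node to report the predicates it received. The sketch below uses std::fmt::Display as a stand-in. The MetaCacheExec struct here is a hypothetical simplification, and its output format is invented for illustration rather than taken from the real node.

```rust
use std::collections::BTreeMap;
use std::fmt;

/// Hypothetical predicate kinds, keyed by column name in the exec node below.
#[derive(Debug, Clone)]
pub enum Predicate {
    In(Vec<String>),
    NotIn(Vec<String>),
}

/// Hypothetical cache scan node that remembers which predicates were pushed
/// down to it during planning (BTreeMap keeps columns in a stable order).
pub struct MetaCacheExec {
    pub table: String,
    pub predicates: BTreeMap<String, Predicate>,
}

impl fmt::Display for MetaCacheExec {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "MetaCacheExec: table={}", self.table)?;
        if self.predicates.is_empty() {
            // Nothing was pushed down; print only the table.
            return Ok(());
        }
        let parts: Vec<String> = self
            .predicates
            .iter()
            .map(|(col, p)| match p {
                Predicate::In(v) => format!("{col} IN ({})", v.join(",")),
                Predicate::NotIn(v) => format!("{col} NOT IN ({})", v.join(",")),
            })
            .collect();
        write!(f, " predicates=[{}]", parts.join(", "))
    }
}
```

Rendering the pushed-down predicates as text lets a test assert on the plan string, which is one way to verify that pushdown actually happened.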

A large set of tests was added to check that queries work and that predicates are pushed down properly.

Additional Notes

hiltontj commented 5 hours ago

@praveen-influx, on further thought: I have made some forward progress on subsequent issues, so I want to get this merged. I opened issues to address the feedback you gave: