influxdata / influxdb

Handle projection pushdown in the metadata cache #25584

Open hiltontj opened 3 days ago

Problem

The TableProvider implementation for the MetaCacheFunctionProvider is not currently handling projection pushdown: https://github.com/influxdata/influxdb/blob/20d09a8dda5ac42a6cb388b15edc169f9fbbd709/influxdb3_cache/src/meta_cache/table_function.rs#L46

This means that the cache is fully scanned (within the bounds of the provided predicates) regardless of the provided projection. For a cache that has multiple levels, if the user is only interested in the top level, this can waste cycles scanning the lower levels. If the user is interested in lower levels, we still need to scan through the higher levels, but we could at least avoid building the Arrow buffers for the higher-level columns.

In addition, projecting only lower levels of the cache does not produce ordered output; that may need a separate issue.
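
As a rough illustration of how unordered output can arise, here is a minimal sketch. It assumes the cache is stored as nested sorted maps, which the issue does not state; `lower_level_values` and the sample data are hypothetical. Walking the hierarchy visits lower-level values in the order of their parents, so the lower column on its own is only sorted within each parent.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Walk a hypothetical two-level cache (top-level value -> lower-level values)
/// and collect only the lower-level column, in traversal order.
fn lower_level_values(cache: &BTreeMap<&str, BTreeSet<&str>>) -> Vec<String> {
    cache
        .values() // parents are visited in sorted order
        .flat_map(|lower| lower.iter().map(|v| v.to_string()))
        .collect()
}
```

With `us-east` holding `host-c` and `host-d`, and `us-west` holding `host-a` and `host-b`, the walk yields `host-c`, `host-d`, `host-a`, `host-b`. A sort on the projected column would be needed to get a globally ordered result.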

Proposed solution

The projection provided to TableProvider::scan could be passed down to MetaCache::to_record_batch, so that the cache is scanned only as deeply as the projection requires and only the projected columns are built.
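
The idea can be sketched with a simplified, std-only model. This is not the actual `MetaCache` code: `TwoLevelCache`, `to_columns`, and the use of `Vec<String>` in place of Arrow buffers are all assumptions made for illustration. The sketch also assumes that a top-level-only projection should yield one row per top-level value, which may not match the intended semantics.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical two-level cache: top-level value -> set of lower-level values.
struct TwoLevelCache {
    levels: BTreeMap<String, BTreeSet<String>>,
}

impl TwoLevelCache {
    /// Build column buffers for the projected column indices only
    /// (0 = top level, 1 = lower level). `None` means all columns.
    fn to_columns(&self, projection: Option<&[usize]>) -> Vec<Vec<String>> {
        let all = [0usize, 1];
        let projected = projection.unwrap_or(&all[..]);
        let want_top = projected.contains(&0);
        let want_lower = projected.contains(&1);

        let mut top = Vec::new();
        let mut lower = Vec::new();

        for (top_value, lower_values) in &self.levels {
            if !want_lower {
                // Only the top level is requested: do not descend.
                if want_top {
                    top.push(top_value.clone());
                }
                continue;
            }
            // The lower level is requested: the walk must pass through the
            // top level, but its buffer is filled only if it is projected.
            for lower_value in lower_values {
                if want_top {
                    top.push(top_value.clone());
                }
                lower.push(lower_value.clone());
            }
        }

        // Emit the buffers in the order given by the projection.
        projected
            .iter()
            .map(|&i| match i {
                0 => top.clone(),
                1 => lower.clone(),
                _ => panic!("column index out of range"),
            })
            .collect()
    }
}
```

In this model, `to_columns(Some(&[0]))` never touches the lower-level sets, and `to_columns(Some(&[1]))` walks the top level without allocating a buffer for it, which mirrors the two savings described in the problem statement.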

Alternatives

N/A

Additional context

Currently, DataFusion handles projection at a higher level, so this isn't a show-stopper: the cache still works as intended when projections are provided in the query.

The method that walks the cache hierarchy to do predicate evaluation and build the arrow buffers is here.

An example showing that the output when projecting a lower column is not ordered is here.