influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0
29k stars 3.56k forks

Proposal: Metadata Cache #25538

Open hiltontj opened 1 week ago

hiltontj commented 1 week ago

See related epic: https://github.com/influxdata/influxdb/issues/25539

Context

It is very common for InfluxDB users to want to quickly return the unique values for a given column, or the unique values for one column given the value of another. For example, the unique region names, the unique hostnames, or, given the region ‘us-west’, the unique hostnames within that region.

These kinds of lookups are typically used in selectors at the top of dashboards and are accessed frequently. Performing them in tens of milliseconds, rather than hundreds, represents a significant improvement in user experience.

The metadata cache will provide a feature to make such queries fast.

Requirements

  1. Users should be able to specify a table and one or more columns to cache.
  2. If given more than one column, the order is important and hierarchical (as in the region/host example above).
  3. The API to read from the cache is exposed through the query interface, i.e., via a user-defined function that can be invoked in queries over a FlightSQL connection or through the /api/v3/query_sql API, similar to the last_cache.
  4. The API to configure the metadata cache can be accessed through the REST API.
  5. There is a system table that summarizes the caches configured for a database.
  6. Eviction of entries from the cache is configurable (see below).

Querying the meta_cache

Queries to the metadata cache will be fulfilled via a user-defined function called meta_cache and invoked like so:

SELECT host FROM meta_cache('cpu')

Here, host is a column on the cpu table. This query will return the distinct/unique values that have been seen for the host column in incoming writes to the database.

The entries returned should be sorted in ascending order by default.

If a cache is configured for multiple columns, one could select from both:

SELECT region, host FROM meta_cache('cpu')

Queries to the cache support LIMIT and WHERE clauses, e.g.,

SELECT host FROM meta_cache('cpu') LIMIT 10
SELECT host FROM meta_cache('cpu') WHERE region = 'us-west'

In the latter example, the cache would need to be configured on both the region and host columns.

Each cache has a unique name, such that if there are multiple caches configured on a given table, they can be queried by their name:

SELECT host FROM meta_cache('cpu', '<cache_name>')

System Table

There will be a system table that can be queried via:

SELECT * FROM system.meta_caches WHERE table = 'cpu'

The table predicate is required. This will return results with columns for:

Configuration API

Metadata caches can be created via the following API:

POST /api/v3/configure/meta_cache

with body parameters:

and deleted via:

DELETE /api/v3/configure/meta_cache

with body or URL parameters:
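The body parameters are not enumerated here, so as an illustration only, a create request body might look like the following. Every field name below (db, table, name, columns, max_age) is an assumption inferred from the requirements above, not a confirmed API shape:

```python
import json

# Hypothetical body for POST /api/v3/configure/meta_cache.
# Field names are assumptions based on the stated requirements:
# a target table, an optional cache name (to allow multiple caches
# per table), an ordered list of columns, and an optional max_age.
create_body = {
    "db": "mydb",
    "table": "cpu",
    "name": "cpu_region_host",      # optional cache name
    "columns": ["region", "host"],  # order matters: hierarchy is region -> host
    "max_age": "24h",               # optional eviction age
}
print(json.dumps(create_body, indent=2))
```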

Hierarchical Cache Structure

The metadata cache, like the last-n-value cache, should have a hierarchical structure. For example, for a cache configured on two columns, region and host, in that order, the values will be stored in a hierarchy:

[Screenshot: diagram of the hierarchical cache structure, with region values at the top level and the host values seen under each region nested beneath it]
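The hierarchy described above can be sketched with nested maps. This is a minimal illustration of the data layout, not the actual (Rust) implementation; the function and variable names are made up:

```python
# Sketch of the hierarchical structure: the first configured column
# (region) forms the top level, and each region maps to the set of
# host values seen under it in incoming writes.
cache = {}

def insert(cache, region, host):
    cache.setdefault(region, set()).add(host)

for region, host in [
    ("us-west", "host-a"),
    ("us-west", "host-b"),
    ("us-east", "host-c"),
    ("us-west", "host-a"),  # repeated writes add no new entries
]:
    insert(cache, region, host)

# SELECT host FROM meta_cache('cpu') WHERE region = 'us-west'
print(sorted(cache["us-west"]))  # -> ['host-a', 'host-b']
# SELECT region FROM meta_cache('cpu')
print(sorted(cache))             # -> ['us-east', 'us-west']
```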

Eviction

Entries in the cache will be evicted once they exceed the max_age configured on the cache. If a cache has a configured max_age, an eviction process will need to run periodically to prune entries older than that age.
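The eviction rule can be sketched as follows, assuming each cached value records the time it was last seen in a write; the names and default here are illustrative, not the actual implementation:

```python
import time

MAX_AGE_SECONDS = 24 * 60 * 60  # illustrative default

def evict(entries, now, max_age=MAX_AGE_SECONDS):
    """Drop values whose last-seen timestamp is older than max_age."""
    return {value: seen for value, seen in entries.items()
            if now - seen <= max_age}

now = time.time()
entries = {
    "host-a": now - 10,                    # seen recently, kept
    "host-b": now - 2 * MAX_AGE_SECONDS,   # stale, pruned
}
entries = evict(entries, now)
print(sorted(entries))  # -> ['host-a']
```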

Limitations

Cache Population

The cache is populated strictly by incoming writes. So, values that were written to the database before the cache was created - or in the event of a restart, values that were written to the database before the restart - will not be cached.

There are plans to have caches be pre-populated on cache creation and server restart in the Pro version of InfluxDB 3.

Questions

  1. Can we, and how would we, support OFFSET clauses when the contents of the cache could change between queries?
  2. Do we restrict the cached columns to tag columns?
  3. Should we allow configuring a max_size for a cache? If so, is that size measured in memory usage or in number of elements?

pauldix commented 1 week ago

All of that looks good. Some thoughts on the questions:

  1. For the offset, we could use the last value in the returned set. The drawback is that we'd end up having to scan through the list to pick up where we left off, but maybe that's good enough
  2. No restriction on tags vs. other types of columns. They should be able to do this for any string column
  3. We should have a max_cardinality for the cache. We could default this to 100,000 and then let the user set it to whatever they want

hiltontj commented 1 week ago
  1. For the offset, we could use the last value in the returned set. The drawback is that we'd end up having to scan through the list to pick up where we left off, but maybe that's good enough

I think doing a scan would be acceptable for the size that the metacache will grow to, for now; I will try to think of ways to avoid doing a scan while implementing, but I think supporting pagination will be important to this cache.

We could potentially bake the offset into the meta_cache function as an argument. I need to explore the possible syntax for that though.
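The scan-based pagination idea above can be sketched as follows: rather than a numeric OFFSET (which is unstable when the cache changes between queries), the client resumes from the last value it saw, and the server scans past it. This is a sketch of the discussed approach, with made-up names:

```python
def next_page(values, after=None, limit=10):
    """Return up to `limit` sorted values strictly greater than `after`."""
    ordered = sorted(values)
    if after is not None:
        # Linear scan to find where we left off; acceptable at the
        # cardinalities discussed for this cache.
        ordered = [v for v in ordered if v > after]
    return ordered[:limit]

hosts = {"host-%02d" % i for i in range(25)}
page1 = next_page(hosts, limit=10)
page2 = next_page(hosts, after=page1[-1], limit=10)
print(page1[0], page1[-1])  # -> host-00 host-09
print(page2[0], page2[-1])  # -> host-10 host-19
```

Because the resume point is a value rather than a position, entries added or evicted between requests cannot shift the page boundaries.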

  1. No restriction on tags vs. other types of columns. They should be able to do this for any string column

Ah, shall we allow for int, uint, and bool cols then, like in the last cache?

  1. We should have a max_cardinality for the cache. We could default this to 100,000 and then let the user set it to whatever they want

Makes sense.

praveen-influx commented 1 week ago

If a column gets added to a table after the metadata cache is created, do we have some mechanism to add that new column to the cache? The POST endpoint looks like it's for initial creation only (could be wrong here), so I wondered how we allow later additions of new columns.

hiltontj commented 1 week ago

@praveen-influx good call out - in the original doc outlining the cache, we did not plan to allow updates to existing caches, so in that case, it would need to be deleted and re-added with the desired columns/their order.

pauldix commented 1 week ago

We could allow for int, uint, and bool but I don't know that we need to bother with it at this point. Only if it's easy enough. The real use case here is for distinct strings.

hiltontj commented 1 week ago

The real use case here is for distinct strings

The thing I was thinking of would be if they are building some dashboard and are using bool fields as filters/checkboxes. I feel as though there is less of a case for integers. I will start it off with strings for now; it is not hard to flip a struct to an enum if we decide to support more.

MaduMitha-Ravi commented 1 week ago

Can we have a default value configured for max_age instead of null? That way we will have more control over performance. @hiltontj

pauldix commented 1 week ago

Default to 24h in that case

hiltontj commented 1 week ago

Can we have a default value configured for max_age instead of null? That way we will have more control over performance. @hiltontj

~For OSS this could be 72 hours, in pro it may be longer, but I think defaulting it makes sense.~ Ah, sorry, I did not see Paul's comment as I needed to refresh my page.

philjb commented 1 week ago

I came here to say that you probably want some limits, total cardinality, column cardinality, tree depth, number of columns etc but I see Paul has asked for a total cardinality limit already.

If I cache all columns and values and don't set an age, I believe the metadata cache here nearly (re)implements the "index" that influxdb tsm uses for tags and fields, which has been a source of slowdown in the db when the cardinality is (very) high. For tsm, this metadata index is consulted for almost every query (hence the major impact when it is high cardinality). It appears the monolith metadata cache won't be consulted for general queries (as datafusion doesn't need it) and will only be consulted when specifically queried FROM meta_cache - but that is something to keep in mind if customers rely on it to accelerate general queries and/or the metadata cache is integrated into certain queries internally. I believe you'll want to avoid the unbounded cardinality that schema on write permits - having customers choose what goes into the metadata cache will do most of the work of preventing the cache from being a performance drag instead of a performance accelerator.

hiltontj commented 1 week ago

@philjb

I came here to say that you probably want some limits, total cardinality, column cardinality, tree depth, number of columns etc but I see Paul has asked for a total cardinality limit already.

Right, for now we will have the total cardinality limit. Tree depth and number of columns in this case should be the same thing, and are limited at the time the cache is created, since the columns used are specified up front, and I don't think we aim to have newly added columns added to existing caches (they would have to re-create the cache to get a different column set).

Having a per-column cardinality is one thing I didn't consider that could also be added at some point.

It appears the monolith metadata cache won't be consulted for general queries

Right, users need to explicitly query it with SELECT ... FROM meta_cache('table').

praveen-influx commented 1 week ago

we did not plan to allow updates to existing caches, so in that case, it would need to be deleted and re-added with the desired columns/their order.

Will creating new meta_cache require querying data for each column to build the cache again? Not sure how quick that'd be if we need to go to object store for it. That would in turn probably mean downtime for the user.

I wonder if it's worth exploring concepts like aliasing (if it's not been done already), such that a new cache is created in the background whilst the old cache is still accessible via an alias. Then, once the new cache is ready, we swap the alias to point to the new one in the background. This is not something to be addressed as part of this issue, by the way.

hiltontj commented 1 week ago

Will creating new meta_cache require querying data for each column to build the cache again? Not sure how quick that'd be if we need to go to object store for it. That would in turn probably mean downtime for the user.

Yes. The intention was to not do cache population from object store in OSS, but do so in Pro.

This does highlight a bit of a usability issue with the caches. Part of the issue is that users need to write to their tables in order to create them (vs. having the ability to explicitly set up their schema ahead of writing to the database). Since the cache can only be created for an existing table, if the cache is not pre-populated, then it will miss out on the data that was written before the cache was created.

I wonder if it's worth exploring concepts like aliasing

This might require that we assign IDs to caches.

pauldix commented 1 week ago

FWIW, we will want to create an API to create a table where the user can specify schema, last caches, meta caches, and for Pro, which columns get put into the file index.

alamb commented 1 week ago

I don't fully follow the answer above about how values will be populated in the cache. Maybe it helps to ask about what the user experience would be.

When I make a metadata cache, what values will be present?

  1. If the system crashes and restarts, will the values in the cache be immediately available after reboot?
  2. If a cache is created for a column that already has lots of data (e.g. multiple years) will it be populated by loading existing values? Is there any way I can do so?
pauldix commented 6 days ago

The cache is filled by writes. So when you create it, it's empty. When you reboot, it starts empty.

alamb commented 6 days ago

The cache is filled by writes. So when you create it, it's empty. When you reboot, it starts empty.

I recommend putting that in the description of the feature as I think it is something users will want to know

hiltontj commented 5 days ago

I recommend putting that in the description of the feature as I think it is something users will want to know

For now, I added a Limitations section to the main issue description that mentions this detail.