hansetag / iceberg-catalog

A Rust implementation of the Iceberg REST Catalog specification.
Apache License 2.0

Add Caching #6

Open c-thiel opened 4 months ago

c-thiel commented 4 months ago

Currently we issue a lot of DB requests. In many places, we would benefit greatly from caching. A bit of motivation:

Good candidates for caching are data that change slowly and are queried often. However, we must always assume that we run in a distributed environment, where another instance can modify data without our local cache noticing. Thus, we should never cache the full TableMetadata, which can change frequently for a table. In contrast, location & uuid can be considered slowly changing.

Some calls that I think should be cached:

I am not sure yet what would be the best place to add caching. Ideally, it's not implementation-specific, so it shouldn't be part of the postgres implementation, i.e. not in this folder: https://github.com/hansetag/iceberg-rest-server/tree/main/crates/iceberg-rest-server/src/implementations/postgres

We also have to make the choice of distributed vs. local caching. I currently prefer local caching because it needs fewer dependencies. Postgres and most other DBs cache as well.
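To make the local-caching option concrete, here is a minimal sketch of a process-local cache with a time-to-live, using only the standard library. The name `TtlCache` and its methods are hypothetical and not part of this repository. The TTL bounds how long a stale entry can survive when another instance changes the underlying data.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical process-local cache; entries expire after a fixed TTL.
pub struct TtlCache<K, V> {
    ttl: Duration,
    entries: Mutex<HashMap<K, (V, Instant)>>,
}

impl<K: Eq + Hash, V: Clone> TtlCache<K, V> {
    pub fn new(ttl: Duration) -> Self {
        Self {
            ttl,
            entries: Mutex::new(HashMap::new()),
        }
    }

    /// Returns a clone of the value if it is present and not yet expired.
    /// Expired entries are removed on access.
    pub fn get(&self, key: &K) -> Option<V> {
        let mut entries = self.entries.lock().unwrap();
        let fresh = entries
            .get(key)
            .filter(|entry| entry.1.elapsed() < self.ttl)
            .map(|entry| entry.0.clone());
        if fresh.is_none() {
            entries.remove(key);
        }
        fresh
    }

    /// Stores a value and records the insertion time.
    pub fn insert(&self, key: K, value: V) {
        self.entries
            .lock()
            .unwrap()
            .insert(key, (value, Instant::now()));
    }

    /// Drops an entry explicitly, e.g. after this instance dropped a table.
    pub fn invalidate(&self, key: &K) {
        self.entries.lock().unwrap().remove(key);
    }
}
```

Because the type is generic and has no database dependency, something like it could live outside the postgres implementation. Whether a TTL alone is acceptable depends on how much staleness each cached value can tolerate.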

twuebi commented 1 month ago

What are your thoughts regarding cache invalidation? Tables may be dropped, or staged tables may be overwritten; in both cases the cached name -> uuid mapping becomes stale.

I can see a cache working for the location mapping. Here we could simply store the full location based on previous lookups. If this gets stale, we still have the same behavior: a 404 upon fetching from pg using the full location. We'd get rid of the prefix-based lookup this way.

For all other caches, I only see them working with some fallback logic that retries and cleans the cache, executed from the respective handler fn, which sounds boilerplatey.
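One way to keep that fallback logic out of the individual handlers might be a single generic helper. The sketch below is only an illustration: `CatalogError`, `with_cached_id`, and both closures are hypothetical names, not existing APIs of this project. The helper runs an operation with a cached id and, if the backend reports "not found" for that id, evicts the entry, resolves the id again, and retries once.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical error type. `NotFound` means the id we used no longer exists.
#[derive(Debug, PartialEq)]
pub enum CatalogError {
    NotFound,
    Other(String),
}

/// Runs `op` with the cached id for `key`. If the cached id turns out to be
/// stale (`NotFound`), the entry is evicted, the id is resolved again via
/// `resolve` (e.g. a DB lookup), and `op` is retried exactly once.
pub fn with_cached_id<K, Id, T>(
    cache: &mut HashMap<K, Id>,
    key: K,
    mut resolve: impl FnMut(&K) -> Result<Id, CatalogError>,
    mut op: impl FnMut(&Id) -> Result<T, CatalogError>,
) -> Result<T, CatalogError>
where
    K: Eq + Hash + Clone,
    Id: Clone,
{
    if let Some(id) = cache.get(&key).cloned() {
        match op(&id) {
            // Stale entry: evict it and fall through to a fresh lookup.
            Err(CatalogError::NotFound) => {
                cache.remove(&key);
            }
            // Success or an unrelated error: return it unchanged.
            other => return other,
        }
    }
    // Cache miss or stale entry: resolve from the source of truth.
    let id = resolve(&key)?;
    cache.insert(key, id.clone());
    op(&id)
}
```

A handler would then pass its name -> uuid lookup as `resolve` and its actual query as `op`. Note that this only catches staleness that surfaces as a "not found" error. A dropped and re-created table whose old uuid is still valid elsewhere would need a different signal.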