Open c-thiel opened 4 months ago
What are your thoughts regarding cache invalidation? Tables may be dropped, or staged tables may be overwritten; in both cases the cached name -> uuid mapping becomes stale.
I can see a cache working for the location mapping: we could simply store the full location based on previous lookups. If an entry gets stale, we still have the same behavior, a 404 upon fetching from pg using the full location. We'd get rid of the prefix-based lookup this way.
For all other caches, I only see them working with some fallback logic, executed from the respective handler fn, that retries and cleans the cache, which sounds boilerplatey.
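To make the fallback idea concrete, here is a minimal sketch of how the retry-and-evict logic could live in one helper instead of being repeated in every handler. All names (`FallbackCache`, `CallError`, `call`) are hypothetical and not taken from the repository, and it is synchronous and std-only for brevity, whereas the real handlers are presumably async.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical error type for a call that uses a cached value.
#[derive(Debug, PartialEq)]
pub enum CallError {
    /// The value pointed at something that does not exist (e.g. a 404).
    NotFound,
    /// Any other failure; never retried.
    Other(String),
}

/// Hypothetical cache that owns the "retry once, then evict" logic.
pub struct FallbackCache<K, V> {
    entries: HashMap<K, V>,
}

impl<K: Eq + Hash + Clone, V: Clone> FallbackCache<K, V> {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Runs `use_value` with the cached value for `key`. If that reports
    /// `NotFound`, the entry is assumed stale: it is evicted, the value is
    /// resolved again from the source of truth, and the call is retried once.
    pub fn call<T>(
        &mut self,
        key: &K,
        mut resolve: impl FnMut(&K) -> Option<V>,
        mut use_value: impl FnMut(&V) -> Result<T, CallError>,
    ) -> Result<T, CallError> {
        if let Some(cached) = self.entries.get(key).cloned() {
            match use_value(&cached) {
                // Possibly stale (table dropped or overwritten): evict and fall through.
                Err(CallError::NotFound) => {
                    self.entries.remove(key);
                }
                // Success or an unrelated error: return as-is.
                other => return other,
            }
        }
        // Cache miss or evicted entry: ask the source of truth exactly once.
        let fresh = resolve(key).ok_or(CallError::NotFound)?;
        let result = use_value(&fresh);
        if result.is_ok() {
            // Only cache values that were just proven to work.
            self.entries.insert(key.clone(), fresh);
        }
        result
    }
}
```

A handler would then pass two closures, one that resolves name -> uuid from the database and one that does the actual work with the uuid. The cost of a stale entry is one extra failed call, and the handler itself contains no cache bookkeeping.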
Currently we issue a lot of DB requests. In many places, we would benefit greatly from caching. A bit of motivation:
Good places to implement caching serve slowly changing data and are queried often. However, we must always assume that we run in a distributed environment. Thus, we should never cache the full `TableMetadata`, which can change frequently for a table. In contrast, `location` & `uuid` can be considered slowly changing.

Some calls that I think should be cached:
`TableMetadata` object, but only Metadata about the
`get_table_metadata_by_id`?

I am not sure yet what would be the best place to add caching. Ideally, it's not implementation specific, so it shouldn't be part of `postgres`, so not in this folder: https://github.com/hansetag/iceberg-rest-server/tree/main/crates/iceberg-rest-server/src/implementations/postgres

We also have to make the choice of distributed vs. local caching. I currently prefer local caching due to fewer dependencies. Postgres and most other DBs cache as well.
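As a rough illustration of a local, implementation-agnostic cache, the sketch below wraps any backend behind a trait and keeps name -> uuid entries in process memory with a time-to-live, so staleness in a distributed deployment is bounded by the TTL. The trait `TableIdLookup`, the type `LocalTtlCache`, and the use of `u128` as a stand-in for a UUID are assumptions made for this example. It uses only the standard library and is synchronous, which the real catalog traits likely are not.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical backend-agnostic lookup. A Postgres-backed catalog would be
/// one implementor; the cache below does not know which backend it wraps.
pub trait TableIdLookup {
    /// Resolves a table name to its id (`u128` stands in for a UUID here).
    fn table_id(&self, name: &str) -> Option<u128>;
}

/// Hypothetical process-local cache with a fixed time-to-live per entry.
pub struct LocalTtlCache<B> {
    backend: B,
    ttl: Duration,
    entries: Mutex<HashMap<String, (Instant, u128)>>,
}

impl<B: TableIdLookup> LocalTtlCache<B> {
    pub fn new(backend: B, ttl: Duration) -> Self {
        Self { backend, ttl, entries: Mutex::new(HashMap::new()) }
    }

    /// Returns the cached id while it is younger than the TTL; otherwise asks
    /// the backend and stores the answer. Negative results are not cached.
    pub fn table_id(&self, name: &str) -> Option<u128> {
        // The lock is held across the backend call to keep the sketch short;
        // this serializes lookups and would need refinement under load.
        let mut entries = self.entries.lock().unwrap();
        if let Some((stored_at, id)) = entries.get(name) {
            if stored_at.elapsed() < self.ttl {
                return Some(*id);
            }
        }
        // Miss or expired entry: drop it and go to the backend.
        entries.remove(name);
        let id = self.backend.table_id(name)?;
        entries.insert(name.to_string(), (Instant::now(), id));
        Some(id)
    }
}
```

Because every instance keeps its own map, no extra service is needed. The trade-off is that another instance may serve an outdated mapping for up to one TTL after a table is dropped or overwritten, so callers still need to tolerate a not-found response from the database.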