losfair / mvsqlite

Distributed, MVCC SQLite that runs on FoundationDB.
https://github.com/losfair/mvsqlite/wiki
Apache License 2.0
1.35k stars 38 forks source link

Design: Consistent mesh for page-level mutation logs #96

Open losfair opened 1 year ago

losfair commented 1 year ago

Currently the cache invalidation logic depends on FoundationDB. After each call to /batch/commit, we store the set of pages mutated by the transaction into a special keyspace in FDB, keyed by the commit versionstamp. When a later request to /stat comes in, keys in the range (client_last_known_commit_version, current_read_version) are scanned and returned to the client so that they can invalidate their caches.

This results in a lot of temporary data that is written once and read a few times. It is a waste to let these data go through the entire FDB transaction & storage systems, and a better design is possible.

losfair commented 1 year ago

The design is to form a mesh network of all mvstore instances so that they can synchronize cache invalidation logs directly and keep everything in-memory. Metadata is still backed by FDB to keep things consistent.

fire commented 1 year ago

Do you need any help with this?

losfair commented 1 year ago

@fire I haven't been able to get back to this yet... The idea I have in mind is that, we select one mvstore process in the cluster to be the "write proxy" and let all mutations go through the proxy. The proxy keeps an interval of mutated page numbers in memory (10min?). Readers can then query the proxy and invalidate their caches accordingly. In case the write proxy is not able to provide this information (for example if a new proxy is elected), the reader invalidates its entire cache.

Election of the write proxy happens through FDB. The FDB read version can be used to fence between write proxy generations. How to scale the role is still an open question though (maybe shard it like what FDB does for itself?)