mozilla-services / syncstorage-rs

Sync Storage server in Rust
Mozilla Public License 2.0

Investigate Cloud Spanner's managed TTL feature #1577

Open data-sync-user opened 1 month ago

data-sync-user commented 1 month ago

Syncstorage supports automatic deletion of expired records (per the TTL field on BSO records) via purge_ttl.py. This script runs nightly as a background task, during periods of low database activity.

Such a background script was the only real option for TTLs when we first switched to Cloud Spanner in 2020, and we told the Google Cloud team that we wanted a Spanner-native TTL feature to replace it. They added support for native TTLs in late 2021: https://cloud.google.com/blog/products/spanner/reduce-costs-and-simplify-compliance-with-managed-ttl-in-spanner

Spanner’s managed TTL support could potentially reduce costs, as it provides TTL support “for free” without the need to run or maintain the background script.

Distributed databases like Spanner often have a background “compaction” process that actually removes previously deleted data from storage (oftentimes deletes are more like writes that “queue” deletion for that later process). Such databases, if they offer native TTL support, tend to implement it as part of that compaction process. This ends up being much more efficient than a manually run background script like purge_ttl that scans for expired records. Eliminating the Spanner CPU load that our script incurs could potentially help our costs in the future (especially when we switch to Spanner’s autoscaler).

We also use a significant amount of extra storage for the indices that purge_ttl needs (though it's not clear whether they're all strictly necessary for the script). Managed TTL support very likely removes the need for these extra indices entirely, potentially resulting in cost savings.

Let’s take a close look at the managed TTL support and, assuming it can accommodate our needs, formulate a high-level plan for how we’ll use it in syncstorage and how we’ll migrate over to it.
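
As a starting point for that investigation, here is a minimal sketch of the DDL involved. Managed TTL in Spanner is configured per table as a row deletion policy on a TIMESTAMP column. The helper name `row_deletion_policy_ddl` is hypothetical, and the table name `bsos` and column name `expiry` are assumptions about our schema that should be checked against the actual DDL before use.

```python
def row_deletion_policy_ddl(table: str, timestamp_column: str, days: int = 0) -> str:
    """Build a GoogleSQL DDL statement that attaches a row deletion policy.

    Rows become eligible for deletion once `timestamp_column` is more than
    `days` days in the past. With days=0, a row is eligible as soon as its
    timestamp has passed.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    # Only accept plain identifiers, since the names are interpolated into DDL.
    for name in (table, timestamp_column):
        if not name.isidentifier():
            raise ValueError(f"not a plain identifier: {name!r}")
    return (
        f"ALTER TABLE {table} "
        f"ADD ROW DELETION POLICY (OLDER_THAN({timestamp_column}, INTERVAL {days} DAY))"
    )


# Assumed names: table "bsos" with a TIMESTAMP column "expiry".
print(row_deletion_policy_ddl("bsos", "expiry"))
# → ALTER TABLE bsos ADD ROW DELETION POLICY (OLDER_THAN(expiry, INTERVAL 0 DAY))
```

One thing to verify as part of the plan: Spanner's documentation describes TTL deletion as asynchronous, so expired rows may remain visible for some time after they become eligible. Reads would presumably still need to filter on the expiry column, as they do today.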
