Open ezekg opened 5 months ago
If we were to always store account-level data in the US, it would be easier to implement multi-region support for a single account (i.e. intermixed data). Otherwise, we potentially need to query for the account in all regions in order to set the current tenant. And we would also need to assert uniqueness across regions, which would be pretty hard. Everything else, though, aside from the account, can be stored in the account's region by default.
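As a rough sketch of that lookup flow, assuming in-memory stand-ins for the per-region databases (`DATABASES`, `find_account`, and `data_region_for` are hypothetical names, not existing code): the account is always resolved from the US store, and its `region` then says where the rest of its data lives.

```ruby
# Hypothetical in-memory stand-ins for the per-region databases.
Account = Struct.new(:id, :slug, :region)

ACCOUNT_REGION = 'US' # assumption: account rows only ever live here

DATABASES = {
  'US' => { accounts: [Account.new(1, 'acme', 'EU'), Account.new(2, 'globex', 'US')] },
  'EU' => { accounts: [] } # never holds accounts under this scheme
}.freeze

# A single lookup in one region is enough to set the current tenant,
# and uniqueness only has to be asserted against this one store.
def find_account(slug)
  DATABASES.fetch(ACCOUNT_REGION)[:accounts].find { |account| account.slug == slug }
end

# Everything other than the account is read from the account's own region.
def data_region_for(slug)
  account = find_account(slug)
  account && account.region
end
```

With this shape there is no fan-out across regions just to find the tenant; only the follow-up queries are routed by region.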
This would also require servers and workers in each given region, to ensure compliance for data processing, but that can be handled via the Keygen-Region header (again, always US by default) and a location-aware load balancer or router (look at Fastly's offering).
To simplify, we could implement multi-region data storage/residency first (i.e. database only), and full multi-region support later (servers, etc.).
Need to note that any joins on the account, e.g. the pruning jobs, would need to be rewritten since accounts would only exist in the US region.
If we were to intermix, regions would need to be implemented similarly to environments, where any record's belongs_to associations must also be in the same region. Otherwise we get problems where e.g. machine counts per-license are inaccurate.
Edit: actually, this isn't the case, because the parent wouldn't exist at all from the child's point of view; it would be in a separate database.
What if we modeled this as Silos? Where each "regionable" (or "siloable") model belongs to a Silo, and a Silo belongs to a Region (either itself modeled or just an opaque string like backend).
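A minimal sketch of the Silo idea in plain Ruby, with structs standing in for real models (`SiloedLicense`, `region_code`, and `group_by_region` are hypothetical names for illustration). It covers both variants mentioned above: a modeled Region and an opaque string.

```ruby
# Hypothetical models; only "Silo" and "Region" come from the idea above.
Region = Struct.new(:code)
Silo   = Struct.new(:name, :region)

# A "siloable" record belongs to a silo and derives its region from it.
SiloedLicense = Struct.new(:key, :silo) do
  def region_code
    # Support both variants: a modeled Region or an opaque string.
    silo.region.respond_to?(:code) ? silo.region.code : silo.region.to_s
  end
end

# Bucket records by the region of their silo, e.g. to pick a database.
def group_by_region(records)
  records.group_by(&:region_code)
end
```

For example, `group_by_region` could be used to decide which database connection each batch of records should be written through.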
The per-request region switching would make enforcing account-level limits challenging, since resources could span multiple regions i.e. databases. E.g. an account's hard limit on ALUs for the Dev 0 tier would be challenging to enforce. But we could always make it so multi-region support was only available on Std or Ent tiers to work around that.
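To illustrate why this gets awkward, here is a hedged sketch of a cross-region limit check. It assumes each region exposes some way to count ALUs, modeled here as a callable per region; `total_alus` and `within_limit?` are hypothetical helpers, and the limit value is made up.

```ruby
# counters: region code => callable returning that region's ALU count.
# In a real system each callable would be a query against that region's
# database, so every limit check costs one round trip per region.
def total_alus(counters)
  counters.sum { |_region, counter| counter.call }
end

# True if adding `adding` more ALUs keeps the account within `limit`.
def within_limit?(counters, limit, adding: 1)
  total_alus(counters) + adding <= limit
end

counters = {
  'US' => -> { 60 },
  'EU' => -> { 39 }
}

within_limit?(counters, 100)            # one more ALU still fits
within_limit?(counters, 100, adding: 2) # two more would exceed the limit
```

Note that the counts are read separately, so without some form of cross-region coordination two concurrent requests in different regions could both pass the check.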
Workers would need to be region-aware. For example, a license expiration worker would need to look at licenses across all regions, and the worker that waits on an artifact's upload would need to know which region the artifact was created in.
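A sketch of what region-aware workers could look like, using plain hashes instead of real job classes (`REGIONS`, `expired_license_keys`, and `artifact_job` are hypothetical). The expiration worker fans out over every region, while the artifact job carries its region in the payload.

```ruby
REGIONS = %w[US EU].freeze

# licenses_by_region: region code => array of { key:, expires_at: } hashes.
# Returns [region, key] pairs for every license that has expired by `now`.
def expired_license_keys(licenses_by_region, now: Time.now)
  REGIONS.flat_map do |region|
    licenses_by_region.fetch(region, [])
                      .select { |license| license[:expires_at] <= now }
                      .map { |license| [region, license[:key]] }
  end
end

# The artifact upload job records which region the artifact was created in,
# so the worker knows which database to look in when it runs.
def artifact_job(artifact_id, region)
  { 'artifact_id' => artifact_id, 'region' => region }
end
```

Keeping the region next to each result means any follow-up work (e.g. sending an expiration event) can be routed back to the right database.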
If we only implement multi-region data storage, we also introduce significant latency from US servers to EU databases.
For GDPR compliance, it would be useful to side-step lawyers and data processing agreements entirely by allowing customers to store their data in the EU.
We could do this with a primary-primary database setup. Each account would have a region, defined at account creation. And each request would accept a Keygen-Region header, either US or EU, with US being the default.
Like environments and the Keygen-Environment header, the Keygen-Region header would switch regions, i.e. databases, for the current request.
What would be super cool would be to allow data to be intermixed, i.e. an account in the US region having data in both the US and EU regions, for compliance reasons.
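A minimal sketch of the header handling described here, assuming the request headers arrive as a plain hash (`resolve_region` and `UnsupportedRegionError` are hypothetical names). How unknown values should be handled is not specified above; rejecting them is an assumption.

```ruby
SUPPORTED_REGIONS = %w[US EU].freeze
DEFAULT_REGION    = 'US'

class UnsupportedRegionError < StandardError; end

# Resolve the region, i.e. database, for the current request.
def resolve_region(headers)
  # Header names are case-insensitive, so normalize keys before lookup.
  normalized = headers.transform_keys { |key| key.to_s.downcase }
  value      = normalized['keygen-region'].to_s.strip.upcase

  # No header (or a blank one) falls back to the US default.
  return DEFAULT_REGION if value.empty?

  # Assumption: anything other than US or EU is rejected outright.
  unless SUPPORTED_REGIONS.include?(value)
    raise UnsupportedRegionError, "unsupported region: #{value}"
  end

  value
end
```

The resolved value would then be used to pick the database connection for the rest of the request, in the same way the environment is picked from Keygen-Environment.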
See: https://sentry.engineering/blog/3m-dollar-dropdown