keygen-sh / keygen-api

Keygen is a fair source software licensing and distribution API built with Ruby on Rails. For developers, by developers.
https://keygen.sh

Add multi-region support #856

Open ezekg opened 3 weeks ago

ezekg commented 3 weeks ago

For GDPR compliance, it would be useful to side-step lawyers and data processing agreements entirely by allowing customers to store their data in the EU.

We could do this with a primary-primary database setup. Each account would have a region, defined at account creation, and each request would accept a Keygen-Region header, either US or EU, with US as the default.

Like environments and the Keygen-Environment header, the Keygen-Region header would switch regions, i.e. databases, for the current request.
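A rough sketch of how the header could be resolved per request, written as plain Rack-style Ruby so it runs without Rails. The `RegionResolver` and `RegionMiddleware` names and the `keygen.region` env key are hypothetical and not taken from the Keygen codebase; rejecting unknown regions with a 400 is also an assumption.

```ruby
# Hypothetical sketch: these names are illustrative, not Keygen's actual code.
module RegionResolver
  REGIONS = %w[US EU].freeze
  DEFAULT_REGION = 'US'

  class InvalidRegionError < StandardError; end

  # Rack exposes the Keygen-Region request header as HTTP_KEYGEN_REGION.
  def self.resolve(env)
    raw = env['HTTP_KEYGEN_REGION'].to_s.strip.upcase
    return DEFAULT_REGION if raw.empty? # US is the default, per the proposal
    raise InvalidRegionError, "unknown region: #{raw}" unless REGIONS.include?(raw)

    raw
  end
end

class RegionMiddleware
  def initialize(app)
    @app = app
  end

  # Stores the resolved region in the env so downstream code can pick
  # the matching database for the rest of the request.
  def call(env)
    env['keygen.region'] = RegionResolver.resolve(env)
    @app.call(env)
  rescue RegionResolver::InvalidRegionError => e
    [400, { 'content-type' => 'text/plain' }, [e.message]]
  end
end
```

In a Rails app, the resolved value could presumably then be handed to the multi-database support (e.g. wrapping the request in `ActiveRecord::Base.connected_to(shard: ...)`), but that wiring is left out here.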

It would be super cool to allow data to be intermixed, i.e. an account in the US region having data in both the US and EU regions, for compliance reasons.

See: https://sentry.engineering/blog/3m-dollar-dropdown

ezekg commented 3 weeks ago

If we were to always store account-level data in the US, it would be easier to implement multi-region support for a single account (i.e. intermixed data). Otherwise, we potentially need to query for the account in all regions in order to set the current tenant. And we would also need to assert uniqueness across regions, which would be pretty hard. Everything else, though, aside from the account, can be stored in the account's region by default.
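To illustrate the lookup order, here is a minimal sketch using in-memory hashes as stand-ins for the per-region databases. `RegionalStore`, `TenantRouter`, and the `:region` attribute on the account are all hypothetical names; the only thing taken from the comment above is that the account is always read from the US and everything else from the account's (or the requested) region.

```ruby
# Hypothetical in-memory stand-in for one region's database.
class RegionalStore
  def initialize
    @tables = Hash.new { |hash, key| hash[key] = {} }
  end

  def insert(table, id, attrs)
    @tables[table][id] = attrs
  end

  def find(table, id)
    @tables[table][id]
  end
end

class TenantRouter
  HOME_REGION = 'US' # assumption: account-level data always lives here

  def initialize(stores)
    @stores = stores # region code => RegionalStore
  end

  # A single lookup against the home region, so setting the current
  # tenant never requires querying every region.
  def find_account(account_id)
    @stores.fetch(HOME_REGION).find(:accounts, account_id)
  end

  # Non-account data is read from the requested region, falling back
  # to the account's own region when none is given.
  def find_resource(account_id, table, id, region: nil)
    account = find_account(account_id)
    return nil if account.nil?

    record = @stores.fetch(region || account[:region]).find(table, id)
    record if record && record[:account_id] == account_id
  end
end
```

Because accounts only exist in one place, uniqueness of account-level attributes also only needs to be asserted in that one database.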

ezekg commented 3 weeks ago

This would also require servers and workers in each given region, to ensure compliance for data processing, but that can be handled via the Keygen-Region header (again, always US by default) and a location-aware load balancer or router (look at Fastly's offering).

ezekg commented 3 weeks ago

To simplify, we could implement multi-region data storage/residency first (i.e. database only), and full multi-region support later (servers, etc.)

ezekg commented 3 weeks ago

Need to note that any joins on the account, e.g. the pruning jobs, would need to be rewritten since accounts would only exist in the US region.
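One way such a rewrite could look: replace the join with two steps, first selecting account IDs in the US region, then pruning by those IDs in each region. This is only a sketch over plain arrays; the `:prune` flag and both helper names are made up for illustration and do not reflect what the actual pruning jobs join on.

```ruby
require 'set'

# Step 1 (home region): pick the accounts that match an account-level
# condition. The condition itself is a hypothetical placeholder.
def prunable_account_ids(accounts, &condition)
  accounts.select(&condition).map { |account| account[:id] }
end

# Step 2 (each region): delete old rows belonging to those accounts,
# without ever joining against the accounts table.
# Returns the number of rows removed.
def prune_region!(rows, account_ids, cutoff:)
  ids = account_ids.to_set
  before = rows.size
  rows.reject! { |row| ids.include?(row[:account_id]) && row[:created_at] < cutoff }
  before - rows.size
end
```

The trade-off is an extra round trip to the US database per job run, in exchange for queries that stay within a single region's database.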

ezekg commented 3 weeks ago

If we were to intermix, regions would need to be implemented similarly to environments, where any record's belongs_to associations must also be in the same region. Otherwise we get problems where e.g. per-license machine counts are inaccurate.

Edit: actually, this isn't a concern, since the parent wouldn't exist in the other region's database to begin with.

ezekg commented 3 weeks ago

What if we modeled this as Silos, where each "regionable" (or "siloable") model belongs to a Silo, and a Silo belongs to a Region (either itself modeled or just an opaque string like backend)?
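A bare-bones sketch of that shape in plain Ruby, just to make the relationships concrete. `Siloable`, `region_code`, and `group_by_region` are hypothetical names, and the example supports both variants mentioned above: a modeled `Region` or an opaque string.

```ruby
# Hypothetical sketch of the Silo idea; not an actual Keygen schema.
Region = Struct.new(:code)
Silo = Struct.new(:id, :region) # region is a Region or an opaque string

module Siloable
  attr_accessor :silo

  # Works whether the silo's region is modeled or just a string.
  def region_code
    region = silo.region
    region.respond_to?(:code) ? region.code : region.to_s
  end
end

class License
  include Siloable
  attr_reader :id

  def initialize(id, silo)
    @id = id
    self.silo = silo
  end
end

# Groups siloable records by the region that physically holds them.
def group_by_region(records)
  records.group_by(&:region_code)
end
```

The extra level of indirection would let several silos share a region, which a plain region column on each record would not express.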

ezekg commented 3 weeks ago

The per-request region switching would make enforcing account-level limits challenging, since resources could span multiple regions, i.e. databases. For example, an account's hard limit on ALUs for the Dev 0 tier would require counting across every region. But we could always make multi-region support available only on Std or Ent tiers to work around that.
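To show why this gets awkward, here is a sketch of a limit check that has to sum per-region counts before allowing a new resource. `LimitEnforcer` and the per-region counter callables are hypothetical; in practice each counter would be a query against that region's database.

```ruby
# Hypothetical sketch of a cross-region limit check.
class LimitEnforcer
  class LimitExceededError < StandardError; end

  # counters: region code => callable returning the account's current
  # count in that region (a stand-in for one query per database).
  def initialize(counters)
    @counters = counters
  end

  # One query per region, summed in application code.
  def total_for(account_id)
    @counters.sum { |_region, counter| counter.call(account_id) }
  end

  # Raises if adding one more resource would exceed the limit;
  # otherwise returns the current total.
  def assert_within!(account_id, limit:)
    total = total_for(account_id)
    raise LimitExceededError, "limit of #{limit} reached (#{total})" if total >= limit

    total
  end
end
```

Note that this check is not atomic: two concurrent creates in different regions could both pass it, which is part of what makes a hard limit difficult to guarantee across databases.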

ezekg commented 3 weeks ago

Workers would need to be region-aware. For example, a license expiration worker would need to look at licenses across all regions, and the worker that waits on an artifact's upload would need to know which region the artifact was created in.
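The two cases could be sketched like this, again with in-memory hashes standing in for the regional databases. The class names, the `:expiry` and `:status` attributes, and passing the region as a job argument are assumptions made for illustration.

```ruby
# Hypothetical sketch: fans out over every region.
class LicenseExpirationWorker
  def initialize(stores)
    @stores = stores # region code => array of license hashes
  end

  # Marks overdue licenses as expired in each region and returns
  # [region, license id] pairs for everything it touched.
  def perform(now: Time.now)
    @stores.flat_map do |region, licenses|
      due = licenses.select { |l| l[:expiry] && l[:expiry] <= now && !l[:expired] }
      due.each { |l| l[:expired] = true }
      due.map { |l| [region, l[:id]] }
    end
  end
end

# Hypothetical sketch: pinned to a single region.
class ArtifactUploadWorker
  def initialize(stores)
    @stores = stores # region code => { artifact id => attrs }
  end

  # The region travels with the job arguments, so the worker only
  # talks to the database where the artifact was created.
  def perform(artifact_id, region)
    artifact = @stores.fetch(region)[artifact_id]
    return false if artifact.nil?

    artifact[:status] = 'UPLOADED'
    true
  end
end
```

So some workers would iterate over all regions, while others would carry the region in their arguments, and each job would need to be sorted into one of those two groups.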

ezekg commented 3 weeks ago

If we only implement multi-region data storage, we also introduce significant latency from US servers to EU databases.