relaycorp / cloud-gateway

Infrastructure as Code and configuration for all Awala-Internet Gateways run by Relaycorp
MIT License

Prove that our cloud infrastructure matches the code in our open source repositories #8

Open gnarea opened 4 years ago

gnarea commented 4 years ago

Executive summary

We must prove that our cloud infrastructure matches the Docker images, Kubernetes resources and Terraform resources in our open source repositories. Simply asking people to trust us is not an option: they must be certain that we're not spying on them, selling their data, running a mass surveillance programme for the Five Eyes, or censoring people.

Another (equally important) reason to do this is to protect the Relaycorp SRE team from powerful adversaries who might secretly try to force us (collectively or individually) to give away certain metadata or censor certain users/services. If every change or external access to the infrastructure is independently verified, such coercion could not stay secret, so this threat should largely go away. (Unfortunately, an attacker unaware of this measure may still target us, but we can mitigate this by prominently advertising the fact that our cloud infrastructure is independently verified.)

We basically have to prove two things: that our cloud infrastructure is exactly what people can find on GitHub, and that we don't have any backdoors.

Why? Relaynet is end-to-end encrypted and doesn't leak PII

Indeed, those two properties make Relaynet apps immune to a wide range of privacy threats you'd tend to find in Internet apps. However, we could theoretically still infer the following:

Additionally, we need to log the IP addresses of end users and couriers so that we can ensure our systems are being used fairly and triage production issues.

Finally, it's likely we'll eventually have to block certain centralised services to comply with UK/US legislation, in which case we'd have to prove that we're only blocking the services listed publicly. (This doesn't apply to decentralised services, which we could never block.)

How? Cloud provenance is not a thing yet

Option A: Google Trillian Logs

In an ideal world, our cloud providers (Terraform Cloud, Kubernetes, GCP, Mongo Atlas and Cloudflare) would use a tool like Google Trillian to log provisioning, deprovisioning and access events. This would allow us to broadcast logs so that anyone anywhere can verify the integrity of our cloud infrastructure.
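To illustrate why a publicly broadcast log would let anyone check integrity, here is a minimal sketch of a tamper-evident log. It is not Trillian's API: Trillian is built on Merkle trees, whereas this uses a plain hash chain, which is a simpler stand-in for the same idea. The event fields and the function names (`append`, `verify`) are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash an event together with the hash of the entry before it."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list, event: dict) -> None:
    """Append an event, chaining it to the current head of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "hash": entry_hash(prev_hash, event)})


def verify(log: list) -> bool:
    """Recompute every hash in order.

    Editing, reordering or dropping a non-final entry breaks the chain.
    Truncating the tail is only detectable if verifiers remember the
    latest head hash they have seen.
    """
    prev_hash = GENESIS
    for entry in log:
        if entry["hash"] != entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    # Hypothetical provisioning and access events
    append(log, {"type": "provision", "resource": "gateway-cluster"})
    append(log, {"type": "access", "resource": "db", "actor": "sre-1"})
    print(verify(log))  # chain is intact

    log[0]["event"]["resource"] = "something-else"  # tamper with history
    print(verify(log))  # chain no longer verifies
```

In this sketch, anyone holding a copy of the log can run `verify` without trusting whoever published it, which is the property we'd want from the providers' logs.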

We'd essentially be moving the provenance issue up in the chain, and it'd be up to cloud providers to honour their contractual obligations with Relaycorp and comply with applicable legislation. They'd have a lot to lose if they don't.

But this option isn't really an option in the foreseeable future.

Option B: Ask a reputable, independent third party to audit our infrastructure in real time

They'd basically get read-only access to the configuration of our cloud resources (but no access to the data inside), as well as their (de)provisioning and access logs. With this level of access, they could operate a system 24/7 to monitor our cloud resources and make sure they match the public Docker images, Kubernetes resources and Terraform resources.

I don't think a software tool like this exists yet, so we'll have to build it and make it open source. This tool has to be trivial to deploy, run and upgrade.

This tool should effectively make sure that provisioning and deprovisioning events match changes to cloud resources on GitHub. Additionally, the tool could consume access logs so that this independent party is alerted to any direct access to the DB, for example. If we need to access the DB, we should justify that access to them (e.g., investigating a security vulnerability).
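The core check could look something like the sketch below, which compares what the public repositories declare against what is observed in the cloud. This is only an illustration under assumptions: the resource names, the idea of reducing each resource to a single fingerprint string (e.g. a Docker image digest or a hash of a rendered Kubernetes/Terraform resource), and the `Drift` and `find_drift` names are all hypothetical. Fetching the two inputs from GitHub and the cloud providers is out of scope here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Drift:
    resource: str
    kind: str  # "unexpected", "missing" or "mismatch"
    expected: Optional[str]
    observed: Optional[str]


def find_drift(declared: dict[str, str], observed: dict[str, str]) -> list[Drift]:
    """Compare declared fingerprints (from the public repos) with observed ones.

    Both arguments map a resource name to a fingerprint string.
    """
    drifts = []
    for name in sorted(declared.keys() | observed.keys()):
        expected = declared.get(name)
        actual = observed.get(name)
        if expected is None:
            # Running in the cloud but absent from the public code
            drifts.append(Drift(name, "unexpected", None, actual))
        elif actual is None:
            # Declared publicly but not actually deployed
            drifts.append(Drift(name, "missing", expected, None))
        elif expected != actual:
            # Deployed, but not the version the public code describes
            drifts.append(Drift(name, "mismatch", expected, actual))
    return drifts


if __name__ == "__main__":
    # Hypothetical fingerprints, for illustration only
    declared = {"gateway": "sha256:aaa", "db": "v1"}
    observed = {"gateway": "sha256:bbb", "debug-pod": "sha256:ccc"}
    for drift in find_drift(declared, observed):
        print(drift)
```

An empty result would mean the observed infrastructure matches the public code. Any `Drift` entry is something the independent party should be alerted to and that we'd have to justify.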

This would make offsite backups tricky, because we'd need a secret key to decrypt the backups if we need to restore them. One way to address this is by splitting the key, and having their part of the key available on demand in the tool. But this would introduce two additional challenges:

Option C: Deploy an independent tool that tracks our infrastructure in real time

We'd leverage the tool described in Option B, but we'd deploy it ourselves to a separate GCP project whose audit logs are publicly available.

Publishing audit logs is a bit risky, since they might (occasionally) contain sensitive information or PII about Relaycorp staff, which is why we're not making audit logs publicly available in the GCP projects hosting the services.

This option has no dependencies on third parties, so it seems like the most realistic approach to begin with.

Provenance is necessary but not sufficient to gain trust

There are many more things we have to do to gain people's trust, including non-technical measures such as transparency when dealing with law enforcement (I think Signal is an example to follow in this regard).

gnarea commented 3 years ago

Check this Twitter thread and my reply.

TL;DR: Get some ideas from https://transparency.dev/ and https://cloud.google.com/confidential-computing to get closer to the endgame, though not yet the endgame I want.