kitsonk / kv-toolbox

Utilities for working with Deno KV 🦕🗝️
https://kview.deno.dev/kv-toolbox
MIT License

Encryption for blobs #13

Closed inverted-capital closed 5 months ago

inverted-capital commented 5 months ago

👋

I would like to give my users the added guarantee that their sensitive data is encrypted at rest. I am creating this issue to check if this is within the goals of kv-toolbox, because doing it correctly in a single place saves a lot of re-implementation, and helps consumers avoid doing crypto wrong, which is very easy to do.

Anyone wanting to encrypt is going to be storing blobs, regardless of what they started with, so naturally a blob library is sought, and they'll naturally arrive at kv-toolbox.

The interface I would expect is:

export async function set(
  kv: Deno.Kv,
  key: Deno.KvKey,
  blob: ArrayBufferLike | ReadableStream<Uint8Array> | Blob | File,
  options?: { expireIn?: number, encryptionKey?: string },
): Promise<Deno.KvCommitResult> {
  // 
}

Followed by some helper functions to generate a key in an acceptable form, probably jwk or similar.
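As a rough illustration of what such helpers might look like, here is a minimal sketch using the Web Crypto API. The function names `generateEncryptionKey` and `importEncryptionKey` are hypothetical and not part of kv-toolbox, and serializing the JWK as a JSON string is an assumption made so the key fits the `encryptionKey?: string` option proposed above.

```typescript
// Hypothetical helpers, not part of kv-toolbox.

// Generate an AES-GCM key and return it as a JSON-serialized JWK string.
async function generateEncryptionKey(length: 128 | 256 = 256): Promise<string> {
  const key = await crypto.subtle.generateKey(
    { name: "AES-GCM", length },
    true, // extractable, so that it can be exported as a JWK
    ["encrypt", "decrypt"],
  );
  const jwk = await crypto.subtle.exportKey("jwk", key);
  return JSON.stringify(jwk);
}

// Turn the serialized JWK back into a CryptoKey that can encrypt and decrypt.
async function importEncryptionKey(serialized: string): Promise<CryptoKey> {
  const jwk = JSON.parse(serialized);
  return crypto.subtle.importKey(
    "jwk",
    jwk,
    { name: "AES-GCM" },
    false, // the imported key does not need to be exportable again
    ["encrypt", "decrypt"],
  );
}
```

The string returned by `generateEncryptionKey()` is the secret itself, so it would need to be stored wherever the application keeps its other secrets.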

Given that crypto is built in to deno, performance should be quite good.

kitsonk commented 5 months ago

Interesting suggestion... Let me think about a good way to handle this, as it does feel like a good feature to incorporate into kv-toolbox as a whole, not only blob support.

inverted-capital commented 5 months ago

Awesome !

In the meantime I will figure out a good plan for handling the encryption keys themselves in Deno Deploy. Storing the keys in an environment variable would defeat the purpose of the encryption if Deno Deploy's data at rest were compromised, since presumably the env vars are stored near the KV data.

kitsonk commented 5 months ago

Around key management, I was looking at Cloudflare Workers to see what they do just to see if my assumptions are correct, and based on that, I think what Deploy currently provides is sufficiently secure.

Cloudflare Workers has two types of environment variables, "regular" ones and "secret" ones. The values of regular ones can be viewed as part of the configuration for the project, while secrets are one-way: once set, their values cannot be viewed again (similar to the way GitHub and other services manage secrets). Deploy only has the equivalent of "secrets".

This means the only way to exfiltrate an encryption key is to compromise an account that can make a deployment, overwrite the deployment with code that would log the key to the console, and then retrieve the key that way. It also means that even a compromised API key that could be used to connect to the KV store remotely would not be able to read the data, and the manual backups built into Deploy would not be decryptable either.

I think any other key management solution (like finding a way to integrate to a key store) would ultimately have a similar attack surface, that if a threat actor had access to overwrite a deployment, they would be able to exfiltrate an encryption key, but that generally remote access to the data store or a version of backed up data would be difficult to compromise without the original encryption keys.

kitsonk commented 5 months ago

Encryption for blobs is going to be practical, but encryption for non-byte based objects is going to be impractical. Deno Deploy does not expose the object serialization APIs, which means there is no exposed API to turn a cloneable object into bytes. Something else has to be used to serialize the value and convert it to bytes before it can be encrypted, and everything I have found so far that would work is VERY slow.
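To make the serialization gap concrete, here is a minimal sketch of the most obvious fallback: JSON plus `TextEncoder`. This is an assumption for illustration only, not what kv-toolbox does and not necessarily one of the approaches evaluated above. The helper names `toBytes` and `fromBytes` are hypothetical. It also shows why JSON is not a drop-in replacement: types that Deno KV can store natively, such as `Date` and `Set`, do not survive the round trip.

```typescript
// Hypothetical helpers, for illustration only.

// Serialize a value to bytes via JSON so that it could be encrypted.
function toBytes(value: unknown): Uint8Array {
  return new TextEncoder().encode(JSON.stringify(value));
}

// Reverse of toBytes: decode the bytes and parse the JSON text.
function fromBytes(bytes: Uint8Array): unknown {
  return JSON.parse(new TextDecoder().decode(bytes));
}

const original = {
  name: "example",
  createdAt: new Date(0),
  tags: new Set(["a", "b"]),
};

const restored = fromBytes(toBytes(original)) as Record<string, unknown>;

// The Date comes back as an ISO string and the Set as an empty object,
// so the round trip is lossy for non-JSON types.
console.log(typeof restored.createdAt); // "string"
console.log(JSON.stringify(restored.tags)); // "{}"
```

Values containing a `bigint` fail outright, because `JSON.stringify` throws on them.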

I think I will create the blob stuff first and then see where to go from there.

inverted-capital commented 5 months ago

Thank you for the key management insights.

"If you can overwrite a deployment, then you can access the keys" is a good fatal attack scenario that has no defense, and so gives good grounding to any security efforts.

Deno Deploy does not publish how the env secrets are stored (and understandably too, since they are still in beta) so without knowing that, we can't pass on those guarantees to our customers.

The attack I was thinking of that a secrets provider avoids is that someone accesses the secrets store for Deno Deploy environment variables and also takes a copy of the Deno KV database, enabling them to read the sensitive data offline.

A secrets store could be locked down to only answer requests from Deno Deploy IP addresses, so denying offline access, and if we were notified of the Deno Deploy breach, we could cycle the secret provider access credentials.

This all sounds like a lot of work for possibly no gains if Deno Deploy has good secrets management in place (which I'm sure they do !) I just wish we knew what it was so we could pass those assurances on to our customers.

kitsonk commented 5 months ago

@lucacasonato hate to drag you into a conversation, but it seemed better than opening some other issue where it might not be easily seen. Is there any documentation or public statement about the security surrounding env variables? For example, are the values encrypted at rest, and is there no API that would allow external retrieval of the values?

> A secrets store could be locked down to only answer requests from Deno Deploy IP addresses, so denying offline access, and if we were notified of the Deno Deploy breach, we could cycle the secret provider access credentials.

Fundamentally, if Deploy stores the values of env variables encrypted at rest and there is no way to determine their value outside of the isolate being supplied them, that is as secure as a 3rd party keystore, without the complexity. While I haven't looked into it, the egress from Deploy is very unlikely to be a single IP, so you would have to specify a range of IPs. You would then be hard pressed to verify that those requests were coming from your legitimate deployment versus one that you did not control without signing your requests, for which you would need to store a secret somewhere, and then you are back at the start.
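The circularity of request signing can be sketched with an HMAC over the request body, using the Web Crypto API. This is an illustrative assumption about how such signing might be done, not a description of any particular secrets provider, and the function names are hypothetical. The point is that the signing secret has to live somewhere in the deployment, which is the same problem as storing the encryption key.

```typescript
// Hypothetical helpers, for illustration only.

// Import a shared secret as an HMAC-SHA-256 key.
async function importHmacKey(secret: string): Promise<CryptoKey> {
  return crypto.subtle.importKey(
    "raw",
    new TextEncoder().encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign", "verify"],
  );
}

// The deployment signs each request body with the shared secret.
async function signBody(key: CryptoKey, body: string): Promise<ArrayBuffer> {
  return crypto.subtle.sign("HMAC", key, new TextEncoder().encode(body));
}

// The secrets store checks the signature before answering.
async function verifyBody(
  key: CryptoKey,
  body: string,
  signature: BufferSource,
): Promise<boolean> {
  return crypto.subtle.verify(
    "HMAC",
    key,
    signature,
    new TextEncoder().encode(body),
  );
}
```

Anyone who can overwrite the deployment can read the shared secret passed to `importHmacKey` and produce valid signatures, so the attack surface is unchanged.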

All that being said, Deploy could improve their management of access tokens. Currently the ones you create never expire and always have full access. They certainly should be more fine grained to limit access to deployments and KV, as well as be forced to expire:

[Image: Account_Settings]

Also, potentially allowing a setting in KV to allow remote connections outside deployments or not.

inverted-capital commented 5 months ago

Great point about the IP addresses being indistinguishable between deployments.

Deno Deploy projects should allow a one way only lockdown mode that makes the project go ballistic (as in, no more changes from the developer except for source code pushes).

This lockdown mode is so close to other features already supported by deno services, like the transparency log, like publishing to jsr and to deno.land - the act of publishing there is taken out of the hands of the developer after the push to source code - we would LOVE (like fall over crying type of love) to have that ability in Deno Deploy where we could do a one way operation where it goes into this provenance attested mode.

This would be like SLSA compliance but for our actual web app, not just the packages we consume.

This is the final step in delivering a web app into a user's browser with end-to-end provenance guarantees.

The leakiness in Deno Deploy is the last and seemingly tiny step in realizing that subtle but profound provenance state - being able to easily and rapidly trace the provenance of a web page back to source code. I could then pass the security and compliance guarantees of Deno Deploy DIRECTLY to my customers. Directly. That is a very compelling reason to drop whatever else you're doing and build on Deno Deploy.

The page for the deployment can then become much like the jsr.io page for a package, where it provides the audit trail, and can give it a score for all its consumed packages in terms of their provenance scores, and any known vulnerabilities, plus provable association with respective source code. The source code need not be public for this page to exist and be useful, as a private repo can be pointed to, and then access shared to interested parties.

inverted-capital commented 5 months ago

To sketch this out further, the ballistic hosting score should be comprised of:

The ideal publication would be an app with no env vars, no outbound network, kv store only readable to the app, and only fully provenance attested jsr.io packages used as dependencies.

kitsonk commented 5 months ago

I've got Uint8Arrays being encrypted and decrypted using Web Crypto (using AES-GCM) and I am getting the following for a ~60k value (which is small enough to be stored just as a value without needing the blob utils). You can get a bit of a performance boost using an AES-128 encryption key versus an AES-256 one.

cpu: Apple M1 Pro
runtime: deno 1.43.1 (aarch64-apple-darwin)

file:///Users/kitsonk/github/kv-toolbox/crypto.bench.ts
benchmark           time (avg)        iter/s             (min … max)       p75       p99      p995
-------------------------------------------------------------------- -----------------------------
u8                 119.76 µs/iter       8,350.0    (85.83 µs … 3.11 ms) 125.62 µs 216.17 µs 253.29 µs
standard blob      203.39 µs/iter       4,916.6    (157.12 µs … 2.9 ms) 204.71 µs 387.75 µs 446 µs
encrypted blob     989.65 µs/iter       1,010.5  (683.79 µs … 12.13 ms) 811.38 µs 5.64 ms 8.18 ms
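
For readers unfamiliar with the approach being benchmarked, here is a minimal sketch of encrypting and decrypting bytes with AES-GCM via Web Crypto. It is not the kv-toolbox implementation. The function names are hypothetical, and the use of a fresh random 12-byte IV per call, returned alongside the ciphertext, is an assumption based on common practice.

```typescript
// Hypothetical helpers, not the kv-toolbox implementation.

// Generate an AES-GCM key; 128-bit keys trade some margin for speed.
async function generateAesKey(length: 128 | 256): Promise<CryptoKey> {
  return crypto.subtle.generateKey(
    { name: "AES-GCM", length },
    false,
    ["encrypt", "decrypt"],
  );
}

// Encrypt bytes with a fresh random IV. The IV is not secret, but it must
// never be reused with the same key, and it is needed again to decrypt.
async function encryptBytes(key: CryptoKey, plaintext: BufferSource) {
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    plaintext,
  );
  return { iv, ciphertext };
}

// Decrypt and authenticate. The promise rejects if the key or IV is wrong
// or if the ciphertext has been modified.
async function decryptBytes(
  key: CryptoKey,
  iv: BufferSource,
  ciphertext: BufferSource,
): Promise<Uint8Array> {
  const plaintext = await crypto.subtle.decrypt(
    { name: "AES-GCM", iv },
    key,
    ciphertext,
  );
  return new Uint8Array(plaintext);
}
```

Because AES-GCM is authenticated, the ciphertext is 16 bytes longer than the plaintext (the authentication tag), plus whatever space is used to store the IV.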

inverted-capital commented 5 months ago

that looks plenty fast enough for my purposes - the RTT to the database in Deno Deploy dominates by far