hasura / graphql-engine

Blazing fast, instant realtime GraphQL APIs on your DB with fine grained access control, also trigger webhooks on database events.
https://hasura.io
Apache License 2.0

Feature request: Client side data encryption by Hasura #1204

Open marco-fp opened 5 years ago

marco-fp commented 5 years ago

Feature Request

What feature are you missing?

Support for encrypting certain fields at rest in the DB, using a specified secret key (with symmetric or possibly asymmetric encryption).

What could this feature look like in detail? Tradeoffs?

  1. Specify an environment variable with either the key or the route to the key in the filesystem for the encryption keys to be used.
  2. Specify from the interface (table creation/editing) if these fields are to be encrypted at rest and possibly choose which key to use.
  3. When inserting those fields, encrypt them on the fly before storing them in the DB. When reading those fields, decrypt them using the proper encryption key.

Tradeoff: Insertion/read speed of those fields would be affected as a result, due to the obvious overhead.

Motivation

This functionality would open the use of Hasura to new kinds of applications where security is an important feature.

0x777 commented 5 years ago

Let's see if there is more interest for this feature from the community.

theshire-io commented 5 years ago

Imho, the key here is that if Hasura does this automagically it should be done in such a way that the functionality can easily be replicated when accessing PG without Hasura.

Suppose you've encrypted (at rest) some columns that contain important data. Most probably, you'll want to use the important data within a transaction. And Hasura doesn't handle transactions yet. So you'll need to do it by accessing the database directly. Then what?

I think the way to go for this feature is to base the functionality on pgcrypto (the most commonly recommended way of doing at-rest encryption in PG) and to clearly document, with examples, how one can do the decryption/encryption manually and still be 100% compatible with what Hasura expects. If this is not done with care, it can lead to data loss.
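
As a rough illustration of the manual path described above, the sketch below writes and reads one encrypted column using pgcrypto's `pgp_sym_encrypt` and `pgp_sym_decrypt` through a psycopg-style cursor (`%s` placeholders). The table `patients`, the `bytea` column `ssn_enc`, and the helper names are all hypothetical, and nothing here reflects a storage format Hasura has committed to.

```python
# Sketch only: the table, column, and helper names are hypothetical.
# Assumes the pgcrypto extension is installed (CREATE EXTENSION pgcrypto)
# and that ssn_enc is a bytea column.

INSERT_SQL = (
    "INSERT INTO patients (name, ssn_enc) "
    "VALUES (%s, pgp_sym_encrypt(%s, %s))"
)
SELECT_SQL = (
    "SELECT name, pgp_sym_decrypt(ssn_enc, %s) AS ssn "
    "FROM patients WHERE name = %s"
)


def insert_patient(cursor, name, ssn, passphrase):
    """Encrypt ssn inside Postgres while inserting the row."""
    cursor.execute(INSERT_SQL, (name, ssn, passphrase))


def fetch_patient(cursor, name, passphrase):
    """Decrypt ssn inside Postgres while reading the row."""
    cursor.execute(SELECT_SQL, (passphrase, name))
    return cursor.fetchone()
```

Because the same two SQL functions are used on both paths, any client that talks to Postgres directly can read and write the column in the same way. Note that the passphrase travels to the database server with every statement, so this approach protects dumps and backups rather than a compromised server.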

lidorcg commented 4 years ago

Hello everyone, has anyone done anything like this? Is there a recommended solution for the current version of Hasura?

lidorcg commented 4 years ago

Another important point: I'm given to understand that according to GDPR, personal information needs to be encrypted within databases. If true, this feature is a must for most production apps, unless there's a way to do this directly in the DB or some patch that can be made with Hasura's current tool set.

ghost commented 4 years ago

Yeah this is an important feature.

str4 commented 4 years ago

"Another important point: I'm given to understand that according to GDPR, personal information needs to be encrypted within databases. If true, this feature is a must for most production apps, unless there's a way to do this directly in the DB or some patch that can be made with Hasura's current tool set."

From what I've gathered, encryption is not mandatory, but is one of the practices that a service controller may use in order to protect sensitive data.

From GDPR Recital 83:

"In order to maintain security and to prevent processing in infringement of this Regulation, the controller or processor should evaluate the risks inherent in the processing and implement measures to mitigate those risks, such as encryption."

https://gdpr-info.eu/recitals/no-83/

But nevertheless, I see this as a critical feature, and need / would very much like to see this in Hasura.

arpitjacob commented 4 years ago

I've been looking at various other solutions; this feature would really make a big difference.

greenisagoodcolor commented 4 years ago

I too would love it if there were some intermediate ability for record-level encryption of data. We are a heavily regulated business and may struggle to continue using Hasura in the future if this isn't possible.

How can I assist the development?

tirumaraiselvan commented 4 years ago

@greenisagoodcolor Would be great if we can hop on a call to chat about your requirements/use-cases. We are currently scoping this feature and your input will be very useful.

If you could drop me a note at tiru@hasura.io then we can schedule something from there.

federicobadini commented 3 years ago

Has there been any progress on this feature? I think it is a critical feature for several applications

swangy commented 3 years ago

another vote for encryption at rest!

24601 commented 3 years ago

After reading through the comments, I'm having a difficult time understanding (and this is a genuine "I am failing to understand" and not a passive-aggressive "you're wrong"!) what benefit or unique use case encryption at the Hasura level provides*.

I see a variety of discussions, but I don't actually understand how or why it's appropriate at the Hasura level, which, in the stack, is really a relatively high-level action-only/no-storage intermediary between the client application and the data at-rest (assuming Hasura does not cache-to-disk or output/log data, which I believe it does not if appropriately configured).

  1. GDPR needs. I am not a lawyer, and a lawyer versed and practicing in the area should probably be consulted, but GDPR doesn't necessarily mandate that an application like Hasura do it. If anything, it suggests this be done for the "at rest" portion at the disk or Postgres level. Am I missing a portion of GDPR that isn't addressed in the next few points?

  2. Encryption-at-rest. What is the business case for doing this in Hasura versus the disk or PG shard/server level? It introduces a bevy of key management, error, and other distracting issues that sound absolutely hellacious to manage from both a development and an operations perspective, when both physical and logical disks offer hardware and software solutions for data-at-rest.

  3. Record-level encryption. Now here is where we get into the useful Hasura-level-in-the-stack use cases, I think. Some sort of discriminator (or even a GraphQL mutation-level flag?) for which symmetric key to use COULD perhaps be useful, I imagine. But why wouldn't you just use Postgres-level sharding to direct those records, using existing mechanisms, to a shard with disk-level encryption handled by a truly secure hardware encryption solution, without keys living somewhere they actually aren't secure?

And perhaps #3 is really the issue I have here: the security tradeoffs you make when you have to provide a piece of software like Hasura with symmetric encryption materials IMO totally defeat any benefit. You're storing these keys in memory (unless the proposal is to code native support for HSMs into Hasura?), or am I missing some mechanism by which the symmetric encryption is done securely?

Honestly, IMO, the only performant, secure method for storage encryption is hardware TPM/HSM/Secure Enclave modules. General-purpose CPUs/MCUs/SoCs are wildly insecure, and the second the key hits RAM you may as well consider it compromised if you're at the level where you're worrying about encryption-at-rest. If that sounds odd because you're only really worried about backups walking off, disks being lifted out of machines, etc., well, why not just use OS- or disk-level encryption at the block storage/backup level? Any vendor of any credibility (on both the compute side and the storage side) has a credible in-device offering, as does every major cloud vendor (whose offering is likely backed by true HSMs).

What compliance regimes are people seeing/using that mandate encryption so high in the stack? What other items do these regimes specify elsewhere in their (I am sure quite lengthy?) specifications that discuss key management, etc.? I am certain there are MANY use cases and MANY compliance regimes out there that I am not familiar with, so I ask this as a GENUINE question. But every single compliance regime and audit framework I've encountered, when it really gets serious, recognizes that the weaker link here is key management, and that hardware encryption with key storage in HSMs is the only tenable method. At that point, why are we doing the encryption in Hasura? When we do the encryption in Hasura, don't we lose ANY chance of doing any kind of query, indexing, etc. in Postgres? Why not delegate these functions to TPMs, HSMs, etc. at the logical or physical storage level, or even at the Postgres level?

*Field-level encryption is really the only thing I can think of that's a use case that justifies the burden of encryption so high in the stack, and I could certainly see wanting to expose some of Postgres's functions and methods for this into Hasura's GraphQL schema, that makes sense if that is truly the requirement, although at that point, I'd really wonder, why not just encrypt the whole record at the shard/table level?

The other * item is, of course, records with different keys (so record-level encryption with differing keys, or customer provided keys where even sharding is not practical to have a shard per key if there are thousands of different keys possible that are highly workload/data set variable).

To address the OP's individual use cases, I have a few thoughts:

  1. "Specify an environment variable with either the key or the route to the key in the filesystem for the encryption keys to be used."

I cannot imagine storing a key in a filesystem. This defeats nearly any acceptable security practice. If the key is living in RAM or storage, it's compromised.

  2. "Specify from the interface (table creation/editing) if these fields are to be encrypted at rest and possibly choose which key to use."

I think PG-14 TDE is going to offer this functionality, which could allow CREATE TABLE etc. to be done via the SQL function in the Console, with the created table then tracked. That's still a formative thing, though.

  3. "When inserting those fields, encrypt them on the fly before storing them in the DB. When reading those fields, decrypt them using the proper encryption key."

That is possible, but you lose all ability to index them, use LIKE, or do anything but select them. Why not use PG- or disk-level encryption and not give these up?

sapientpants commented 3 years ago

From my experience, I have seen GDPR implementations using a master key and a per-user key that are combined to derive the actual per-user encryption key. The reasoning is that if someone manages to get a dump of your database, you want the user data to be encrypted. You also want to be able to easily support users requesting that their data be deleted (i.e., simply delete the user's key; you can then no longer derive the encryption key, and their data is inaccessible). Because the information needed to decrypt is spread across an environment variable, the database, and the code, it is more difficult for someone to get access to the encrypted data. See https://github.com/attr-encrypted/attr_encrypted for a library that supports this in a Rails context.

I wouldn't refer to this as encryption at rest though as I tend to think of that as disk level encryption meant to protect against drives being stolen or improperly disposed of. To me this isn't the main goal for adding encryption to Hasura, rather it is the scenario in the previous paragraph.
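
The master-key-plus-per-user-key scheme described above could be sketched roughly as follows. Using HMAC-SHA256 to combine the two keys is an assumption made for this illustration, not necessarily what attr_encrypted or any particular GDPR implementation does, and the in-memory `user_keys` dictionary stands in for a database table.

```python
import hashlib
import hmac
import secrets

# Hypothetical stand-ins: in practice the master key would come from an
# environment variable or a KMS, and user keys from a database table.
MASTER_KEY = secrets.token_bytes(32)
user_keys = {}  # user_id -> random per-user key material


def create_user_key(user_id):
    """Generate and store fresh key material for one user."""
    user_keys[user_id] = secrets.token_bytes(32)


def derive_encryption_key(user_id):
    """Combine the master key with the stored per-user key.

    Raises KeyError once the per-user key has been deleted.
    """
    per_user = user_keys[user_id]
    return hmac.new(MASTER_KEY, per_user, hashlib.sha256).digest()


def forget_user(user_id):
    """Delete the per-user key so the derived key can no longer be rebuilt."""
    del user_keys[user_id]
```

After `forget_user("alice")`, the ciphertext for that user may still sit in the database and in old backups, but the key needed to read it cannot be derived again, provided the per-user key itself was not also captured in those backups.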

L-U-C-K-Y commented 3 years ago

Hi all

Encryption of certain columns is also a requirement that we have to deal with, as we store client-identifying data (CID).

We are highly motivated by this as some customers audit us in terms of access management, security and encryption of CID.

I saw that Prisma has a very neat solution coming up in this regard, where a user can define encryptors, such as Vault or AWS KMS, that are integrated into Prisma. When viewing the data in the database, the data has an encryptor prefix, followed by the encrypted data. When accessing it, Prisma will fetch the decryption key from Vault and return the data, providing a great experience.

Further, encryption can be introduced gradually within a column, as the old data just stays as it is until it's accessed, and the new encrypted data is identifiable by the prefix.

What do you think?
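
To make the prefix idea above concrete, here is a minimal sketch of prefix-tagged column values. It is not Prisma's implementation: the prefix string, the function names, and the caller-supplied `encrypt`/`decrypt` callables (which would wrap something like a Vault or KMS client) are all assumptions for illustration.

```python
def make_codec(prefix, encrypt, decrypt):
    """Build read/write helpers for one prefix-tagged column.

    encrypt and decrypt are caller-supplied str -> str functions, for
    example thin wrappers around a KMS client. prefix is a hypothetical
    tag such as "vault:v1:".
    """

    def write(plaintext):
        # New and updated values are always stored with the tag.
        return prefix + encrypt(plaintext)

    def read(stored):
        if stored.startswith(prefix):
            return decrypt(stored[len(prefix):])
        # Legacy row written before encryption was enabled: return as-is.
        return stored

    def needs_migration(stored):
        # True for rows that are still stored without the tag.
        return not stored.startswith(prefix)

    return write, read, needs_migration
```

A background job could use `needs_migration` to find untagged rows and rewrite them with `write`, which is what allows encryption to be rolled out gradually. One caveat with this design is that a legacy plaintext value that happens to begin with the prefix would be misread as ciphertext, so the prefix should be something that cannot occur in real data.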

24601 commented 3 years ago

In such a compliance regime is there a specific requirement that would make tablespace-level encryption (or encryption even at the block device level?) unsuitable?

It is finite and determinate as to which tablespace(s) a column will end up being stored in. If data-at-rest encryption is what is actually required, is there a requirement that this be done at the PG level rather than lower in the storage stack?

Given the robust nature of column-level permissions, it should not be difficult to limit access to sensitive columns, and if the decryption is happening at the Hasura level in this proposal anyway, the end user consuming the data is going to see it nonetheless.

I am just trying to understand how block-level encryption, etc does not meet certain requirements or compliance schemes, I am sure they exist, but I haven't yet heard of one where mandating engine-level symmetric encryption is the only solution, which brings a non-trivial amount of downsides and vulnerabilities with it (not to mention possible performance issues galore).

cakemountain commented 3 years ago

Definitely would love to see this feature. I need to encrypt certain sensitive tables/columns at rest for legal/privacy reasons and it would be really, really nice if Hasura did this.

cc:ing @tirumaraiselvan - do you think this is something your team could build out sometime in the near future? It's an important enterprise feature and I'm surprised it's not included in Hasura Cloud at least.

L-U-C-K-Y commented 3 years ago

"I am just trying to understand how block-level encryption, etc does not meet certain requirements or compliance schemes, I am sure they exist, but"

Worst-case scenario: A data leak.

Enterprise customers ask us how we handle this scenario. Assuming the database, or a backup of the database gets stolen and it contains client-identifying data:

hugbubby commented 3 years ago

It seems to me like we're running into fundamental design flaws here. While it might check off a compliance box, Hasura should not be the party encrypting select contents of its database. If Hasura decides to do transparent encryption/decryption of certain tables or columns automatically, there's still no protection against GraphQL injection, a Hasura instance getting compromised, or Hasura administrator credentials getting leaked.

The only benefit to this (as long as, of course, the key was pulled from some key storage service and not kept as a file on disk) would be to prevent filesystem snapshots or database backups from containing secret tables. That's better than nothing, but it's strictly inferior to just encrypting the entire filesystem/database backup itself. Instead, the backend services that need to use sensitive information should have their own KMS key that Hasura doesn't know about.

coffenbacher commented 3 years ago

"If Hasura decides to do transparent encryption/decryption of certain tables or columns automatically, there's still no protection against GraphQL injection, a hasura instance getting compromised, or administrator Hasura credentials getting leaked. The only benefit to this (as long as, of course, the key was pulled from some key storage service and not kept as a file on disk) would be to prevent filesystem snapshots or database backups from containing secret tables."

IMO field-level encryption does have value: it makes it possible to lose control of database access/contents and still have some layer of protection on important data. This could be through a SQL injection attack that dumps raw data, or losing a database backup, or a malicious database tool (e.g. an evil DBeaver), or an exposed SQL database with poor credentials (for example, a shoddily secured local copy of a production database), or a MITM in some places, or any number of other things.

Obviously there are other things being done to combat all of those possibilities, but defense-in-depth tells me that field-level encryption is worth it in some scenarios.

TeoTN commented 3 years ago

I wonder, is this not about encrypting already encrypted data? After all, the db is already encrypted, isn't it? So this is probably a matter of setting the access permissions so that users (including admins) can only read their own data, and ensuring that admin can't relax the access permissions anymore?

SameerChorge94 commented 2 years ago

Is there any update on this feature request?

Garbee commented 2 years ago

A specific case I have for encryption, at the column level at least, is personally identifiable information of users. I am building a product where we need to store names and emails at least, with potentially sound recordings for a soundboard. I'd like to have as much private information encrypted as possible. That way, if the database is attacked, then so long as the Hasura instance itself isn't accessed, the data will need to be decrypted by brute force to be of any use.

I would like to see multiple passphrase support as well, rather than one global key to encrypt everything. This way, personally identifiable information would have its own passphrase. Sound recordings, for example, could then be encrypted with another unique passphrase, making attacks on data in multiple areas even more difficult.

One requirement of any system doing encryption like this these days must be allowing for key rotation as well. Some mechanism needs to exist for converting data from an old passphrase to a new one. Otherwise, if a key leaks, quickly re-encrypting the data is a problem.
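
One possible shape for such a rotation mechanism is sketched below, assuming each row records which key version encrypted it. The field names `key_version` and `ciphertext` are hypothetical, and the `decrypt` and `encrypt` callables are supplied by the caller, so the sketch says nothing about which cipher or key store is used.

```python
def rotate_rows(rows, decrypt, encrypt, new_version):
    """Re-encrypt every row not yet stored under new_version.

    rows: iterable of dicts with hypothetical fields "key_version" and
          "ciphertext".
    decrypt(version, ciphertext) -> plaintext and
    encrypt(version, plaintext) -> ciphertext are caller-supplied.
    Returns the number of rows that were re-encrypted.
    """
    rotated = 0
    for row in rows:
        if row["key_version"] == new_version:
            continue  # already on the new key
        plaintext = decrypt(row["key_version"], row["ciphertext"])
        row["ciphertext"] = encrypt(new_version, plaintext)
        row["key_version"] = new_version
        rotated += 1
    return rotated
```

Storing the version next to each value means a rotation can be interrupted and resumed, and reads keep working while old and new keys coexist. The old key can be retired once a run returns zero.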

dameleney commented 1 year ago

This is on our roadmap. We do not have a timeline at present. Meanwhile, please add additional comments and use cases to this ticket.

A comprehensive RFC that covers all use cases and limitations of the feature will be posted on this issue. We welcome more detailed feedback from you once we provide those details.

rikur commented 1 year ago

https://supabase.com/blog/transparent-column-encryption-with-postgres food for thought

erquhart commented 1 year ago

Just a mention here, as I'm also not sure how Hasura would go about supporting this - I use a managed db provider (Aiven) and encryption at rest happens by default. Nothing required on Hasura's part.

davidpanic commented 1 year ago

@erquhart If you define "encrypting at rest" as just HTTPS then yeah, Hasura doesn't need to do anything, but the actual physical data is still stored unencrypted. What Hasura would have to do is encrypt the data before it is forwarded to the database.

erquhart commented 1 year ago

This is their blurb on encryption:

"We enforce Transport Layer Security (TLS) encryption for connections used in transferring data and encrypt it when it is on the disk."

Encryption on disk is, I believe, what you’re referring to when you say “the actual data”, no?

Garbee commented 1 year ago

Let's try to not detract from the primary conversation here on getting Hasura to support column/table level encryption.

Discussing exactly how Aiven, or any other provider, handles this without a document detailing what "at rest" means to them and how it is implemented is fruitless. For example, if they encrypt the drive data as a whole at rest, then an attacker just needs to get in and find the decryption keys. Then all your DB data is sitting there in plain text for them.

The only true way to encrypt at-rest thoroughly is to encrypt before it is stored anywhere. Then if storage encrypts as well, great that is basically two layers of encryption to help protect the data.

I think the title of this issue is a bit misleading from the actual intent. The intent is not encryption at rest as a whole (although that would be nice too for general support in the platform) but specific column/table level encryption that is baked into the schema.

erquhart commented 1 year ago

Good distinction, and I see now the OP does call out encrypting select fields specifically.

OliverSo commented 10 months ago

Is there any update on the feature request?

manasag commented 10 months ago

Thanks everyone for your comments and patience on this issue. Until now, Hasura has been a very monolithic codebase. Over time, it has become difficult to add database-native or non-GraphQL features (Postgres-specific ones in this case) that work really well and do not affect or break existing users. We have been working on a data connector architecture in Hasura V3, which decouples this nicely. In Hasura V3, the Postgres Connector (written in Rust and open source) is responsible for translating simplified queries from the Hasura V3 engine into actual SQL queries and returning the response. With this architecture, we can now think of flavors of Postgres Connectors that could support, in this case, client-side encryption of Postgres columns.

We are keeping this issue open, as we continue to build upon Hasura V3. Please do try the Alpha version of Hasura V3 and provide us with your valuable feedback.