incuna / django-pgcrypto-fields

Transparent field level encryption for Django using the pgcrypto postgresql extension.
BSD 2-Clause "Simplified" License
230 stars 50 forks source link

`__in` type queries do not work with encrypted fields? #435

Open ckcollab opened 2 years ago

ckcollab commented 2 years ago

Hey there! Thanks so much for releasing this for free, so far it's been working pretty well for us.

We have

Working code + the query it generates:

BrokerageAccountSchwab.objects.filter(account_number="12345678")
SELECT * WHERE convert_from(decrypt(dearmor("schwab_brokerageaccountschwab"."account_number"), b'1t)ic$cud3mk-75c^6#_w)!xg-j=496d', 'aes'), 'utf-8') = 12345678

Not working code + query it generates

BrokerageAccountSchwab.objects.filter(account_number__in=["345643656", "4242345"])
SELECT * WHERE "schwab_brokerageaccountschwab"."account_number" IN (345643656, 4242345)

It seems like the __in query has removed the encryption "stuff?" Is this intended, is it possible that we augment this to automatically encrypt lists as well?

some1ataplace commented 1 year ago

It is possible that the __in query is not working with encrypted fields because the django-pgcrypto-fields library does not support encrypting lists or arrays by default.

To modify the library to automatically encrypt lists as well, you may need to modify the pgcrypto/fields.py file to add support for encrypting and decrypting lists or arrays. This may involve converting the list to a compatible data type, such as a string or JSON object, before encrypting it, and then converting it back to a list after decrypting it.

Once you have modified the library to support encrypting lists, you should be able to use the __in query with encrypted fields in the same way as with unencrypted fields. However, it is important to note that encrypting large lists or arrays can have a significant impact on performance, and may not be suitable for all use cases.


It looks like the behavior you're seeing is indeed intended. When using encrypted fields with Django, any filtering or querying has to happen post-decryption, as the encrypted data is stored in the database in an unreadable form.

In the case of your first code snippet, the encrypt function is used to encrypt the account number before it is stored in the database. When querying for a specific account number, the value you pass to the filter method is converted to the encrypted format using the decrypt and convert_from functions, and then compared to the stored (encrypted) value in the database.

However, in the second code snippet where you're trying to use in to query for multiple account numbers, the encrypted format doesn't match a standard list of values, so the query is not properly constructed.

One possible solution to this would be to create a custom query method that can handle an encrypted list of values. This would involve decrypting the stored values in the database, and then comparing them to a list of decrypted values. However, this approach may be slower and more resource-intensive than a standard Django query, so it should be used with caution.


Here's an example solution based on the django-pgcrypto-fields library that allows for encryption and decryption of lists or arrays:

from django.db import models
from django_pgcrypto_fields import PGPSymmetricKeyField
import json

def list_to_json(data):
    """
    Helper function to convert a list to JSON format
    """
    return json.dumps(data)

def json_to_list(data):
    """
    Helper function to convert JSON data back to a list
    """
    return json.loads(data)

class EncryptedListField(PGPSymmetricKeyField):
    """
    Custom encrypted list field that can store and retrieve encrypted lists
    """
    def get_prep_value(self, value):
        """
        Override the default method to convert the list to JSON format before encryption
        """
        if value is None:
            return value
        json_data = list_to_json(value)
        return super().get_prep_value(json_data)

    def from_db_value(self, value, expression, connection):
        """
        Override the default method to convert the decrypted JSON data back to a list
        """
        if value is None:
            return value
        decrypted_data = super().from_db_value(value, expression, connection)
        return json_to_list(decrypted_data)

    def from_python(self, value):
        """
        Override the default method to convert the list to JSON format before encryption
        """
        json_data = list_to_json(value)
        return super().from_python(json_data)

    def to_python(self, value):
        """
        Override the default method to convert the decrypted JSON data back to a list
        """
        decrypted_data = super().to_python(value)
        return json_to_list(decrypted_data)

With this custom field, you can create a model that includes an EncryptedListField to store an encrypted list of values. Here's an example:

class MyModel(models.Model):
    my_list = EncryptedListField(null=True, blank=True)

    def str(self):
        return str(self.my_list)

To use this field in your Django queries, you can construct the query as follows:

MyModel.objects.filter(my_list__contains=['item1', 'item2'])

This will return all instances of the MyModel class where the encrypted my_list` field contains the values "item1" and "item2".

Note that when using encrypted fields in Django, it's important to keep in mind the performance implications of decrypting and comparing data for queries. Also, ensure that your data values fit within the size limits for the encryption function used, and consider the security implications of the data being stored in a decrypted form somewhere in your application.