kvesteri / sqlalchemy-utils

Various utility functions and datatypes for SQLAlchemy.
Other
1.25k stars 318 forks source link

New enforcement of `String` over `LargeBinary` on EncryptedType breaks existing implementations #444

Open leosussan opened 4 years ago

leosussan commented 4 years ago

See my comment on https://github.com/kvesteri/sqlalchemy-utils/pull/426, which is when this was introduced.

This seems to have broken things for me & probably will for others.

In my implementation (PostgreSQL 11, using FernetEngine), up to this point, encrypted values are stored as a bytea column, which is serialized into a bytes in Python prior to encryption. bytes has no .encode() function and subsequently breaks here.

This would conceivably break on any non-string object when extracted from the database (e.g. the sa.Unicode implementation in the docs for EncryptedType).

Moving .encode() to the str() step & removing from value fixes extraction.

        if isinstance(value, six.text_type):
            value = str(value).encode()
        decrypted = self.fernet.decrypt(value)

This doesn't solve the problem of encryption into the DB, which expects bytea, not a string.

leosussan commented 4 years ago

Continuing the conversation from https://github.com/kvesteri/sqlalchemy-utils/pull/426#issuecomment-626853516:

I personally believe that your change is a good one, for the reasons you've outlined when you started this PR. To add to this, at least in Postgres, bytea is generally less performative than text when storing encoded data.

But - at the very least, the documentation / changelog should make clear that the change will break existing implementation.

For the sake of continuity, here's my current workaround. It works by subclassing EncryptedType, redefining impl, and reversing the adjustments:

class CustomEncryptedType(EncryptedType):
    impl = LargeBinary

    def process_bind_param(self, value, dialect):
        value = super().process_bind_param(value=value, dialect=dialect)
        if isinstance(value, str):
            value: bytes = value.encode()
        return value

    def process_result_value(self, value, dialect):
        if isinstance(value, bytes):
            value: str = value.decode()
        value: Optional[Any] = super().process_result_value(
            value=value, dialect=dialect
        )
        return value

It might make sense to introduce this as a legacy option, e.g. LegacyEncryptedType, but urge users of EncryptedType to write a migration for their respective databases.

villebro commented 4 years ago

The change introduced by #426 breaks alembic migration code that works on MySQL on sqlalchemy-utils==0.36.5:

import sqlalchemy as sa
from alembic import op
from sqlalchemy_utils import EncryptedType

...

def upgrade():
    op.add_column(
        "dbs", sa.Column("password", EncryptedType(sa.String(1024)), nullable=True)
    )
  File "/opt/hostedtoolcache/Python/3.6.10/x64/lib/python3.6/site-packages/sqlalchemy/dialects/mysql/base.py", line 2047, in visit_VARCHAR
    "VARCHAR requires a length on dialect %s" % self.dialect.name
sqlalchemy.exc.CompileError: VARCHAR requires a length on dialect mysql
instantlinux commented 4 years ago

Ditto this for me: this is a breaking change that would require a semver update and doc on what to do instead. I get the following stacktrace on a migration that works fine with 0.36.4 on MariaDB versions 10.3.x and 10.4.x:

Traceback (most recent call last):
  File "/home/richb/apicrud/python_env/lib/python3.6/site-packages/sqlalchemy/sql/compiler.py", line 2897, in visit_create_table
    create_column, first_pk=column.primary_key and not first_pk
  File "/home/richb/apicrud/python_env/lib/python3.6/site-packages/sqlalchemy/sql/compiler.py", line 350, in process
    return obj._compiler_dispatch(self, **kwargs)
  File "/home/richb/apicrud/python_env/lib/python3.6/site-packages/sqlalchemy/sql/visitors.py", line 95, in _compiler_dispatch
    return meth(self, **kw)
  File "/home/richb/apicrud/python_env/lib/python3.6/site-packages/sqlalchemy/sql/compiler.py", line 2930, in visit_create_column
    text = self.get_column_specification(column, first_pk=first_pk)
  File "/home/richb/apicrud/python_env/lib/python3.6/site-packages/sqlalchemy/dialects/mysql/base.py", line 1588, in get_column_specification
    column.type, type_expression=column
  File "/home/richb/apicrud/python_env/lib/python3.6/site-packages/sqlalchemy/sql/compiler.py", line 400, in process
    return type_._compiler_dispatch(self, **kw)
  File "/home/richb/apicrud/python_env/lib/python3.6/site-packages/sqlalchemy/sql/visitors.py", line 95, in _compiler_dispatch
    return meth(self, **kw)
  File "/home/richb/apicrud/python_env/lib/python3.6/site-packages/sqlalchemy/sql/compiler.py", line 3440, in visit_type_decorator
    return self.process(type_.type_engine(self.dialect), **kw)
  File "/home/richb/apicrud/python_env/lib/python3.6/site-packages/sqlalchemy/sql/compiler.py", line 400, in process
    return type_._compiler_dispatch(self, **kw)
  File "/home/richb/apicrud/python_env/lib/python3.6/site-packages/sqlalchemy/sql/visitors.py", line 95, in _compiler_dispatch
    return meth(self, **kw)
  File "/home/richb/apicrud/python_env/lib/python3.6/site-packages/sqlalchemy/sql/compiler.py", line 3418, in visit_string
    return self.visit_VARCHAR(type_, **kw)
  File "/home/richb/apicrud/python_env/lib/python3.6/site-packages/sqlalchemy/dialects/mysql/base.py", line 2061, in visit_VARCHAR
    "VARCHAR requires a length on dialect %s" % self.dialect.name
sqlalchemy.exc.CompileError: VARCHAR requires a length on dialect mysql

The column that triggers this error is defined thus:

    sa.Column('password', EncryptedType(sa.String(length=77)),
              nullable=False),

Please revert the change, fix it in a way that doesn't break this straightforward case, or provide doc on what I need to change this to.

villebro commented 4 years ago

I agree with @instantlinux , this really should be reverted in 0.36.x and reintroduced in 0.37, preferably with some additional updating notes.

rushilsrivastava commented 4 years ago

Here to iterate the same thing as @instantlinux, this has broken our existing migrations and it would be great to have some updating notes before adding such a change. @villebro Have you had any luck fixing the migration problems for MySQL?

Lawouach commented 4 years ago

Hello,

I was surprised by this change but I appreciate this was quickly addressed. However, 0.36.6 is still broken compared to 0.36.4 on my code base when using (PostgreSQL 11).

Reverting to 0.36.4 works fine.

So the question is, should I pin to 0.36.4 for now or was 0.36.6 meant to work?

Thanks :)

rudyryk commented 3 months ago

Also we get a lot of warnings:

The 'EncryptedType' class will change implementation from 'LargeBinary' to 'String' in a future version. Use 'StringEncryptedType' to use the 'String' implementation.

But there's no simple way (or no documented way?) to perform this migration on live data. One idea is to have a class for LargeBinaryEncryptedType which can be used explicitly without those annoying warnings.