unioslo / mreg

GNU General Public License v3.0
7 stars 13 forks source link

Django 5 deprecates CiCharField. #489

Closed terjekv closed 4 months ago

terjekv commented 1 year ago

See https://adamj.eu/tech/2023/02/23/migrate-django-postgresql-ci-fields-case-insensitive-collation/ for more information.

hostpolicy.HostPolicyAtom.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. hostpolicy.HostPolicyRole.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.Cname.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.ForwardZone.email: (fields.W906) django.contrib.postgres.fields.CIEmailField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use EmailField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.ForwardZone.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.ForwardZone.primary_ns: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.ForwardZoneDelegation.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.Host.contact: (fields.W906) django.contrib.postgres.fields.CIEmailField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use EmailField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.Host.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.HostGroup.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.Label.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.Mx.mx: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.NameServer.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.Naptr.replacement: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.Naptr.service: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.ReverseZone.email: (fields.W906) django.contrib.postgres.fields.CIEmailField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use EmailField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.ReverseZone.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.ReverseZone.primary_ns: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.ReverseZoneDelegation.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.Srv.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead.

terjekv commented 1 year ago

One solution if we want to keep the idea itself (a case-insensitive field) is to migrate to using collations: https://adamj.eu/tech/2023/02/23/migrate-django-postgresql-ci-fields-case-insensitive-collation/

However, an equally valid solution would be to use case-sensitive fields and ensure that we use lower() or similar when we interact with the field. This would also allow us to use other database backends for testing (and production).

terjekv commented 1 year ago

Also note that we use CI-fields for email addresses: https://github.com/unioslo/mreg/blob/806be0114579824c26e5abbff3c8a6c94c1e8bda/mreg/models.py#L116 https://github.com/unioslo/mreg/blob/806be0114579824c26e5abbff3c8a6c94c1e8bda/mreg/models.py#L357

However, https://datatracker.ietf.org/doc/html/rfc5321#section-2.4 states:

Verbs and argument values (e.g., "TO:" or "to:" in the RCPT command and extension name keywords) are not case sensitive, with the sole exception in this specification of a mailbox local-part (SMTP Extensions may explicitly specify case-sensitive elements). That is, a command verb, an argument value other than a mailbox local-part, and free form text MAY be encoded in upper case, lower case, or any mixture of upper and lower case with no impact on its meaning. The local-part of a mailbox MUST BE treated as case sensitive. Therefore, SMTP implementations MUST take care to preserve the case of mailbox local-parts. In particular, for some hosts, the user "smith" is different from the user "Smith". However, exploiting the case sensitivity of mailbox local-parts impedes interoperability and is discouraged. Mailbox domains follow normal DNS rules and are hence not case sensitive.

(My emphasis.)

oyvindhagberg commented 1 year ago

It seems switching to case-sensitive fields is the cleanest solution. However, we must ensure that we don't break anything by doing so. Perhaps there was a reason a case-insensitive data type was chosen in the first place. But overall, it looks like these fields could be case-sensitive (like the email address).

terjekv commented 1 year ago

This came up again due to #492. To have django-filter automatically create filters and lookups for fields, one uses a structure such as:

class HostPolicyAtomFilterSet(rest_filters.FilterSet):
    class Meta:
        model = HostPolicyAtom
        fields = "__all__"

The fields = "__all__"-construct checks the field type and ensures that the appropriate lookups (exact, contains, gt, etc) are made available for the field. However, for our LCI-fields, this automation breaks, seemingly as these field types are not known to django-filter. Specifying the lookup types manually works:

class HostPolicyAtomFilterSet(rest_filters.FilterSet):
    class Meta:
        model = HostPolicyAtom
        fields = {
            'name': ['exact', 'regex', 'contains'],
        }

But, if we do this for one field, we have to do it for all fields in the model, so for this model:

class HostPolicyRole(HostPolicyComponent):

    name = LCICharField(max_length=64, unique=True, validators=[_validate_role_name])
    atoms = models.ManyToManyField(HostPolicyAtom, related_name='roles')
    hosts = models.ManyToManyField(Host, related_name='hostpolicyroles')
    labels = models.ManyToManyField(Label, blank=True, related_name='hostpolicyroles')

    class Meta:
        db_table = 'hostpolicy_role'
        ordering = ('name',)

We would have to specify something like:

class HostPolicyRoleFilterSet(rest_filters.FilterSet):
    class Meta:
        model = HostPolicyRole
        fields = {
            'name': ['exact', 'regex', 'contains'],
            'atoms__name': ['exact', 'regex', 'contains'],
            'hosts__name': ['exact', 'regex', 'contains'],
            'labels__name': ['exact', 'regex', 'contains'],
        }

If we want other filters for other fields in related models, we'd have to be explicit about them.

Moving to case sensitive fields would also avoid issues such as these (from https://adamj.eu/tech/2023/02/23/migrate-django-postgresql-ci-fields-case-insensitive-collation/)

Some of Django’s lookups, like icontains, map to a SQL LIKE. PostgreSQL does not allow such queries with nondeterministic collations—you will see this error if such a query is attempted:

oyvindhagberg commented 1 year ago

Very good work here, @terjekv! Looks like we should move to case sensitive fields sooner rather than later.

oyvindkolbu commented 1 year ago

As nearly everything in DNS is case insenstive moving things to case sensitive fields will be painfull. The idea to use lower() in on strings before matches is much more work than you think.

Also the idea of case sensitive emails is just there in theory. No (sensible) server will allow both User@example.org and user@example.org. And remember that user@Example.org and user@example.org are equivalent as the domain part of an email address is case insensitive.

terjekv commented 1 year ago

The main solution for case-insensitive user input matching would be to inject a manager for the searches. Something down the route of this:

class CaseInsensitiveNameManager(models.Manager):
    def get_queryset(self):
        return super().get_queryset().annotate(
            lower_name=models.functions.Lower('name'),
        )

    def filter(self, *args, **kwargs):
        kwargs = {k.replace('name', 'lower_name'): v for k, v in kwargs.items()}
        return super().filter(*args, **kwargs)

I think that's the cleanest solution? For DNS we probably have to do something like this, unless we want to put the burden onto users, which would be cruel. Then again, this entire logic gets painful when doing IDNA in general, especially if we are realistically supporting ThaiURL or similar schemes. Meh.

For email fields, there is the annoyance that as much as case sensitive local parts are a theoretical possibility, it is in the RFC standard, so it feels pretty iffy to pretend it can't happen. Even parsing an email is painful, so I'm not sure what to do, unless we choose to allow the way the user inputs the email address to be used as-is, and let the email servers sort it out.

If we wish for the email entries to be authoritatively tied to a specific contact (ie, we want hostmaster@domain to have hostmaster@domain.com and no other email address), we should maybe consider adding a contact model and associate contacts to relevant entries and resolve emails from said contact.