Closed terjekv closed 4 months ago
One solution if we want to keep the idea itself (a case-insensitive field) is to migrate to using collations: https://adamj.eu/tech/2023/02/23/migrate-django-postgresql-ci-fields-case-insensitive-collation/
However, an equally valid solution would be to use case-sensitive fields and ensure that we use lower()
or similar when we interact with the field. This would also allow us to use other database backends for testing (and production).
Also note that we use CI-fields for email addresses: https://github.com/unioslo/mreg/blob/806be0114579824c26e5abbff3c8a6c94c1e8bda/mreg/models.py#L116 https://github.com/unioslo/mreg/blob/806be0114579824c26e5abbff3c8a6c94c1e8bda/mreg/models.py#L357
However, https://datatracker.ietf.org/doc/html/rfc5321#section-2.4 states:
Verbs and argument values (e.g., "TO:" or "to:" in the RCPT command and extension name keywords) are not case sensitive, with the sole exception in this specification of a mailbox local-part (SMTP Extensions may explicitly specify case-sensitive elements). That is, a command verb, an argument value other than a mailbox local-part, and free form text MAY be encoded in upper case, lower case, or any mixture of upper and lower case with no impact on its meaning. The local-part of a mailbox MUST BE treated as case sensitive. Therefore, SMTP implementations MUST take care to preserve the case of mailbox local-parts. In particular, for some hosts, the user "smith" is different from the user "Smith". However, exploiting the case sensitivity of mailbox local-parts impedes interoperability and is discouraged. Mailbox domains follow normal DNS rules and are hence not case sensitive.
(My emphasis.)
It seems switching to case-sensitive fields is the cleanest solution. However, we must ensure that we don't break anything by doing so. Perhaps there was a reason a case-insensitive data type was chosen in the first place. But overall, it looks like these fields could be case-sensitive (like the email address).
This came up again due to #492. To have django-filter
automatically create filters and lookups for fields, one uses a structure such as:
class HostPolicyAtomFilterSet(rest_filters.FilterSet):
class Meta:
model = HostPolicyAtom
fields = "__all__"
The fields = "__all__"
-construct checks the field type and ensures that the appropriate lookups (exact
, contains
, gt
, etc) are made available for the field. However, for our LCI-fields, this automation breaks, seemingly as these field types are not known to django-filter. Specifying the lookup types manually works:
class HostPolicyAtomFilterSet(rest_filters.FilterSet):
class Meta:
model = HostPolicyAtom
fields = {
'name': ['exact', 'regex', 'contains'],
}
But, if we do this for one field, we have to do it for all fields in the model, so for this model:
class HostPolicyRole(HostPolicyComponent):
name = LCICharField(max_length=64, unique=True, validators=[_validate_role_name])
atoms = models.ManyToManyField(HostPolicyAtom, related_name='roles')
hosts = models.ManyToManyField(Host, related_name='hostpolicyroles')
labels = models.ManyToManyField(Label, blank=True, related_name='hostpolicyroles')
class Meta:
db_table = 'hostpolicy_role'
ordering = ('name',)
We would have to specify something like:
class HostPolicyRoleFilterSet(rest_filters.FilterSet):
class Meta:
model = HostPolicyRole
fields = {
'name': ['exact', 'regex', 'contains'],
'atoms__name': ['exact', 'regex', 'contains'],
'hosts__name': ['exact', 'regex', 'contains'],
'labels__name': ['exact', 'regex', 'contains'],
}
If we want other filters for other fields in related models, we'd have to be explicit about them.
Moving to case sensitive fields would also avoid issues such as these (from https://adamj.eu/tech/2023/02/23/migrate-django-postgresql-ci-fields-case-insensitive-collation/)
Some of Django’s lookups, like icontains, map to a SQL LIKE. PostgreSQL does not allow such queries with nondeterministic collations—you will see this error if such a query is attempted:
Very good work here, @terjekv! Looks like we should move to case sensitive fields sooner rather than later.
As nearly everything in DNS is case insenstive moving things to case sensitive fields will be painfull. The idea to use lower()
in on strings before matches is much more work than you think.
Also the idea of case sensitive emails is just there in theory. No (sensible) server will allow both User@example.org and user@example.org. And remember that user@Example.org and user@example.org are equivalent as the domain part of an email address is case insensitive.
The main solution for case-insensitive user input matching would be to inject a manager for the searches. Something down the route of this:
class CaseInsensitiveNameManager(models.Manager):
def get_queryset(self):
return super().get_queryset().annotate(
lower_name=models.functions.Lower('name'),
)
def filter(self, *args, **kwargs):
kwargs = {k.replace('name', 'lower_name'): v for k, v in kwargs.items()}
return super().filter(*args, **kwargs)
I think that's the cleanest solution? For DNS we probably have to do something like this, unless we want to put the burden onto users, which would be cruel. Then again, this entire logic gets painful when doing IDNA in general, especially if we are realistically supporting ThaiURL or similar schemes. Meh.
For email fields, there is the annoyance that as much as case sensitive local parts are a theoretical possibility, it is in the RFC standard, so it feels pretty iffy to pretend it can't happen. Even parsing an email is painful, so I'm not sure what to do, unless we choose to allow the way the user inputs the email address to be used as-is, and let the email servers sort it out.
If we wish for the email entries to be authoritatively tied to a specific contact (ie, we want hostmaster@domain to have hostmaster@domain.com and no other email address), we should maybe consider adding a contact model and associate contacts to relevant entries and resolve emails from said contact.
See https://adamj.eu/tech/2023/02/23/migrate-django-postgresql-ci-fields-case-insensitive-collation/ for more information.
hostpolicy.HostPolicyAtom.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. hostpolicy.HostPolicyRole.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.Cname.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.ForwardZone.email: (fields.W906) django.contrib.postgres.fields.CIEmailField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use EmailField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.ForwardZone.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.ForwardZone.primary_ns: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.ForwardZoneDelegation.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.Host.contact: (fields.W906) django.contrib.postgres.fields.CIEmailField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use EmailField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.Host.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.HostGroup.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.Label.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.Mx.mx: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.NameServer.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.Naptr.replacement: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.Naptr.service: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.ReverseZone.email: (fields.W906) django.contrib.postgres.fields.CIEmailField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use EmailField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.ReverseZone.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.ReverseZone.primary_ns: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.ReverseZoneDelegation.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead. mreg.Srv.name: (fields.W905) django.contrib.postgres.fields.CICharField is deprecated. Support for it (except in historical migrations) will be removed in Django 5.1. HINT: Use CharField(db_collation="…") with a case-insensitive non-deterministic collation instead.