grnet / djnro

DjNRO hits the decks of eduroam database management
http://djnro.grnet.gr/
Other
10 stars 22 forks source link

Provide MySQL exact lookup with binary collation for some fields #72

Closed zmousm closed 4 years ago

zmousm commented 4 years ago

Per the Django and MySQL documentation, the default collation for a database (and all tables and columns) should be utf8_general_ci and might be tweaked to utf8_unicode_ci. Nowadays the utf8mb4 character set (and the respective collations) is also common, yet it is still not recommended by default in Django for a number of reasons.

Regardless what charset is used, these collations do not yield accurate results when filtering by a value that contains diacritics, for example "Művészeti" will also return "Mûvészeti". Changing the collation at the database/table/column level is not really an option, as it introduces a number of different, more common problems, while it is not handled by Django. Ditto for overriding it at the connection level (collation_connection), for somewhat different reasons, while this is also harder to properly support in Django.

The one place where this really matters is in parse_institution_xml, while parsing EDB1 data and matching against existing ServiceLoc objects.

vladimir-mencl-eresearch commented 4 years ago

Hi @zmousm , what will this code do for deployments using other DBMS ? E.g., we use PostgreSQL - but in this code, I don't see anything that would make it apply only for MySQL ... or am I not looking right?

zmousm commented 4 years ago

Hi @zmousm , what will this code do for deployments using other DBMS ? E.g., we use PostgreSQL - but in this code, I don't see anything that would make it apply only for MySQL ... or am I not looking right?

It will do nothing for postgres as it only overrides (provides) as_mysql. See:

When compiling a query, Django first looks for as_%s % connection.vendor methods, and then falls back to as_sql.

This issue does not apply to postgres (AFAIK).

vladimir-mencl-eresearch commented 4 years ago

Ah, thanks!