Open ormsbee opened 2 weeks ago
@qasimgulzar: These are some of the specifics that have to happen for this repo to support PostgreSQL. Thank you for taking this work on.
Thank you @ormsbee for sharing this with me. I am going to take care of it.
@ormsbee could you please add some details of the background of this field, e.g: why this field is there any why is it required to be MultiCollationCharField
. It will be very helpful while implementing and testing the change.
https://github.com/user-attachments/assets/6f5e268b-f5c8-425c-88fa-39db7837c716
@ormsbee I have managed to run edx-platform with postgresql. I think it's going to bring some joy here.
There were two primary motivations for MultiCollationCharField
:
utf8mb4
encoding, even in databases that were still configured for utf8mb3
.Up until Redwood, the Tutor default database connection was utf8
, which in MySQL translated into their utf8mb3
encoding, which lacks support for emojis and certain other Unicode code points that require 4 bytes to encode in UTF8. Many people had already upgraded their instances through their own scripts and configuration, and some instances that didn't use Tutor were already connecting via utf8mb4
.
In the middle of this weird, transitional period, we were creating new data models to store library (and eventually course) content, and we wanted to make sure that the generated tables could properly store 4-byte characters no matter what the database connection settings were for that particular instance at the time.
We have fields where case-sensitivity is important, like identifier keys. We've gotten into issues with this sort of mismatch in the past, where CourseOverviews store their course keys in a case insensitive manner, but Modulestore stores them in a case sensitive manner, making collision possible (e.g. two courses in Modulestore that map to the same CourseOverview). We also run many of our tests using in-memory sqlite, which is case sensitive by default–as opposed to MySQL's default case insensitivity.
Given that, it's convenient to have something like MultiCollationCharField
in order to force consistent constraint-checking and ordering behavior between the two. That way, models can declare that, "this should be a case-insensitive char field" and get consistent behavior across databases, without the error-prone process of altering collations manually in the migration file.
This repo currently has a set of collation-related helpers that it uses to help normalize model collation behavior between MySQL and SQLite:
They manifest in the migrations files like this:
https://github.com/openedx/openedx-learning/blob/14284d265a9f344621ae81af8f3847e48b6c2215/openedx_learning/apps/authoring/publishing/migrations/0001_initial.py#L28-L29
The
MultiCollationCharField
is also used in one place in edx-platformAnyone looking to add PostgreSQL support would need to look for all instances of
MultiCollationCharField
andMultiCollationTextField
and put the postgres-specific mappings there. That includes the convenience methods likecase_insensitive_char_field
in fields.py, as well as in the individual migration files that have the sqlite/mysql arguments serialized out.