openedx / openedx-learning

GNU Affero General Public License v3.0
5 stars 11 forks source link

Add PostgreSQL support #255

Open ormsbee opened 2 weeks ago

ormsbee commented 2 weeks ago

This repo currently has a set of collation-related helpers that it uses to help normalize model collation behavior between MySQL and SQLite:

They manifest in the migrations files like this:

https://github.com/openedx/openedx-learning/blob/14284d265a9f344621ae81af8f3847e48b6c2215/openedx_learning/apps/authoring/publishing/migrations/0001_initial.py#L28-L29

The MultiCollationCharField is also used in one place in edx-platform

Anyone looking to add PostgreSQL support would need to look for all instances of MultiCollationCharField and MultiCollationTextField and put the postgres-specific mappings there. That includes the convenience methods like case_insensitive_char_field in fields.py, as well as in the individual migration files that have the sqlite/mysql arguments serialized out.

ormsbee commented 2 weeks ago

@qasimgulzar: These are some of the specifics that have to happen for this repo to support PostgreSQL. Thank you for taking this work on.

qasimgulzar commented 2 weeks ago

Thank you @ormsbee for sharing this with me. I am going to take care of it.

qasimgulzar commented 1 week ago

@ormsbee could you please add some details of the background of this field, e.g: why this field is there any why is it required to be MultiCollationCharField. It will be very helpful while implementing and testing the change.

qasimgulzar commented 1 week ago

https://github.com/user-attachments/assets/6f5e268b-f5c8-425c-88fa-39db7837c716

@ormsbee I have managed to run edx-platform with postgresql. I think it's going to bring some joy here.

ormsbee commented 1 week ago

There were two primary motivations for MultiCollationCharField:

1. To force utf8mb4 encoding, even in databases that were still configured for utf8mb3.

Up until Redwood, the Tutor default database connection was utf8, which in MySQL translated into their utf8mb3 encoding, which lacks support for emojis and certain other Unicode code points that require 4 bytes to encode in UTF8. Many people had already upgraded their instances through their own scripts and configuration, and some instances that didn't use Tutor were already connecting via utf8mb4.

In the middle of this weird, transitional period, we were creating new data models to store library (and eventually course) content, and we wanted to make sure that the generated tables could properly store 4-byte characters no matter what the database connection settings were for that particular instance at the time.

2. To normalize case sensitivity behavior for certain fields.

We have fields where case-sensitivity is important, like identifier keys. We've gotten into issues with this sort of mismatch in the past, where CourseOverviews store their course keys in a case insensitive manner, but Modulestore stores them in a case sensitive manner, making collision possible (e.g. two courses in Modulestore that map to the same CourseOverview). We also run many of our tests using in-memory sqlite, which is case sensitive by default–as opposed to MySQL's default case insensitivity.

Given that, it's convenient to have something like MultiCollationCharField in order to force consistent constraint-checking and ordering behavior between the two. That way, models can declare that, "this should be a case-insensitive char field" and get consistent behavior across databases, without the error-prone process of altering collations manually in the migration file.

qasimgulzar commented 3 days ago

Related Pull Requests

  1. edx-when
  2. edx-enterprise
  3. xblock-lti-consumer
  4. edx-milestones
  5. openedx-learning
  6. edx-platform