openedx / platform-roadmap

Tracking the maintenance, enhancement, and advancement of the Open edX project.

Migrate ModuleStore off of MongoDB #3

Open e0d opened 2 years ago

e0d commented 2 years ago

TL;DR

MongoDB introduces excessive hosting and maintenance cost for the value that we derive from it on the platform. With improvements to serializing courses in the RDBMS, it's even less valuable. Removing it will let us reduce cost and maintenance burden, scale down better, and thereby scale platform adoption.

The full details of this are here.

Related PRs:

ormsbee commented 2 years ago

Worth noting that this work cannot be fully completed until Old Mongo support is removed (DEPR-58).

ormsbee commented 2 years ago

Posted this on the forums:

To give a little more context, there are a few independent pieces of deprecation that need to happen before MongoDB can be removed:

Old Mongo Removal

Remove Old Mongo support entirely. @mikix has done great work recently in cutting off access to Old Mongo courses:

What remains is a lot of code deletion and test fixing. This is an area where people can contribute with relatively little ramp-up, since it's mostly deleting test permutations. Please comment here if you're interested in that work.

Convert the Split Modulestore to use django-storages

This will require three parallelizable streams of work:

  1. Converting how we store Structure documents (the skeleton outline of a course) from MongoDB to django-storages, and migration scripts. This may also require some porting of the structures.py cleanup script, depending on the projected costs.
  2. Converting how we store Definition documents (the content of each block) from MongoDB to django-storages, and migration scripts. Because of access patterns and the variability of latency with object stores like S3, this would likely require an improved caching layer as well.
  3. Converting how we store course static assets (e.g. images, PDFs) to django-storages. We have to take care here to (a) not break CDN caching for certain assets; and (b) not break security restrictions for enrollment-restricted assets. The way we ship over static assets right now is implemented as middleware and is frankly kinda wacky, so there will likely be more cleanup here than appears at first glance.
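
To make the caching concern in item 2 concrete, here is a minimal sketch of a read-through LRU cache in front of an object store. Everything in it is hypothetical: `InMemoryStorage`, `CachedDefinitionStore`, the `definitions/<id>.json` key layout, and the `read_bytes`/`write_bytes` methods are illustrative stand-ins, not the Split Modulestore's or django-storages' actual API. A real version would presumably wrap a Django storage backend (`open`, `save`, `exists`) instead. It also assumes a Definition document is never modified in place once written under a given ID, which is what makes caching by ID safe.

```python
import json
from collections import OrderedDict


class InMemoryStorage:
    """Hypothetical stand-in for an object store such as S3.

    Counts reads so the effect of the cache is visible.
    """

    def __init__(self):
        self._blobs = {}
        self.reads = 0

    def write_bytes(self, name, data):
        self._blobs[name] = data

    def read_bytes(self, name):
        self.reads += 1  # each call models one slow round trip
        return self._blobs[name]


class CachedDefinitionStore:
    """Read-through LRU cache over a blob store (illustrative only)."""

    def __init__(self, storage, max_entries=1024):
        self._storage = storage
        self._cache = OrderedDict()
        self._max_entries = max_entries

    @staticmethod
    def _key(definition_id):
        # Assumed key layout; the real layout is an open design question.
        return f"definitions/{definition_id}.json"

    def put(self, definition_id, document):
        data = json.dumps(document, sort_keys=True).encode("utf-8")
        self._storage.write_bytes(self._key(definition_id), data)
        self._remember(definition_id, document)

    def get(self, definition_id):
        if definition_id in self._cache:
            # Cache hit: mark as most recently used, skip the object store.
            self._cache.move_to_end(definition_id)
            return self._cache[definition_id]
        # Cache miss: fetch from the object store and remember the result.
        raw = self._storage.read_bytes(self._key(definition_id))
        document = json.loads(raw.decode("utf-8"))
        self._remember(definition_id, document)
        return document

    def _remember(self, definition_id, document):
        self._cache[definition_id] = document
        self._cache.move_to_end(definition_id)
        # Evict least recently used entries once over capacity.
        while len(self._cache) > self._max_entries:
            self._cache.popitem(last=False)
```

The cache here is per-process and unbounded in document size. A production design would more likely share a cache across workers and would need to decide how Structure documents, which change on every publish, are handled differently from Definitions.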

Remove MongoDB usage from Forums

Once the two sections above this are finished, it's possible to have a basic install of Open edX without MongoDB. The last piece that I know of that actively uses MongoDB is the forums experience. I don't know what the current plans for deprecating this usage are. The last I recall talking with anyone about it, the general idea was that we wanted to switch away from MongoDB and towards the Django ORM, but only after removing the Ruby code. But again, I'm not sure where that stands now.

At the very least, if the other sections are completed, MongoDB can be a dependency of only the forums, and not Open edX as a whole.

regisb commented 10 months ago

Can I ask what is the status of this issue? If I understand correctly, step 2 "Convert the Split Modulestore to use django-storages" is not yet completed, right?