openedx / edx-platform

The Open edX LMS & Studio, powering education sites around the world!
https://openedx.org
GNU Affero General Public License v3.0

fix: reindex_studio was crashing if instance had too many courses #34905

Closed bradenmacdonald closed 1 month ago

bradenmacdonald commented 1 month ago

Description

This fixes https://github.com/openedx/modular-learning/issues/223 "Cannot create initial search index on instances with many courses".

The problem was that calling store.get_courses() would load too much data into memory at once.

To fix this, I changed the code to use CourseOverview to get the total course count, and to do a paginated query that loads only 1,000 course IDs (and names) at a time.
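The paginated approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `iter_course_pages`, `fake_fetch`, and `FAKE_COURSES` are hypothetical names, and an in-memory list stands in for the `CourseOverview` table so the example runs without Django.

```python
from typing import Callable, Iterator, List, Tuple

PAGE_SIZE = 1000  # batch size mentioned in the PR description

CourseRow = Tuple[str, str]  # (course ID, display name)


def iter_course_pages(
    fetch_page: Callable[[int, int], List[CourseRow]],
    total: int,
    page_size: int = PAGE_SIZE,
) -> Iterator[List[CourseRow]]:
    """Yield one page of (ID, name) rows at a time.

    Only the current page is held in memory, instead of every course at once.
    """
    for offset in range(0, total, page_size):
        yield fetch_page(offset, page_size)


# Hypothetical stand-in for the database: 2500 fake course rows.
FAKE_COURSES = [(f"course-v1:Org+C{i}+2024", f"Course {i}") for i in range(2500)]


def fake_fetch(offset: int, limit: int) -> List[CourseRow]:
    """Assumed fetch function: returns at most `limit` rows starting at `offset`."""
    return FAKE_COURSES[offset:offset + limit]


page_sizes = [len(page) for page in iter_course_pages(fake_fetch, len(FAKE_COURSES))]
print(page_sizes)  # [1000, 1000, 500]
```

In a Django codebase the fetch step would typically be a slice of an ordered queryset or Django's `Paginator`; which of these the PR uses is not stated in this description.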

Supporting information

https://github.com/openedx/modular-learning/issues/223

Testing instructions

Follow the procedure for enabling Studio Content Search described at https://openedx.atlassian.net/wiki/spaces/COMM/pages/3890380898/Next+Release+Redwood+-+Operator+Dev+Notes.

Deadline

None, but we'd like to backport this fix to Redwood.

Private ref: MNG-4278

openedx-webhooks commented 1 month ago

Thanks for the pull request, @bradenmacdonald! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

bradenmacdonald commented 1 month ago

@MoisesGSalas Does this fix the issue you were seeing?

MoisesGSalas commented 1 month ago

I will try to test it out between today and tomorrow.

MoisesGSalas commented 1 month ago

I tried running it again and it does index successfully. The only issue I'm seeing is that it is going to take a while: it has indexed 2500 courses in around 3 hours, so it will probably take two days to finish.

I'm assuming that kind of optimization is outside the scope of this PR?
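As a rough check on the estimate above: 2500 courses in about 3 hours is roughly 833 courses per hour. The sketch below extrapolates from that rate, assuming it stays constant. The total of 40,000 courses is a hypothetical figure chosen to be consistent with the two-day estimate; the actual course count is not reported in this thread.

```python
def estimate_total_hours(indexed: int, elapsed_hours: float, total_courses: int) -> float:
    """Extrapolate the total indexing time, assuming a constant indexing rate."""
    return total_courses * elapsed_hours / indexed


HYPOTHETICAL_TOTAL = 40_000  # assumption, not a number from this thread

hours = estimate_total_hours(indexed=2500, elapsed_hours=3, total_courses=HYPOTHETICAL_TOTAL)
print(hours, hours / 24)  # total hours, total days
```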

bradenmacdonald commented 1 month ago

Yeah, for this PR I just want to get it working. Optimizations can be looked at separately. Thanks for testing it!

openedx-webhooks commented 1 month ago

@bradenmacdonald 🎉 Your pull request was merged! Please take a moment to answer a two-question survey so we can improve your experience in the future.

edx-pipeline-bot commented 1 month ago

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

edx-pipeline-bot commented 1 month ago

2U Release Notice: This PR has been deployed to the edX production environment.