mitodl / mitx-grading-library

The MITx Grading Library, a python grading library for edX
https://edge.edx.org/courses/course-v1:MITx+grading-library+examples/
BSD 3-Clause "New" or "Revised" License
14 stars 9 forks source link

Import time optimization to prevent formatting error. #320

Closed drhodes closed 1 year ago

drhodes commented 1 year ago

Hi, the 18x courses on edx have been experiencing an intermittent HTML format error that prevents learners from viewing entire exercises. This happens anytime the codejail runs for longer than 3 seconds. Sometimes the error happens during timed final exams, which can be .. awkward. Anyways, I've been exploring mitigations. The one I'm currently presenting doesn't require course staff to alter grading scripts, and it doesn't require coordinating with the codejail team to optimize anything on their end.

I did some profiling on the mitx-grading-library and found that scipy was causing a large delay on import. Thankfully, scipy accounts for a vanishing portion of the code base. Here are some numbers for python3.7. Profiling was done with and without pycache present, and then also with and without lazy scipy imports. The times are rough averages.

| python version | __pycache__ available? | lazy import scipy? | import time (ms) | grader runtime (ms) |
|----------------+------------------------+--------------------+------------------+---------------------|
| python 3.7.15  | no                     | no                 |             1300 |             1400    |
| python 3.7.15  | yes                    | no                 |              365 |              430    |
| python 3.7.15  | no                     | yes                |              530 |              580    |
| python 3.7.15  | yes                    | yes                |              157 |              220    |

As far as I can tell, codejail does not use pycache. By lazily importing scipy, I would expect the grader script runtime (at least for this one example) would drop from 1400ms down to about 580ms on edx and would be one big step towards preventing the HTML format error. I've made the optimization, for anyone interested the commit can be found here. It's a small change and it doesn't break any tests with python 3.7.15. I haven't tested with 2.7 because honestly, I can't tell if 2.7 is still used anywhere .. and when I did try pip installing the 2.7 requirements in a venv, numpy was complaining. So, I figured I'd get some feedback first.

thoughts?

jolyonb commented 1 year ago

Nice work! I agree that loading scipy can be skipped for the vast majority of use cases. The cases you addressed were the three that I thought of - integration, factorial and some fancy matrix sampling. I'm happy to merge this into the publication version. Please make a PR against the base repo and we'll take it from there :)

drhodes commented 1 year ago

Hi Jolyon, thanks for the prompt review!

jolyonb commented 1 year ago

Addressed in #321.