Open ivanistheone opened 4 years ago
An alternative way to store the "cache" of generated perseus files would be to add a new table:
class GeneratedExerciseFileCache(models.Model):
    """
    A lookup table to avoid re-generating exercise export formats (perseus specifically).
    If the current `md5(exercise_data)` of an exercise node matches the
    `exercise_data_hash` of some row in this table, reuse the perseus .zip file `file`.
    """
    # id = autoincrementing int
    exercise_data_hash = models.CharField(max_length=400, blank=True, db_index=True)
    file = models.ForeignKey('File', null=True, blank=True, related_name='_')
    created = models.DateTimeField(auto_now_add=True, verbose_name=_("created"))
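To make the lookup idea concrete, here is a minimal sketch of how the table could be consulted at export time. It is not Studio code: a plain dict stands in for the `GeneratedExerciseFileCache` table, and `get_or_create_perseus_file` and the `generate` callable are hypothetical names used only for illustration.

```python
import hashlib

# Hypothetical in-memory stand-in for the GeneratedExerciseFileCache table:
# maps exercise_data_hash -> identifier of the cached perseus .zip file.
_cache = {}


def exercise_data_hash(exercise_data: str) -> str:
    """Return the md5 hex digest of the serialized exercise data."""
    return hashlib.md5(exercise_data.encode("utf-8")).hexdigest()


def get_or_create_perseus_file(exercise_data, generate):
    """Reuse the cached file when the hash matches, otherwise generate and store it.

    `generate` is a hypothetical callable standing in for the real export step.
    Returns a (file_id, reused) tuple.
    """
    key = exercise_data_hash(exercise_data)
    if key in _cache:
        return _cache[key], True
    file_id = generate(exercise_data)
    _cache[key] = file_id
    return file_id, False
```

Note that this only helps if `exercise_data` is serialized the same way on every run; otherwise identical exercises hash differently and the cache never hits.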
A much simpler solution is to add an exercise_data_changed (bool) field on ContentNode (only relevant for kind=exercise). Credit @kollivier
exercise_data_changed (… not considered)
create_perseus_zip, which sets exercise_data_changed=False
exercise_data_changed is False, so OK to reuse cached file
exercise_data_changed is True, so NOT OK to reuse cached file
create_perseus_zip, which sets exercise_data_changed=False
Set exercise_data_changed=True every time an exercise or exercise question is edited

See more recent discussion on Studio here: https://github.com/learningequality/studio/issues/1982
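The exercise_data_changed flow above could look roughly like the sketch below. This is an illustration, not Studio code: `ExerciseNode`, `publish`, and `edit` are hypothetical stand-ins, the body of `create_perseus_zip` is a placeholder for the real export, and the initial value of the flag (True here) is an assumption, since migration of existing nodes is not covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExerciseNode:
    """Hypothetical stand-in for a ContentNode with kind=exercise."""
    exercise_data: str
    exercise_data_changed: bool = True  # assumed initial state: nothing cached yet
    perseus_file: Optional[str] = None


def create_perseus_zip(node):
    """Placeholder for the real export: regenerates the file and clears the flag."""
    node.perseus_file = "perseus:" + node.exercise_data
    node.exercise_data_changed = False
    return node.perseus_file


def publish(node):
    """Reuse the cached file unless the exercise was edited since the last export."""
    if node.exercise_data_changed or node.perseus_file is None:
        return create_perseus_zip(node)
    return node.perseus_file


def edit(node, new_data):
    """Any edit to the exercise or its questions marks the cached file as stale."""
    node.exercise_data = new_data
    node.exercise_data_changed = True
```

The trade-off compared to the hash table is that every code path that edits an exercise or one of its questions has to remember to set the flag.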
Description
Running Khan Academy zh-CN and it with nearly identical source trees resulted in all the exercises being marked as different and having to be re-imported.
What I Did
Expected
Only one or two nodes to have changed (the topics nodes modified).
Actual
Every single exercise had changed and required re-importing.
Possible causes
I suspect two possible causes:
This line could be the cause: https://github.com/learningequality/ricecooker/blob/master/ricecooker/classes/questions.py#L275. Since the chef was running on Python 3.5, dict ordering is not guaranteed to be consistent between different runs.
The other possible cause is that the .perseus file generation on Studio could be non-deterministic; see https://github.com/learningequality/studio/blob/develop/contentcuration/contentcuration/utils/publish.py#L342-L359, which uses predictable-zip creation code that is slightly different from the predictable zip used in ricecooker.
Real life consequences
Khan Academy channel users will need to redownload many files (small but still many) every time a new channel is published, even though exercises are substantially the same.
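For reference, both suspected causes can be ruled out with standard-library tools alone. The sketch below is not the ricecooker or Studio implementation; `canonical_json`, `deterministic_zip`, and the fixed timestamp are hypothetical choices for illustration. It serializes dicts with sorted keys and writes zip entries in sorted order with fixed metadata, so the same exercise content always yields the same bytes and the same md5.

```python
import hashlib
import io
import json
import zipfile

# Arbitrary fixed timestamp (an assumption) so the archive bytes do not
# depend on when the export runs.
FIXED_DATE = (2015, 10, 21, 7, 28, 0)


def canonical_json(data):
    """Serialize with sorted keys so dict ordering cannot change the output."""
    return json.dumps(data, sort_keys=True, separators=(",", ":"))


def deterministic_zip(files):
    """Build a zip from a {name: bytes} mapping with sorted entries and fixed metadata."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
            info.compress_type = zipfile.ZIP_STORED  # avoid compressor differences
            info.create_system = 3                   # same value on every platform
            info.external_attr = 0o644 << 16         # fixed file permissions
            zf.writestr(info, files[name])
    return buf.getvalue()


def md5_hex(data: bytes) -> str:
    """Return the md5 hex digest of raw bytes."""
    return hashlib.md5(data).hexdigest()
```

With this, two dicts holding the same question data in a different key order produce byte-identical archives, so a hash comparison would only flag exercises whose content really changed.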