numbas / Numbas

A completely browser-based e-assessment/e-learning system, with an emphasis on mathematics
http://www.numbas.org.uk
Apache License 2.0

Look at whether suspend data can be made more efficient #1080

Open christianp opened 4 months ago

christianp commented 4 months ago

Somebody encountered SCORM's limit of 64,000 characters on the suspend data. The LTI provider doesn't enforce this, but generic SCORM players do.

We should try to make the suspend data as small as possible, so that as many exams as possible will work. Obviously there has to be an upper limit on the size of an exam - an exam with 64,001 parts could never fit in 64,000 characters, as a simple upper bound.

At the moment, lots of values are included even if they have the default value. We could try only including keys if they have a non-default value.

Going even further, a lot of space is taken up by the names of object keys. If we use something other than JSON, then we could either assume we know the shape of the data and omit the keys entirely, or give a structure definition at the start.
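To make the "structure definition at the start" idea concrete, here is a minimal sketch. It still uses JSON as the container, but writes the key names once as a header row and then stores each object as a plain array of values. The field list `PART_FIELDS` and the helpers `packRows` and `unpackRows` are made up for illustration; they are not the real Numbas suspend data schema.

```javascript
// Hypothetical field list for one part's suspend data; not the real Numbas schema.
const PART_FIELDS = ['answered', 'score', 'studentAnswer'];

// Encode a list of same-shaped objects as one header row plus one row of values each.
function packRows(fields, objects) {
  return [fields, ...objects.map(obj => fields.map(f => obj[f]))];
}

// Rebuild the original objects from the header row and the value rows.
function unpackRows(packed) {
  const [fields, ...rows] = packed;
  return rows.map(row => Object.fromEntries(fields.map((f, i) => [f, row[i]])));
}

const parts = [
  { answered: true, score: 1, studentAnswer: '42' },
  { answered: false, score: 0, studentAnswer: '' },
];
const keyed = JSON.stringify(parts);
const packed = JSON.stringify(packRows(PART_FIELDS, parts));
console.log(keyed.length, packed.length); // the packed form is the shorter of the two
```

The saving grows with the number of rows, since each key name is paid for once rather than once per object. The cost is that every object has to have the same shape.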

christianp commented 4 months ago

I wrote some code to chuck out keys from suspend data objects when they have default values. That seems to have cut the size of the suspend data roughly in half, since most questions don't use most features.
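For anyone following along, the general technique looks something like the sketch below. This is only an illustration, not the code referred to above: the `PART_DEFAULTS` table and the function names `stripDefaults` and `restoreDefaults` are assumptions, and the real keys in Numbas may differ.

```javascript
// Hypothetical defaults for one part's suspend data; the real Numbas keys may differ.
const PART_DEFAULTS = { answered: false, stepsShown: false, score: 0, name: '' };

// Keep only the keys whose value differs from the default.
function stripDefaults(obj, defaults) {
  const out = {};
  for (const [key, value] of Object.entries(obj)) {
    // Compare via JSON so that arrays and nested objects are compared by value.
    if (!(key in defaults) || JSON.stringify(value) !== JSON.stringify(defaults[key])) {
      out[key] = value;
    }
  }
  return out;
}

// On resume, fill the omitted keys back in from the defaults.
function restoreDefaults(obj, defaults) {
  return { ...defaults, ...obj };
}

const saved = stripDefaults({ answered: true, stepsShown: false, score: 0, name: '' }, PART_DEFAULTS);
console.log(JSON.stringify(saved)); // {"answered":true}
```

The loader has to apply the same defaults table that the writer used, so a change to a default needs a version marker or a migration for old attempts.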

However, running the JSON through gzip and then base64-encoding it further cut down the data to between 10% and 20% of the original. This would save a lot of space, at the cost of not being able to read the suspend data directly.

christianp commented 4 months ago

The problem is that CompressionStream is recent, and only at about 80% support at the moment. I tried using lz-string.js, but base64-encoding only halves the size, and I'm not sure UTF-16 encoding is safe.
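One way to cope with partial support would be to feature-detect CompressionStream and fall back to plain JSON when it is missing. The sketch below assumes a one-character prefix (`z:` for gzip plus base64, `j:` for plain JSON) so the reader knows which form it has been given. The prefix scheme and all the function names are hypothetical, not anything Numbas does.

```javascript
// True when the browser (or runtime) offers everything the gzip path needs.
function canGzip() {
  return typeof CompressionStream === 'function' &&
    typeof DecompressionStream === 'function' &&
    typeof Response === 'function' &&
    typeof Blob === 'function' &&
    typeof btoa === 'function';
}

// gzip a string and return the compressed bytes as base64 text.
async function gzipToBase64(text) {
  const stream = new Blob([text]).stream().pipeThrough(new CompressionStream('gzip'));
  const bytes = new Uint8Array(await new Response(stream).arrayBuffer());
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary);
}

// Hypothetical wrapper: 'z:' marks gzip+base64, 'j:' marks plain JSON.
async function encodeSuspendData(obj) {
  const json = JSON.stringify(obj);
  return canGzip() ? 'z:' + await gzipToBase64(json) : 'j:' + json;
}

async function decodeSuspendData(text) {
  if (text.startsWith('j:')) return JSON.parse(text.slice(2));
  const bytes = Uint8Array.from(atob(text.slice(2)), c => c.charCodeAt(0));
  const stream = new Blob([bytes]).stream().pipeThrough(new DecompressionStream('gzip'));
  return JSON.parse(await new Response(stream).text());
}
```

The catch is that an attempt saved in the compressed form can only be resumed in a browser that also has DecompressionStream, so the fallback helps with writing but not with reading.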

billy-woods commented 2 months ago

Hi Christian. It was me that ran into this issue. I certainly don't have any big fixes, but I do have a number of small questions and/or thoughts that might help shrink the suspend data a meaningful amount, at least for the sorts of questions I code, and probably for others too:

  1. Does the SCORM suspend data need to contain "auxiliary" variables that aren't randomised? e.g. if a = random(1..10), and b = 5*a^2, and c = 12, can you get away without storing c, and possibly not even storing b?
  2. Similar to the above: does the SCORM suspend data need to contain the whole of the advice section for each question? This can take up a huge amount of space, and is presumably just a large string with no inherent randomisation.
  3. I'm still getting integers stored in some strange ways: "factorsa":"[ imprecise(2), imprecise(2), imprecise(2), imprecise(2), imprecise(2) ]" (though not very often)
  4. Here's one example (from a single question!) where just trimming the spaces and changing true/false to 1/0 would save over 50% of the space:

     [{"exec_path":"","studentAnswer":"[ [ [ true ], [ false ], [ false ] ], [ [ false ], [ false ], [ true ] ], [ [ false ], [ true ], [ false ] ], [ [ true ], [ false ], [ false ] ], [ [ false ], [ true ], [ false ] ], [ [ false ], [ false ], [ true ] ], [ [ true ], [ false ], [ false ] ], [ [ true ], [ false ], [ false ] ], [ [ false ], [ false ], [ true ] ] ]","results":[]},{"exec_path":"","studentAnswer":"[ [ [ true ], [ false ], [ false ] ], [ [ true ], [ false ], [ false ] ], [ [ false ], [ true ], [ false ] ], [ [ true ], [ false ], [ false ] ], [ [ false ], [ true ], [ false ] ], [ [ false ], [ false ], [ true ] ], [ [ true ], [ false ], [ false ] ], [ [ true ], [ false ], [ false ] ], [ [ false ], [ false ], [ true ] ] ]","results":[]},{"exec_path":"","studentAnswer":"[ [ [ false ], [ false ], [ true ] ], [ [ true ], [ false ], [ false ] ], [ [ false ], [ true ], [ false ] ], [ [ false ], [ false ], [ true ] ], [ [ false ], [ true ], [ false ] ], [ [ true ], [ false ], [ false ] ], [ [ false ], [ false ], [ true ] ], [ [ false ], [ false ], [ true ] ], [ [ true ], [ false ], [ false ] ] ]","results":[]},{"exec_path":"","studentAnswer":"[ [ [ false ], [ true ], [ false ] ], [ [ true ], [ false ], [ false ] ], [ [ false ], [ true ], [ false ] ], [ [ false ], [ true ], [ false ] ], [ [ false ], [ true ], [ false ] ], [ [ true ], [ false ], [ false ] ], [ [ false ], [ true ], [ false ] ], [ [ false ], [ true ], [ false ] ], [ [ true ], [ false ], [ false ] ] ]","results":[]}]

     In fact, the word false appears over 300 times in the suspend data for this question (mostly not as part of student answers!), and if that could be changed to 0, it would save 1200 characters.
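As a rough illustration of point 4, the sketch below strips the whitespace from one of these answer strings and swaps true/false for 1/0, then reverses the swap. It assumes the string contains nothing but nested lists of booleans, and that the reader knows to expand 1/0 back to true/false; the function names are made up for this example.

```javascript
// Compact a string that holds only nested lists of booleans.
function packBooleanGrid(jme) {
  return jme.replace(/\s+/g, '').replace(/true/g, '1').replace(/false/g, '0');
}

// Reverse the substitution. Only valid when every 1 and 0 stands for a boolean.
function unpackBooleanGrid(packed) {
  return packed.replace(/1/g, 'true').replace(/0/g, 'false');
}

const answer = '[ [ [ true ], [ false ], [ false ] ], [ [ false ], [ false ], [ true ] ] ]';
const packedAnswer = packBooleanGrid(answer);
console.log(packedAnswer); // [[[1],[0],[0]],[[0],[0],[1]]]
console.log(answer.length, packedAnswer.length);
```

The unpacked string comes back without the original spaces, so it matches the original only up to whitespace.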

christianp commented 2 months ago

  1. It should already be the case that only variables which are sources of randomisation are saved. If you have an example of that not happening, please show me.
  2. The advice text isn't saved in the suspend data. What do you mean?
  3. Can you show me a question that does this?
  4. JME is strongly typed, so true is not identical to 1. We could certainly look at representing the answers to multiple choice questions in a specialised format, rather than just the JME representation which is currently used. But I think that entire list can be omitted: it's the cache of pre-submit results, which includes the student's answer, but there are no results. Can you give me a link to the question that made this, please?
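If omitting those entries does turn out to be safe, the filter itself is small. The sketch below uses the key names from the example above (exec_path, studentAnswer, results) and drops any entry whose results list is empty. The function name is hypothetical, and whether Numbas can always resume correctly without these entries is an assumption that would need checking.

```javascript
// Drop pre-submit cache entries that carry no results.
function pruneEmptyResults(cache) {
  return cache.filter(entry => Array.isArray(entry.results) && entry.results.length > 0);
}

const cache = [
  { exec_path: '', studentAnswer: '[ [ [ true ], [ false ], [ false ] ] ]', results: [] },
  { exec_path: '', studentAnswer: '[ [ [ false ], [ true ], [ false ] ] ]', results: [] },
];
console.log(JSON.stringify(pruneEmptyResults(cache))); // []
```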