maths / moodle-qtype_stack

Stack question type for Moodle
GNU General Public License v3.0

Timeouts for large Maxima variables #1152

Closed: smmercuri closed this issue 2 months ago

smmercuri commented 3 months ago

stackjson_stringify has quadratic growth in the size of a single Maxima variable (e.g., a two-dimensional array), which causes timeouts. This is a problem for statistics questions that generate datasets as two-dimensional arrays in Maxima.
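The quadratic behaviour is the classic repeated-concatenation pattern: each append copies the entire accumulated string again. A minimal Python sketch of the effect (not the actual STACK/Maxima code, just an illustration of the two cost profiles):

```python
def stringify_quadratic(fragments):
    """Concatenate fragments one at a time.

    Conceptually O(n^2) total copying: every step re-copies the
    whole accumulated string before appending the next fragment.
    """
    out = ""
    for frag in fragments:
        out = out + frag  # copies len(out) characters each iteration
    return out


def stringify_linear(fragments):
    """Collect all fragments and join once: O(n) total copying."""
    return "".join(fragments)
```

For small variables the difference is invisible, which is presumably why the issue only surfaces with large datasets.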

aharjula commented 3 months ago

As long as we do not have "string builder"-like tools in Maxima, one has to wonder whether the way to deal with this is file I/O, which must surely be buffered and able to handle this type of load. Basically, we could "simply" write out a file and read the end result back in. We already have the temp directory for the plots, so we have a place to write to, and many installs even use ramdisks for that directory, so disk wear and speed should not be major issues. Although it would slow down small-object handling, it should lower the times for large objects quite significantly.

Basically, if we have a logic that can turn whatever we need into a sequence of string fragments then we can simply push those fragments out to a file or other stream and read them back if need be.
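The fragment-streaming idea above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `directory` parameter stands in for the plot temp directory mentioned above, and none of these names exist in STACK or Maxima.

```python
import os
import tempfile


def concat_via_file(fragments, directory=None):
    """Stream string fragments to a temp file, then read the result back.

    Each write goes into an OS/library buffer, so total work stays
    roughly linear in the output size: no quadratic re-copying of the
    accumulated string as in naive concatenation.
    """
    fd, path = tempfile.mkstemp(dir=directory, suffix=".txt")
    try:
        with os.fdopen(fd, "w") as f:
            for frag in fragments:
                f.write(frag)  # buffered append; never re-copies the whole result
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)  # clean up the scratch file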

smmercuri commented 3 months ago

Indeed, this is one approach we have been thinking about. One issue is that character output occurs out of order when traversing Maxima trees (@sangwinc can correct me/add more on this), but it would be a great approach if we can figure that out.

Another "quick-fix" approach that seems to help a little is to break the string containing the variable into batches of length 64 (i.e., the maximum number of arguments allowed for Maxima functions) and pass each batch to sconcat at once. This can be found here. I've tested this and it appears to speed things up fairly significantly, but, as we might expect, it still becomes quadratic at some point. It is also failing 11 answer tests in mysterious ways, which I'm currently digging into.
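A sketch of that batching idea in Python, with `"".join` standing in for an n-ary sconcat call and the 64-argument limit as a parameter (hypothetical names; not the actual patch):

```python
def batched_concat(fragments, batch=64):
    """Concatenate fragments in groups of `batch` per pass.

    Mimics calling an n-ary concat (sconcat) with up to 64 arguments
    at a time, repeating until a single string remains. Each pass
    shrinks the list by a factor of `batch`, so there are far fewer
    top-level concatenations than appending one fragment at a time.
    """
    if not fragments:
        return ""
    while len(fragments) > 1:
        fragments = [
            "".join(fragments[i:i + batch])  # one "64-argument" concat call
            for i in range(0, len(fragments), batch)
        ]
    return fragments[0]
```

This explains the observed behaviour: the constant factor drops sharply, but if any later stage still appends results one at a time (or the fragments themselves grow), the overall cost can again become quadratic for large enough inputs.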

aharjula commented 3 months ago

Please note that 64 is only a limit when running with GCL. Have you tried the whole thing with CLISP or SBCL? I have seen large data arrays, and they have not had that large an impact on performance when running on SBCL. It would be a shame if this is just a GCL thing that, when fixed, makes the others slower.