sm8ps closed this issue 4 years ago
This is a difficult issue. The main reason that particular test is being removed is that it is executed on the PHP side, which limits various things we would like to do, mainly being able to generate fully CAS-evaluable PRTs that could have true guard clauses (#150) without us having to evaluate each test separately. The other String tests will be mapped to CAS-side string features, but for regexp there is no direct match: Maxima does have some libraries that provide regexp, but none of them seem to have the same syntax as preg_match.
For number spaces specifically, there exists some work that is ongoing related to #317 and the matching branch has quite a lot of Maxima functions for handling number bases in there. However, the work on that is currently waiting for 4.3 to be completed as that version will also change major parts of the relevant infrastructure.
If you only need hexadecimal then you might be able to do string matching "by hand" with something like the hexadecimal parsing function present in the JSON parser that we have for STACK, but for that you would need to cut out those prefixes and suffixes if those are in play.
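To illustrate the "by hand" approach, here is a minimal sketch written in Python rather than Maxima. It is not the hexadecimal parsing function from the STACK JSON parser; the function names `strip_hex_affixes` and `parse_hex` are hypothetical, and the "0x" prefix and "_x" suffix are the ones mentioned in the original question.

```python
HEX_DIGITS = "0123456789abcdef"

def strip_hex_affixes(text):
    """Remove an optional leading '0x' and an optional trailing '_x'."""
    s = text.strip().lower()
    if s.startswith("0x"):
        s = s[2:]
    if s.endswith("_x"):
        s = s[:-2]
    return s

def parse_hex(text):
    """Return the integer value of a hexadecimal string, or None if invalid."""
    digits = strip_hex_affixes(text)
    if not digits:
        return None
    value = 0
    for ch in digits:
        pos = HEX_DIGITS.find(ch)  # position in the digit table, -1 if absent
        if pos < 0:
            return None
        value = value * 16 + pos
    return value
```

Comparing the parsed integers instead of the raw strings means that "0x1F", "1f" and "1F_x" are all treated as the same answer, without any regular expression.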
So the reason for deprecation is that the test limits the development of core functionality in a way that cannot be solved easily. The development objectives being blocked relate to performance and to features that benefit various areas, and are thus considered more important than regexp.
Personally, I would not be surprised if the regexp test returned at some point in the future, but if it does it will use one of those Maxima-side regexp libraries, which have different styles of syntax. Until that happens, I would recommend sticking to 4.2.2 if you need that particular test. However, #317 might also solve your issue at some point, so it might make sense to keep an eye on that.
preg_match mainly as a potential way of handling certain background logic, and even that logic has been replaced for 4.3 with something completely different. As a technical side note, I personally would prefer not to expose regexp rule writing to question authors, as many regexp systems have had security holes that can be triggered with suitably formed rules and inputs...
One philosophical reason for removing this test is that we should "solve the problem at a mathematical level". In v4.3 we have a much more flexible parser. The solution to the problem you raise is to provide a parser context in which this is interpreted correctly at the outset.
Chris,
Well, for number bases such a parser at the edge makes sense and has been in the pipeline for quite some time, but we can never cover all use cases and therefore should aim to restore that regexp functionality at some point as it allows people to push the envelope and do things we did not expect. I do agree that regexps are not pure in that philosophical sense but I doubt anyone would benefit from too forceful enforcement of the pure philosophy.
The pure way would be to throw away string input entirely, but as I have spent years getting strings accepted I would be rather disappointed if we ever did so. As we have strings we might as well have tools for them, so eventually regexps will come back in one form or another (probably using the sregex Maxima package or some such).
Thanks all for explaining the problem behind the decision at great depth and for sharing your opinions! I fully understand the approach to have STACK follow Moodle Maxima's principles throughout. Nevertheless, I can well imagine that there exist good use cases "at the edge" where string handling comes in handy.
Looking back, I might have better titled the issue something like: "Is there a way to keep something like regex matching in future releases?" If sregex is a better candidate than PHP-regexes then that should suit just fine.
For the particular case of hexadecimal conversion I shall keep an eye on that for sure. Thanks for pointing it out!
Edit 20191104: corrected wording mistake (Moodle/Maxima)
@sm8ps I am going to try and start getting my head around porting #317 to 4.3 beta sometime soon. STACK is crying out for proper base-n support - it broadens its usefulness so much. As well as using it for hexadecimal support I have been able to do a wide range of binary questions, including binary logic, signed types (S+M 1C, 2C), fixed and floating point types. I intend to package some of the functions I use up into a useful library. @sangwinc @aharjula regexps in some form are way too useful to throw away - in fact strings in general are mathematical entities - ask any cryptographer. However, if we need to push the regex testing to STACK, either we could use sregex as suggested or I did find a Common Lisp implementation of PCRE... Also, removing preg_match is going to break a lot of my questions too.
@sgparry if we had a Lisp implementation supporting PCRE syntax that either came with Maxima or could be added to the STACK-Maxima libraries, we could probably simply replace the current regexp test with it, maybe even skipping the whole deprecation. Unfortunately I have not had the time to find such a thing, and the best I have found so far has been sregex, which is not PCRE but comes with Maxima.
Should a drop-in replacement (well, from PHP to Maxima) PCRE implementation appear, it would at the same time mean that we should add the whole compiled PRT thing into 4.3, thus pushing the 4.3 release even further, but it would surely clean up the code base quite a lot. In any case 4.3 will be a major release, no matter what the version numbering says.
It would be nice to have examples of the types of patterns people are testing with the regexp test. In the fields where I have seen STACK being used, that test has been completely skipped, so I have no idea what people actually do with it.
We have added a Maxima-based regular expression test as a response to issue #549. Discussion has moved to that issue.
At moodle-qtype_stack/doc/en/Authoring/Answer_tests.md it says with respect to preg_match(): "NOTE: we plan to remove this test in STACK version 4.3. Do not use this test."
I have some questions that do rely on this feature and am uncertain what I should do about them. E.g. students should express some numbers in hexadecimal notation. The numbers and answers are selected as strings at random from predefined lists. The student response is then compared as a string to the solution. AFAIK, Maxima does not handle the conversion between number systems -- at least not easily.
This all works fine except that now some students prefix the hexadecimals with "0x" or append "_x", which is not demanded but not wrong either. Apart from defining additional solution lists, I could easily define some regular expressions for the verification process.
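As an illustration of the kind of pattern meant here, a minimal sketch follows. It uses Python's `re` module as a stand-in rather than preg_match or any STACK answer test, and the helper names `hex_answer_pattern` and `matches_solution` are hypothetical. The pattern only uses constructs (optional non-capturing groups and literals) that PCRE also supports.

```python
import re

def hex_answer_pattern(solution):
    """Build a pattern accepting the solution with an optional '0x' prefix and '_x' suffix."""
    return re.compile(r"(?:0x)?" + re.escape(solution) + r"(?:_x)?", re.IGNORECASE)

def matches_solution(response, solution):
    """Return True if the whole response matches the solution, affixes allowed."""
    return hex_answer_pattern(solution).fullmatch(response.strip()) is not None
```

With the solution string "1F", this accepts "1F", "0x1F" and "1f_x" and rejects "0x1E". It also accepts "0x1F_x" with both affixes at once; whether that should count as correct is a choice for the question author.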
This certainly is a very small field of application, nevertheless I can imagine similar cases (#293 mentions one). So is it possible to have some kind of regular expression matching available in the future?