The function would be used to avoid proposing the (visually) same answer twice.
The heuristic could be the following: every answer is converted to a canonical form (which may not be valid LaTeX code), and the canonical forms are compared.
This function will of course give false negatives, but it will still be better than the current basic comparison of raw LaTeX code.
The canonical form would be generated by applying the following rules:
seems safe: \, \; \. \quad \qquad -> space
seems safe: several spaces or tabs -> one space: s = re.sub(r"\s+", " ", s)
not so safe: recursively, {} -> empty string. This one may lead to false positives though: \command{}text -> \commandtext.
Should we accept the risk, or take more precautions (test whether a command, r"\\\w+", comes right before the {})?
Note that false positives or negatives are not very damaging as long as the error message is very explicit and provides tips to fix the problem.
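
To make the discussion concrete, here is a minimal sketch of the rules above. The names `canonical_form`, `looks_identical` and the `protect_commands` flag are hypothetical, and stripping leading/trailing spaces is an extra assumption not listed in the rules. Since Python's `re` only supports fixed-width lookbehind, the "is there a command before {}" precaution is done with an optional group and a callback instead.

```python
import re

# Spacing commands replaced by a plain space (list taken from the rules above).
# The lookahead keeps e.g. \quadfoo from being treated as \quad.
_SPACING = re.compile(r"\\(?:qquad|quad)(?![A-Za-z])|\\[,;.]")

# An empty group, optionally preceded by a command such as \command.
_EMPTY_GROUP = re.compile(r"(\\\w+)?\{\}")


def canonical_form(latex: str, protect_commands: bool = True) -> str:
    """Return a comparison key for `latex`; the result may not be valid LaTeX."""
    s = _SPACING.sub(" ", latex)

    # Remove {} repeatedly so that nested empty groups like {{}} disappear too.
    previous = None
    while previous != s:
        previous = s
        if protect_commands:
            # Keep "\command{}" untouched, drop any other "{}".
            s = _EMPTY_GROUP.sub(lambda m: m.group(0) if m.group(1) else "", s)
        else:
            s = s.replace("{}", "")

    # Several spaces or tabs -> one space.
    s = re.sub(r"\s+", " ", s)
    # Assumption: surrounding whitespace is not visually significant.
    return s.strip()


def looks_identical(a: str, b: str) -> bool:
    """Heuristic: True if both answers share the same canonical form."""
    return canonical_form(a) == canonical_form(b)
```

With `protect_commands=True`, `\command{}text` is left alone, which avoids the `\commandtext` false positive at the cost of a few more false negatives; passing `protect_commands=False` gives the riskier behaviour described in the rule.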