openwebwork / webwork2

Course management front end for WeBWorK
http://webwork.maa.org/wiki/Main_Page

ASCIIMath MathJax delimiter #1980

Closed Alex-Jordan closed 8 months ago

Alex-Jordan commented 1 year ago

I noticed some faculty here have written problems where they use bare backticks to enter math in a PGML block. This is only working because their math is simple enough to work as ASCIIMath, and MathJax intercepts the backticks and is configured to process ASCIIMath input.

Meanwhile the instructors are unaware that this breaks their PDF production.

Should we do something about this to make it clearer to problem authors that they are doing something wrong? PGML could see the backticks and escape them, maybe. Or we could change the MathJax config, as long as the construct [: math :] still works.

Alex-Jordan commented 1 year ago

I added:

asciimath: {delimiters: []},

to the MathJax config, ran npm ci, and now:

BEGIN_PGML
[`a/b`]

[:a/b:]

`a/b`
END_PGML

comes out as expected, with the last option not being rendered math.
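For context, here is a minimal sketch of where that line would sit, assuming a MathJax 3 style configuration object. The variable name `asciimathOffConfig` is hypothetical, and the real WeBWorK configuration has other keys that are not shown here.

```javascript
// Sketch only: the asciimath key is the line quoted above; the variable
// name is hypothetical and the real config has more keys than this.
const asciimathOffConfig = {
  // With an empty delimiter list, MathJax is given no page-text markers
  // (such as a pair of backticks) to scan for as ASCIIMath.
  asciimath: { delimiters: [] },
};

console.log(JSON.stringify(asciimathOffConfig));
```

In a page this object would typically be merged into `window.MathJax` before MathJax loads.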

drgrice1 commented 1 year ago

There is a problem with this in that ASCIIMath has also been used for essay type questions (and a few other places) for allowing entry of formulas that are typeset.

somiaj commented 1 year ago

I have run across a similar issue with using $variable = '\( math \)', then using [$variable] in PGML. Since MathJax picks up the \(...\), the math formats just fine in HTML output, but the hardcopy fails to render the math because all of the backslashes get escaped. In this case the fix is to use [$variable]* to not escape the backslashes.

Is there a way PGML could escape the delimiters for both ASCIIMath and variable output so MathJax won't pick them up?

Alex-Jordan commented 1 year ago

Ah OK. I was wondering if there was a consideration like that.

> Is there a way PGML could escape the delimiters for both ASCIIMath and variable output so MathJax won't pick them up?

I've been thinking about this too and each idea I had so far has a problem. If you have ideas though, let's talk them through.

How widespread is the understanding you can use backticks in essay type questions for ASCII Math? What if the MathJax ASCIIMath delimiters were changed to [: ... :] to match PGML? Would people rebel at having to do it that way?

drgrice1 commented 1 year ago

I don't really use essay answers, and I don't know how widespread the understanding of backtick usage is.

somiaj commented 1 year ago

I just tested that the class tex2jax_ignore stops MathJax, so in a problem use $variable = '<span class="tex2jax_ignore">\(\frac{2x}{x^2 + 1}\)</span>', and then add [$variable]* to PGML and the math won't render. It seems this could be used to escape the delimiters.

somiaj commented 1 year ago

It also appears you can configure which class name(s) MathJax should ignore (or process).
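As a rough illustration of that configuration, assuming MathJax 3's `options.ignoreHtmlClass` and `options.processHtmlClass` settings (both are treated as regular expression patterns over class names): the variable name `ignoreClassConfig` and the extra class `pgml_literal` are hypothetical and only show how a second class could be added.

```javascript
// Sketch only: ignoreClassConfig and pgml_literal are hypothetical names.
const ignoreClassConfig = {
  options: {
    // Elements whose class matches this pattern are skipped by MathJax.
    ignoreHtmlClass: 'tex2jax_ignore|pgml_literal',
    // Elements whose class matches this pattern are processed again,
    // even when nested inside an ignored element.
    processHtmlClass: 'tex2jax_process',
  },
};

console.log(ignoreClassConfig.options.ignoreHtmlClass);
```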

somiaj commented 1 year ago

@Alex-Jordan Here is a patch that escapes \(, \[, \], \), and ` (the patch applies to PG, which is probably where this conversation should move).

--- a/macros/core/PGML.pl
+++ b/macros/core/PGML.pl
@@ -1366,6 +1366,7 @@ sub Escape {
        $string =~ s/</&lt;/g;
        $string =~ s/>/&gt;/g;
        $string =~ s/"/&quot;/g;
+       $string =~ s/(\\(|\\[|\\]|\\)|`)/<span class="tex2jax_ignore">$1<\/span>/g;
        return $string;
 }

This escapes those delimiters so MathJax won't pick them up. From my limited tests, this makes the HTML and hardcopy output the same, since MathJax isn't getting in the way. It also correctly leaves those delimiters unescaped in variables when a * is appended, such as [$var]*. One minor issue would be if someone did $var = '`x^2 + 2x + 5`' followed by [$var]*, which then gets us back to the original issue.

Alex-Jordan commented 1 year ago

Davide responded to my MathJax forum question about this: https://groups.google.com/g/mathjax-users/c/zmmEun8rG4I/m/CDpA6SvMDAAJ

He does give a configuration that could work, but it would have to be integrated with the current configuration and tested. Alternatively, he implies there'd be nothing wrong with printing backticks like <span class="tex2jax_ignore">`</span>. Actually I wonder if <span>`</span> is enough, since that probably (?) stops MathJax from pairing them up.

I'll see about making PGML wrap individual backticks in span.

somiaj commented 1 year ago

@Alex-Jordan Just modify my patch above, though I think I tried with just a span and it didn't work, and I had to add the tex2jax_ignore class to make it work. I have modified the patch a little to escape a few more things, including backticks and dollar signs (though I don't think those delimiters are configured).

diff --git a/macros/core/PGML.pl b/macros/core/PGML.pl
index 5165a8b1..d5b1bc03 100644
--- a/macros/core/PGML.pl
+++ b/macros/core/PGML.pl
@@ -1366,6 +1366,7 @@ sub Escape {
        $string =~ s/</&lt;/g;
        $string =~ s/>/&gt;/g;
        $string =~ s/"/&quot;/g;
+       $string =~ s/(\\(|\\[|\\]|\\)|`|\$+)/<span class="tex2jax_ignore">$1<\/span>/g;
        return $string;
 }

Alex-Jordan commented 1 year ago

I'll test, but I think that may break the existing PGML ``` delimiter for inline code. I am planning to parse out lone ` and turn them into <span>`</span>.

somiaj commented 1 year ago

My patch won't mess with any PGML delimiter; this is part of the escape function and only applies after any tokens are parsed by PGML. In essence, PGML is already escaping characters in different outputs; this just makes it so HTML output now escapes MathJax delimiters so they don't get picked up by MathJax (which seems to me the proper way to do this).

somiaj commented 1 year ago

Here is a test problem I've used to test various situations using that patch:

escape-mathjax-test.txt

Alex-Jordan commented 1 year ago

Oh I see.

If some actual math or code has a backtick in it, would escaping that in the output be a problem?

Alex-Jordan commented 1 year ago

I'm clearly hesitant to wrap a span around something and call that "escaping" it. The other things there are just doing HTML encoding, so I'm wary.

somiaj commented 1 year ago

Oops, I discovered a small issue with my patch: it was turning \( into <span>\</span>( instead of <span>\(</span>. Here is an updated one, and you were correct: the class was not needed, since the spans separate things enough.

diff --git a/macros/core/PGML.pl b/macros/core/PGML.pl
index 5165a8b1..85ad3324 100644
--- a/macros/core/PGML.pl
+++ b/macros/core/PGML.pl
@@ -1366,6 +1366,7 @@ sub Escape {
        $string =~ s/</&lt;/g;
        $string =~ s/>/&gt;/g;
        $string =~ s/"/&quot;/g;
+       $string =~ s/(\\\(|\\\[|\\\]|\\\)|`|\$+)/<span>$1<\/span>/g;
        return $string;
 }
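To see why the earlier pattern split \( apart, here is a small sketch that transliterates both substitutions into JavaScript. It assumes the two regex dialects treat these escapes and groups the same way; the function and variable names are hypothetical, and this is not the PG code itself.

```javascript
// JavaScript transliterations of the two Perl substitutions from the patches.
// Assumption: Perl and JavaScript agree on these escapes and groupings.

// Earlier pattern: "\\(" is a literal backslash followed by an OPEN GROUP,
// whose first (empty) alternative lets a lone backslash match on its own.
const earlierPattern = /(\\(|\\[|\\]|\\)|`|\$+)/g;

// Updated pattern: "\\\(" is a literal backslash followed by a literal "(".
const updatedPattern = /(\\\(|\\\[|\\\]|\\\)|`|\$+)/g;

// Wrap every match of the pattern in a bare span, as the patch does.
function wrapInSpan(text, pattern) {
  return text.replace(pattern, '<span>$1</span>');
}

const sample = String.raw`\(x\)`;
const earlier = wrapInSpan(sample, earlierPattern);
const updated = wrapInSpan(sample, updatedPattern);

console.log(earlier); // only the backslashes end up inside the spans
console.log(updated); // each two-character delimiter is wrapped whole
```

With the earlier pattern the parenthesis is left outside the span, which matches the behavior described above; the updated pattern keeps each delimiter together.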

drgrice1 commented 1 year ago

> I'm clearly hesitant to wrap a span around something and call that "escaping" it. The other things there are just doing HTML encoding, so I'm wary.

I have been watching this conversation, and I also am wary of this approach for the same reason.

Alex-Jordan commented 1 year ago

I'm not coding right now, but what does your setup do with these?


BEGIN_PGML

Here is ``` code` ```.

Here is [|verbatim`|]*.

Here is [` some math` `].

END_PGML

somiaj commented 1 year ago

I agree that it is a strange way to escape a character to stop MathJax from picking up on it. I've tried other things, like using &bsol; for a backslash, but that didn't stop MathJax from rendering it. I think I tried something similar for the backtick, but I might have stopped when the backslash failed.

somiaj commented 1 year ago

> I'm not coding right now, but what does your setup do with these?

This patch shouldn't mess with any of PGML's normal behavior. I just tested that block of code and things seem to be fine; there are a few extra span tags around some of the tick marks, but they don't affect the output. The main difference is that my patch makes your ``` code` ``` line actually output what you see, while without this patch, MathJax kicks in due to the backtick delimiter.

I don't think this should get in the way of any current rendering, because it is only the text that PGML outputs that gets escaped. Since PGML won't escape the remaining text until after it finds its delimiters and blocks, any PGML code with backticks will be dealt with normally.

Alex-Jordan commented 1 year ago

There's at least one thing that it breaks, which is use of $BM and $EM.

$x = "${BM}x${EM}";

BEGIN_PGML
[$BM] math [$EM]  
[$x]
END_PGML

since the substitution to \\(...\\) has already been made when Escape is applied. Of course it would be weird to literally write something like these examples, but I use $BM and $EM occasionally to do special things. And I'm not yet persuaded that the Escape method won't mess something up that we haven't thought about.

I'd like to separate the backtick issue from the \( issue. PG is not intentionally producing backticks for any purpose, whereas it does intentionally produce \( as the math delimiter for HTML. I have a backtick-parsing PR ready to go that singles out backticks directly from the PGML input text, wraps only those backticks in a span, and leaves alone the backticks from other places.
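A rough sketch of that separation, in JavaScript for illustration only. This is not the code from the PR mentioned above; `wrapBackticks` and the sample strings are hypothetical, and it assumes a bare span is enough to keep MathJax from pairing backticks.

```javascript
// Hypothetical helper: wrap each backtick that came from author-written
// text in a bare span, and touch nothing else.
function wrapBackticks(authorText) {
  return authorText.replace(/`/g, '<span>`</span>');
}

// Text typed by the problem author, with bare backticks around "a/b".
const authorText = 'Enter `a/b` here';
// Output generated elsewhere, using \( ... \) as the HTML math delimiters.
const generatedMath = String.raw`\(x\)`;

// Only the author text goes through the helper; generated math is left alone.
const html = wrapBackticks(authorText) + ' ' + generatedMath;
console.log(html);
```

The design point is that the \( and \) delimiters never pass through the helper, so intentionally generated math still renders.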

somiaj commented 1 year ago

Your example is exactly why I want to escape \( and don't think it should be separated, because you are hiding an issue with the hardcopy. Here is what your example currently produces in HTML (note I put something in for math):

image

But if you create the hardcopy this is what you get:

image

The fact that it renders in HTML but not in hardcopy means I cannot tell an issue is even there until I create the hardcopy. On the other hand, if you escaped it, you would see the issue in HTML and know the correct fix is to not escape the variables. Your code really needs to be the following to work correctly in both HTML and hardcopy:

$x = "${BM}x${EM}";

BEGIN_PGML
[$BM]* math [$EM]*  
[$x]*
END_PGML

(Well, [$BM]* math [$EM]* actually does break hardcopy, because math gets escaped (unless the math is simple and doesn't contain special characters), but [$x]* works correctly.)

Alex-Jordan commented 1 year ago

Fair enough.

But you and I are addressing two different issues, as I see it. I would like to stop beginning authors who think using backticks is a legitimate way to enter math in PGML. Which to be fair, is understandable.

What you are addressing is people hacking around the PGML markup by storing things in variables. Someone using those techniques surely suspects they are doing something sneaky/hackish, or is that not the case?

somiaj commented 1 year ago

I don't think it is hackish to use \( ... \) in strings for math. A common technique I use for multiple choice, true/false, and matching problems is to have an array of statements or answers that are a mix of math and non-math. I then take a random subset of those statements and turn them into a question. Also, putting math in strings is needed to add formulas to the radio button or checkbox parsers if you want math in the choices. Though I do forget about PTX output; I should be using $BM and $EM, which end up just being \( ... \) in the outputs I use, as you showed above, so thanks for reminding me of that.

I do agree that the issues are separate, but the underlying cause is similar: how to keep MathJax from rendering code in certain situations. Any fix should probably find a way to deal with both cases in a reasonable way.

drdrew42 commented 1 year ago

Just a quick note from testing out @somiaj's Escape() modifications.

Both of these work:

$x = "${BM}\frac{x^2}{8}${EM}";

BEGIN_PGML
[$BM]* \\frac{x^2}{8} [$EM]*  
[$x]*
END_PGML

One must apply their own escapes when using the separated & starred BM/EM approach...

Edit: I'm not sure how this plays with TeX output

somiaj commented 1 year ago

@drdrew42 Correct on your edit: hardcopy TeX output fails for the first example because the TeX output escapes the raw string \\frac{x^2}{8}, so math mode chokes on the escaped characters.

pstaabp commented 8 months ago

Fixed in openwebwork/pg#878