mathjax / MathJax


Order of Super/Subscription causes parsing failure #3296

Closed sorenchiron closed 1 month ago

sorenchiron commented 1 month ago

Issue Summary

Placing a superscript before a subscript causes a parsing failure. Environment: Chrome, Windows 11, Markdown, mathjax@3/es5.

Experiments show that a fully working LaTeX math expression can start failing after a single switch of the superscript/subscript order.

Steps to Reproduce:

Create the following Markdown with math expressions:

$$
\begin{align}
x_t &= \sqrt{\alpha_t} x_{t-1} + \sqrt{1-\alpha_t} \epsilon_{t-1}^{\ast} \\\\
&= \sqrt{\alpha_t} \left( \sqrt{a_{t-1}} x_{t-2}  + \sqrt{1-\alpha_{t-1}} \epsilon^{\ast}_{t-2} \right) + \sqrt{1 - \alpha_t}  \epsilon^{\ast}_{t-1}
\end{align} 
$$

<script>
window.MathJax = {
  tex: {
    inlineMath: [['$','$'], ['\\(','\\)']],
    displayMath: [             // start/end delimiter pairs for display math
      ['$$', '$$'],
      ['\\[', '\\]']
    ],
    processEscapes: true,
    processEnvironments: true, // process \begin{xxx}...\end{xxx} outside math mode
    processRefs: true,         // process \ref{...} outside of math mode
    digits: /^(?:[0-9]+(?:\{,\}[0-9]{3})*(?:\.[0-9]*)?|\.[0-9]+)/,
    tags: 'all',              // or 'none' or 'ams'
    maxMacros: 10000,          // maximum number of macro substitutions per expression
    formatError:               // function called when TeX syntax errors occur
        (jax, err) => jax.formatError(err)
  },

  loader: {
    load: ['input/asciimath', ]
  }
};
</script>
<script src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js" id="MathJax-script"></script>

The failing HTML output is plain text:

$$ \begin{align} x_t &= \sqrt{\alpha_t} x_{t-1} + \sqrt{1-\alpha_t} \epsilon_{t-1}^{\ast} \ &= \sqrt{\alpha_t} \left( \sqrt{a_{t-1}} x_{t-2} + \sqrt{1-\alpha_{t-1}} \epsilon^{\ast}{t-2} \right) + \sqrt{1 - \alpha_t} \epsilon^{\ast} \end{align} $$

If the order of the subscript and superscript is switched, everything works:

Changing \epsilon^{\ast}_{t-2} to \epsilon_{t-2}^{\ast} makes the output correct.

Expected behaviour: either order of superscript and subscript is rendered correctly.

Technical details:

I am using the following MathJax configuration:

window.MathJax = {
  tex: {
    inlineMath: [['$','$'], ['\\(','\\)']],
    displayMath: [             // start/end delimiter pairs for display math
      ['$$', '$$'],
      ['\\[', '\\]']
    ],
    processEscapes: true,
    processEnvironments: true, // process \begin{xxx}...\end{xxx} outside math mode
    processRefs: true,         // process \ref{...} outside of math mode
    digits: /^(?:[0-9]+(?:\{,\}[0-9]{3})*(?:\.[0-9]*)?|\.[0-9]+)/,
    tags: 'all',              // or 'none' or 'ams'
    maxMacros: 10000,          // maximum number of macro substitutions per expression
    formatError:               // function called when TeX syntax errors occur
        (jax, err) => jax.formatError(err)
  },

  loader: {
    load: ['input/asciimath', ]
  }
};

and loading MathJax via

<script src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js" id="MathJax-script"></script>


dpvc commented 1 month ago

You don't say what Markdown engine you are using, but Markdown and LaTeX syntax don't always play nice together (see the section of the MathJax documentation on interactions with content-management systems for more details).

In particular, Markdown uses underscores (_) to indicate italics in some circumstances, and if there are two underscores in a math expression, that can cause Markdown to remove the underscores and insert <i> or <em> tags. Those tags will prevent MathJax from processing the expression, as math is not allowed to have internal HTML (with few exceptions). Since Markdown runs before MathJax, there is very little MathJax can do about that. Some Markdown engines know about LaTeX notation and are set up to avoid that problem, and others use special delimiters like $` ... `$ to resolve the problem, but most don't, and so you run into this problem.

Note that an underscore indicates the start of italics only if it is preceded by a word-break character. In your case, with \epsilon^{\ast}_{t-2}, the } before the _ is a word-break character, while in \epsilon_{t-2}^{\ast}, the n before the _ is not. That is why one case works for you and one doesn't.

If you look closely at the "failure" output that you provide, you will see

... \sqrt{1-\alpha_{t-1}} \epsilon^{\ast}{t-2} \right) ...

where the underscore is missing between {\ast} and {t-2}, indicating that Markdown has removed it. I suspect that the {t-2} and the material to the right of that is in italics in your HTML page.
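To make the word-break rule above concrete, here is a minimal JavaScript sketch. It assumes a deliberately simplified rule (an underscore is "risky" when the character before it is not a letter or digit); real Markdown engines apply more detailed rules, and `findRiskyUnderscores` is a hypothetical helper, not part of MathJax or any Markdown library.

```javascript
// Simplified illustration only: real Markdown engines use more detailed emphasis rules.
// findRiskyUnderscores is a hypothetical helper, not a MathJax or Markdown API.
function findRiskyUnderscores(tex) {
  const risky = [];
  for (let i = 0; i < tex.length; i++) {
    if (tex[i] !== '_') continue;
    // Treat the start of the string like a word break.
    const prev = i > 0 ? tex[i - 1] : ' ';
    // An underscore directly after a letter or digit is considered intraword and left alone.
    if (!/[A-Za-z0-9]/.test(prev)) {
      risky.push({ index: i, before: prev });
    }
  }
  return risky;
}

const failing = '\\epsilon^{\\ast}_{t-2}';
const working = '\\epsilon_{t-2}^{\\ast}';

console.log(findRiskyUnderscores(failing)); // one entry, preceded by '}'
console.log(findRiskyUnderscores(working)); // empty array
```

Two or more risky underscores in the same paragraph are what may get paired up as italics markers, which matches the failing expression in this issue.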

One possible solution is to use \_ rather than _ in those cases where Markdown is inserting italics markers. Another would be to use `$...$` to have Markdown treat the math as code, and not process it further, and then configure MathJax to not skip <code> tags, as it usually does. That might have adverse effects if you are using code blocks for other reasons. Such a configuration would include

MathJax = {
  options: {
    skipHtmlTags: {'[-]': ['code']}
  }
}
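As a sketch of how that option might sit alongside the delimiter settings from the original report, the fragment below combines the two. It assumes the Markdown engine wraps backtick spans in <code> tags and that the configuration is defined before the MathJax script loads; the name `mathJaxConfig` is only for illustration.

```javascript
// Sketch only: merges the skipHtmlTags option above with the delimiters from the report.
// mathJaxConfig is an illustrative name; MathJax reads the global MathJax object.
const mathJaxConfig = {
  tex: {
    inlineMath: [['$', '$'], ['\\(', '\\)']],
    displayMath: [['$$', '$$'], ['\\[', '\\]']],
    processEscapes: true
  },
  options: {
    // '[-]' removes 'code' from the default list of skipped tags,
    // so math inside <code>...</code> is processed.
    skipHtmlTags: {'[-]': ['code']}
  }
};

// In a browser, globalThis is the window object, so this sets window.MathJax.
globalThis.MathJax = mathJaxConfig;
```

With this in place, every <code> element on the page is searched for math delimiters, so ordinary code samples containing `$` may be affected as well.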

So the upshot is that this is not a MathJax bug, it is an interaction between Markdown and MathJax.
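For the first workaround (writing \_ instead of _), a small preprocessing step could escape only the underscores that are likely to be misread. This is a rough sketch under the same simplified assumption as the explanation above, namely that an underscore is at risk when it does not directly follow a letter or digit; `escapeRiskyUnderscores` is a hypothetical helper and would need checking against the Markdown engine actually in use.

```javascript
// Hypothetical helper, not part of MathJax or any Markdown engine.
// Escapes underscores that do not directly follow a letter, a digit, or a backslash
// (a backslash means the underscore is already escaped).
function escapeRiskyUnderscores(tex) {
  return tex.replace(/(?<![A-Za-z0-9\\])_/g, '\\_');
}

console.log(escapeRiskyUnderscores('\\epsilon^{\\ast}_{t-2}')); // \epsilon^{\ast}\_{t-2}
console.log(escapeRiskyUnderscores('\\epsilon_{t-2}^{\\ast}')); // unchanged
```

Markdown then turns \_ back into a plain underscore in the HTML, so MathJax still receives the usual subscript syntax.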

sorenchiron commented 1 month ago

@dpvc Thank you very much for such a detailed explanation! I now understand completely. The sophisticated combinations of components in today's workflows can place unexpected burdens on developers and cause confusion for users.