quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

Different math behavior in HTML and DOCX #10305

Open maxdrohde opened 1 month ago

maxdrohde commented 1 month ago

Bug description

It seems like DOCX files need math to be surrounded by `$$ ... $$`, while HTML documents do not. It is counterintuitive to me that the way Quarto expects math to be written depends on the output format.

Steps to reproduce

---
title: "Math Quarto test"
format:
  docx:
    toc: false
  html:
    toc: false

---

\begin{align*}
  \text{Equation 1}
\end{align*}

$$
\begin{align*}
  \text{Equation 2}
\end{align*}
$$

Expected behavior

I expect that both equations will display in the HTML and DOCX outputs.

Actual behavior

Only the HTML displays both equations. The DOCX only displays the equation that is surrounded by $$ ... $$.

HTML

Screenshot 2024-07-16 at 16 54 25

DOCX

Screenshot 2024-07-16 at 16 54 39

Your environment

Quarto check output

Quarto 1.5.54
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.2.0: OK
      Dart Sass version 1.70.0: OK
      Deno version 1.41.0: OK
      Typst version 0.11.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.5.54
      Path: /Applications/quarto/bin

[✓] Checking tools....................OK
      TinyTeX: (not installed)
      Chromium: (not installed)

[✓] Checking LaTeX....................OK
      Using: Installation From Path
      Path: /Library/TeX/texbin
      Version: 2023

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.11.3
      Path: /Library/Frameworks/Python.framework/Versions/3.11/bin/python3
      Jupyter: 5.3.0
      Kernels: ir, julia-1.6, julia-1.8, julia-1.9, juliapro_v1.5.0-1-1.5, julia-1.5, julia-12-threads-1.8, python3

[✓] Checking Jupyter engine render....OK

[✓] Checking R installation...........OK
      Version: 4.4.0
      Path: /Library/Frameworks/R.framework/Resources
      LibPaths:
        - /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library
      knitr: 1.47
      rmarkdown: 2.27

[✓] Checking Knitr engine render......OK

mcanouil commented 1 month ago

This is not a bug. It's Pandoc guessing it's math and doing the right thing for HTML.

Equations should be surrounded by dollars. That is the recommended syntax in both Pandoc and Quarto. I am pretty sure there is nowhere in the Quarto documentation where equations are shown without dollars.

mcanouil commented 1 month ago

If I may, why are you using implicit maths?

maxdrohde commented 1 month ago

Thank you, that is good to know. I thought that Pandoc / Quarto automatically recognized LaTeX math environments in all formats but I will use the $$ syntax going forward.

cscheid commented 1 month ago

I wish we could offer an error message on the first usage.

As @mcanouil said, Quarto doesn't support that syntax. I'm not sure exactly how that's working in HTML either, but that is the bug.

rgaiacs commented 1 month ago

A few more details. Given the minimal working example provided in the issue description,

$ cat mwe.md 
\begin{align*}
  \text{Equation 1}
\end{align*}

$$
\begin{align*}
  \text{Equation 2}
\end{align*}
$$

Pandoc behaves differently depending on the settings.

Plain HTML

`pandoc --from markdown --to html mwe.md` produces a warning

[WARNING] Could not convert TeX math 
  \begin{align*}
    \text{Equation 2}
  \end{align*}
  , rendering as TeX

and the HTML has an empty paragraph where equation 1 should be

<p></p>
<p><span class="math display">$$
\begin{align*}
  \text{Equation 2}
\end{align*}
$$</span></p>

HTML With MathJax

`pandoc --from markdown --to html --mathjax mwe.md` produces

<p><span class="math display">\[\begin{align*}
  \text{Equation 1}
\end{align*}\]</span></p>
<p><span class="math display">\[
\begin{align*}
  \text{Equation 2}
\end{align*}
\]</span></p>

This is because, as stated in the MathJax documentation,

Note that, as opposed to true LaTeX, MathJax processes all environments when wrapped inside math delimiters, even those like \begin{equation}...\end{equation} that are supposed to be used to initiate math mode. By default, MathJax will also render all environments outside of delimiters, e.g., \begin{matrix}...\end{matrix} would be processed even if it is not in math mode delimiters, though you are encouraged to use proper delimiters for these cases to make your files more compatible with actual LaTeX. This functionality can be controlled via the processEnvironments option in the tex configuration options.

The --mathjax option does not influence DOCX.

cderv commented 1 month ago

I'm not sure exactly how that's working in HTML either,

@cscheid I was going to explain but @rgaiacs was quicker. This is indeed about MathJax, which is a third party in the equation here. MathJax supports the syntax, and so Pandoc supports it too:

> quarto pandoc --from markdown --to html --mathjax index.qmd
<p><span class="math display">\[\begin{align*}
  \text{Equation 1}
\end{align*}\]</span></p>
<p><span class="math display">\[
\begin{align*}
  \text{Equation 2}
\end{align*}
\]</span></p>

Pandoc does special processing of LaTeX RawBlock elements when math rendering is active, and it supports LaTeX environments there: https://github.com/jgm/pandoc/blob/8613aa39e8b5b8282f30b9fd48b04ec2595ec8eb/src/Text/Pandoc/Writers/HTML.hs#L920-L932

So I don't know if we can (or even should) do something about this. Our documentation clearly states that $ and $$ are the valid math syntax: https://quarto.org/docs/authoring/markdown-basics.html#equations

MathJax has other exceptions, such as support for specific LaTeX commands that won't work in other contexts. For example, using \color:

---
title: "Math Quarto test"
format:
  docx:
    toc: false
  html:
    toc: false
---

$$
{\color{red}{x}} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
$$

| HTML | DOCX |
| --- | --- |
| ![image](https://github.com/user-attachments/assets/0341908c-e343-47dd-a3c6-f34b083c76a6) | ![image](https://github.com/user-attachments/assets/b63ef395-b6c1-48de-91ca-27f254701a06) |

Pandoc emits a warning when rendering the DOCX:

[WARNING] Could not convert TeX math 
  {\color{red}{x}} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
  , rendering as TeX:

         ^
  unexpected control sequence \color
  expecting "%", "\\label", "\\tag", "\\nonumber", whitespace or "\\allowbreak"
Output created: index.docx

This is an example of where conditional content based on the output format needs to be used.
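
A minimal sketch of what that could look like, assuming Quarto's `.content-visible` divs with the `when-format` and `unless-format` attributes. The uncolored fallback equation is only an illustration and is not taken from the original report:

```markdown
::: {.content-visible when-format="html"}
$$
{\color{red}{x}} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
$$
:::

::: {.content-visible unless-format="html"}
$$
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
$$
:::
```

With something like this, the HTML output would keep the MathJax-specific `\color` command, while DOCX and other formats would get an equation that Pandoc can convert without the warning above.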

So there will be differences in math across formats, especially when third-party tools specific to a format (like MathJax or KaTeX) are used.

I don't think this issue is a bug, and I would not know what we could do (except explain how math is rendered for each format, and that each format may have some specificities 🤷).

mcanouil commented 1 month ago

IMO, as we state to use dollars for equations[^1] (which is the way to go for cross-format "code", and is why only this is documented), if a user decides to use math without the dollars, it means they know that it's something specific to MathJax and are basically opting in to that behaviour. I don't think we should document the usage without dollars (even as a warning/note), even if MathJax is the default for HTML. I believe it would lead to more confusion for users who did not even know that omitting the dollars was possible in MathJax.

[^1]: Note that the Pandoc documentation also does not show math usage without dollars.

cscheid commented 1 month ago

Ok. I think we're all in agreement, but I stand by this earlier point I made:

I wish we could offer an error message on the first usage.

I understand that it's Markdown/Pandoc policy to never offer syntax errors, but I think it is a serious design mistake. Syntax errors are great. They're a way for software to communicate to users that the software's expectation isn't being met. We can provide helpful error messages. Otherwise, we end up with the current (non-)bug report and understandable confusion.

MathJax supports the syntax, and so Pandoc supports it too

That's the design mistake. (I originally called it a bug, but I should have said "problem")

We're not going to be able to fix this anytime soon, but this is something we could do when we have linting.

cderv commented 1 month ago

Thanks for clarifying! I now understand better, considering the linting context. Quarto can indeed take over and do better on this.