jgm / texmath

A Haskell library for converting LaTeX math to MathML.
GNU General Public License v2.0

Be able to fully parse docutils' mathematics.txt #202

Open infinity0 opened 1 year ago

infinity0 commented 1 year ago

In a discussion of different LaTeX-to-MathML converters, I came across the following test file: https://docutils.sourceforge.io/docs/ref/rst/mathematics.txt

Pandoc can parse much of it, but reports the following errors:

$ pandoc --mathml -f rst mathematics.txt 2>&1 >/dev/null | grep "unexpected" | sort -u
  unexpected "\\"
  unexpected control sequence \arrowvert
  unexpected control sequence \Arrowvert
  unexpected control sequence \Bigl
  unexpected control sequence \bracevert
  unexpected control sequence \cfrac
  unexpected control sequence \circledS
  unexpected control sequence \diagdown
  unexpected control sequence \diagup
  unexpected control sequence \gggtr
  unexpected control sequence \idotsint
  unexpected control sequence \injlim
  unexpected control sequence \intop
  unexpected control sequence \llless
  unexpected control sequence \mspace
  unexpected control sequence \negmedspace
  unexpected control sequence \negthickspace
  unexpected control sequence \ngeqq
  unexpected control sequence \nleqq
  unexpected control sequence \nshortmid
  unexpected control sequence \nshortparallel
  unexpected control sequence \nsubseteqq
  unexpected control sequence \nsupseteqq
  unexpected control sequence \ointop
  unexpected control sequence \projlim
  unexpected control sequence \shortmid
  unexpected control sequence \shortparallel
  unexpected control sequence \smallint
  unexpected control sequence \surd
  unexpected control sequence \thickapprox
  unexpected control sequence \thicksim
  unexpected control sequence \underleftrightarrow
  unexpected control sequence \varinjlim
  unexpected control sequence \varliminf
  unexpected control sequence \varlimsup
  unexpected control sequence \varprojlim
  unexpected "x"

Many of these should just be a matter of updating unimathsymbols.txt; others are a bit more complex, such as the escaped space (\ ).
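Before editing the table, it may help to check which of the failing macros already have an entry. The following is a minimal sketch, not part of pandoc or texmath: the function name missing_from_table is hypothetical, and it assumes unimathsymbols.txt uses "^"-separated fields with the LaTeX macro in field 3, the unicode-math name in field 4, and comment lines starting with #.

```shell
# Hypothetical helper: print each macro (given as arguments) that has no entry
# in a unimathsymbols.txt-style table.
# ASSUMPTION: fields are "^"-separated; field 3 holds the LaTeX macro and
# field 4 the unicode-math name; lines starting with "#" are comments.
missing_from_table() {
  table=$1
  shift
  for macro in "$@"; do
    # Turn "^" into tabs so awk never treats "^" as a regex anchor, and pass
    # the macro through the environment because awk -v would mangle "\".
    if ! tr '^' '\t' < "$table" |
         M="$macro" awk -F'\t' '
           BEGIN { m = ENVIRON["M"] }
           !/^#/ && ($3 == m || $4 == m) { found = 1 }
           END { exit !found }'; then
      printf '%s\n' "$macro"
    fi
  done
}

# Usage (the table path is an assumption):
# missing_from_table unimathsymbols.txt '\surd' '\intop' '\smallint'
```

The table is rescanned once per macro, which is simple and fast enough for a list of a few dozen names.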

jgm commented 1 year ago

What does pandoc --version say? Are you using the latest version? I just tried it and got the following output, which differs from yours; for example, \mspace is handled fine.

[WARNING] Could not convert TeX math \underleftrightarrow{gbi}, rendering as TeX:
  underleftrightarrow{gbi}
                     ^
  unexpected control sequence \underleftrightarrow
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \arrowvert, rendering as TeX:
  \arrowvert
            ^
  unexpected control sequence \arrowvert
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \Arrowvert, rendering as TeX:
  \Arrowvert
            ^
  unexpected control sequence \Arrowvert
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \bracevert, rendering as TeX:
  \bracevert
            ^
  unexpected control sequence \bracevert
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \projlim, rendering as TeX:
  \projlim
          ^
  unexpected control sequence \projlim
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \injlim, rendering as TeX:
  \injlim
         ^
  unexpected control sequence \injlim
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \varlimsup, rendering as TeX:
  \varlimsup
            ^
  unexpected control sequence \varlimsup
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \varliminf, rendering as TeX:
  \varliminf
            ^
  unexpected control sequence \varliminf
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \varprojlim, rendering as TeX:
  \varprojlim
             ^
  unexpected control sequence \varprojlim
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \varinjlim, rendering as TeX:
  \varinjlim
            ^
  unexpected control sequence \varinjlim
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \circledS, rendering as TeX:
  \circledS
           ^
  unexpected control sequence \circledS
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \surd, rendering as TeX:
  \surd
       ^
  unexpected control sequence \surd
  expecting "%", "\\label", "\\tag", "\\nonumber", whitespace, "[", "!", "'", "''", "'''", "''''", "*", "+", ",", "-", ".", "/", ":", ":=", ";", "<", "=", ">", "?", "@", "~", "\\" or "{"
[WARNING] Could not convert TeX math \diagdown, rendering as TeX:
  \diagdown
           ^
  unexpected control sequence \diagdown
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \diagup, rendering as TeX:
  \diagup
         ^
  unexpected control sequence \diagup
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \ngeqq, rendering as TeX:
  \ngeqq
        ^
  unexpected control sequence \ngeqq
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \nleqq, rendering as TeX:
  \nleqq
        ^
  unexpected control sequence \nleqq
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \thickapprox, rendering as TeX:
  \thickapprox
              ^
  unexpected control sequence \thickapprox
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \thicksim, rendering as TeX:
  \thicksim
           ^
  unexpected control sequence \thicksim
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \llless, rendering as TeX:
  \llless
         ^
  unexpected control sequence \llless
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \gggtr, rendering as TeX:
  \gggtr
        ^
  unexpected control sequence \gggtr
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \shortmid, rendering as TeX:
  \shortmid
           ^
  unexpected control sequence \shortmid
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \shortparallel, rendering as TeX:
  \shortparallel
                ^
  unexpected control sequence \shortparallel
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \nshortmid, rendering as TeX:
  \nshortmid
            ^
  unexpected control sequence \nshortmid
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \nshortparallel, rendering as TeX:
  \nshortparallel
                 ^
  unexpected control sequence \nshortparallel
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \nsubseteqq, rendering as TeX:
  \nsubseteqq
             ^
  unexpected control sequence \nsubseteqq
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \nsupseteqq, rendering as TeX:
  \nsupseteqq
             ^
  unexpected control sequence \nsupseteqq
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \smallint, rendering as TeX:
  \smallint
           ^
  unexpected control sequence \smallint
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math 3\negmedspace 4, rendering as TeX:
  3\negmedspace 4
                ^
  unexpected control sequence \negmedspace
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math 3\negthickspace 4, rendering as TeX:
  3\negthickspace 4
                  ^
  unexpected control sequence \negthickspace
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math 3\hspace{1ex}4, rendering as TeX:
  3\hspace{1ex}4
            ^
  unexpected "x"
  expecting "em"
[WARNING] Could not convert TeX math \frac{\pi}{4} = 1 + \cfrac{1^2}{
  2 + \cfrac{3^2}{
  2 + \cfrac{5^2}{
  2 + \cfrac{7^2}{2 + \cdots}
  }}}
  \qquad \text{vs.}\qquad
  \frac{\pi}{4} = 1 + \frac{1^2}{
  2 + \frac{3^2}{
  2 + \frac{5^2}{
  2 + \frac{7^2}{2 + \cdots}
  }}}, rendering as TeX:
  pi}{4} = 1 + \cfrac{1^2}{
                     ^
  unexpected control sequence \cfrac
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \cfrac[l]{x}{x-1} \quad
  \cfrac{x}{x-1}    \quad
  \cfrac[r]{x}{x-1}, rendering as TeX:
  \cfrac[l]{x}{x-1} \quad
        ^
  unexpected control sequence \cfrac
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \displaystyle
  \Bigl(b\Bigr)
  \Bigl(\frac{c}
  {d}\Bigr), rendering as TeX:
  \Bigl(b\Bigr)
       ^
  unexpected control sequence \Bigl
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \left[\sum_i a_i\left\lvert\sum_j x_{ij}\right\rvert^p\right]^{1/p}
  \text{ versus }
  \biggl[\sum_i a_i\Bigl\lvert\sum_j x_{ij}\Bigr\rvert^p\biggr]^{1/p}, rendering as TeX:
  ggl[\sum_i a_i\Bigl\lvert\sum_j x_{ij}\B
                     ^
  unexpected control sequence \Bigl
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \Bigl(\begin{smallmatrix} a & b \\ c & d \end{smallmatrix}\Bigr), rendering as TeX:
  \Bigl(\begin{smallmatrix} a & b \\ c & d
       ^
  unexpected control sequence \Bigl
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \intop_0^1, rendering as TeX:
  \intop_0^1
        ^
  unexpected control sequence \intop
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \ointop_c, rendering as TeX:
  \ointop_c
         ^
  unexpected control sequence \ointop
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \intop_0^1 \quad \ointop_c
  \quad \text{vs.} \quad
  \int^1_0   \quad \oint_c, rendering as TeX:
  \intop_0^1 \quad \ointop_c
        ^
  unexpected control sequence \intop
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \begin{aligned}
  \left( 3                          \right)
  \left( f(x)                       \right)
  \left( \bar x                     \right)
  \left( \overline x                \right)
  \left( n_i                        \right) &= () \\
  \left( \underline x               \right) &= \bigl(\text{big}\bigr)\\
  \left( 3^2                        \right)
  \left( \sqrt{3}                   \right)
  \left( \sqrt{3^2}                 \right)
  \left( \sum                       \right)
  \left( \bigotimes                 \right)
  \left( \prod                      \right) &= \Bigl(\text{Big}\Bigr)\\
  \left( \frac{3  }{2}              \right)
  \left( \frac{3^2}{2^4}            \right)
  \binom{3  }{2}
  \begin{pmatrix} a & b \\ c & d \end{pmatrix}
  \left( \frac{1}{\sqrt 2}          \right)
  \left( \int                       \right)
  \left( \int_0                     \right)
  \left( \int^1                     \right)
  \left( \int_0^1                   \right) &= \biggl(\text{bigg}\biggr)\\
  \left( \frac{\sqrt 2}{2}          \right)
  \left( \sum_0                     \right)
  \left( \sum^1                     \right)
  \left( \sum_0^1                   \right)
  \left( \frac{\frac1x}{\frac{1}{n}}\right) &= \Biggl(\text{Bigg}\Biggr)\\
  \left( \intop_0                   \right)
  \left( \intop^1                   \right)
  \left( \intop_0^1                 \right)
  \end{aligned}, rendering as TeX:
          \right) &= \Bigl(\text{Big}\Bigr
                     ^
  unexpected "\\"
  expecting "&", "\\\\", white space or "\\end"
[WARNING] Could not convert TeX math \Bigl(\text{Big}\Bigr), rendering as TeX:
  \Bigl(\text{Big}\Bigr)
       ^
  unexpected control sequence \Bigl
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \left.\lgroup b\right\rgroup\ \bigl\lgroup b\Bigr\rgroup\ \biggl\lgroup b\Biggr\rgroup
  \quad
  \left.\lmoustache b\right\rmoustache\ \bigl\lmoustache b\Bigr\rmoustache\ \biggl\lmoustache b\Biggr\rmoustache
  \quad
  \left./b\right\backslash\ \bigl/b\Bigr\backslash\ \biggl/b\Biggr\backslash, rendering as TeX:
  roup b\right\rgroup\ \bigl\lgroup b\Bigr
                     ^
  unexpected "\\"
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \left.|b\right\|\ \bigl|b\Bigr\|\ \biggl|b\Biggr\|
  \quad
  \left.\vert b\right\Vert\ \bigl\vert b\Bigr\Vert\ \biggl\vert b\Biggr\Vert
  \quad
  \left.\arrowvert b\right\Arrowvert\ \bigl\arrowvert b\Bigr\Arrowvert\ \biggl\arrowvert b\Biggr\Arrowvert
  \quad
  \left.\bracevert b\right\bracevert\ \bigl\bracevert b\Bigr\bracevert\ \biggl\bracevert b\Biggr\bracevert
  \quad
  \left.\vert b\right\Vert\ \bigl\vert b\Bigr\Vert\ \biggl\vert b\Biggr\Vert, rendering as TeX:
  \left.\arrowvert b\right\Arrowvert\ \big
                   ^
  unexpected control sequence \arrowvert
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \int\ \iint\ \iiint\ \iiiint\ \idotsint\ \oint\ \smallint\
  \sum\ \prod\ \coprod\ \bigwedge\ \bigvee\ \bigcap\ \bigcup\
  \biguplus\ \bigsqcup\ \bigodot\ \bigoplus\ \bigotimes, rendering as TeX:
   \iiiint\ \idotsint\ \oint\ \smallint\
                     ^
  unexpected control sequence \idotsint
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \int_1 f\ \intop_1 f\ \iint_1 f\ \smallint_1 f\ \sum_1\
  \prod_1\ \bigwedge_1\ \bigcap_1\ \biguplus_1\ \bigodot_1\ \int^N\
  \intop^N\ \iiiint^N\ \oint^N\ \smallint^N\ \sum^N\ \coprod^N\
  \bigvee^N\ \bigcup^N\ \bigsqcup^N\ \bigotimes^N, rendering as TeX:
  \int_1 f\ \intop_1 f\ \iint_1 f\ \smalli
                  ^
  unexpected control sequence \intop
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \int_1^N\ \intop_1^N\ \iint_1^N\ \iiint_1^N\ \iiiint_1^N\
  \idotsint_1^N\ \oint_1^N\ \smallint_1^N\ \sum_1^N\ \prod_1^N\
  \coprod_1^N\ \bigwedge_1^N\ \bigvee_1^N\ \bigcap_1^N\ \bigcup_1^N
  \ \biguplus_1^N\ \bigsqcup_1^N\ \bigodot_1^N\ \bigoplus_1^N\
  \bigotimes_1^N, rendering as TeX:
  \int_1^N\ \intop_1^N\ \iint_1^N\ \iiint_
                  ^
  unexpected control sequence \intop
  expecting "%", "\\label", "\\tag", "\\nonumber" or whitespace
[WARNING] Could not convert TeX math \text{\c{c} \'e \`e \"e \^e \~n \r{u} \v{z} \textcircled{c}}, rendering as TeX:
  'e \`e \"e \^e \~n \r{u} \v{z} \textcirc
                     ^
  unexpected "\\"
  expecting text, "}", "{", "$", "$$", "\\(" or "\\["

jgm commented 1 year ago

Here's my pared-down list:

\Arrowvert
\Bigl
\arrowvert
\bracevert
\cfrac
\circledS
\diagdown
\diagup
\gggtr
\idotsint
\injlim
\intop
\llless
\negmedspace
\negthickspace
\ngeqq
\nleqq
\nshortmid
\nshortparallel
\nsubseteqq
\nsupseteqq
\ointop
\projlim
\shortmid
\shortparallel
\smallint
\surd
\thickapprox
\thicksim
\underleftrightarrow
\varinjlim
\varliminf
\varlimsup
\varprojlim

Some of these are supported (e.g. \surd), so we need to look at the details. Others aren't in the symbol list at all.
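A list like this can also be regenerated from the warning output instead of by hand. This is a small sketch assuming POSIX sed and sort; unsupported_macros is a hypothetical name, not a pandoc feature. Sorting with LC_ALL=C orders uppercase names before lowercase ones, matching the order of the list above.

```shell
# Hypothetical filter: read pandoc/texmath warnings on stdin and print each
# distinct control sequence reported as "unexpected", one per line.
unsupported_macros() {
  # Keep only the macro name (a backslash followed by letters).
  sed -n 's/.*unexpected control sequence \(\\[A-Za-z][A-Za-z]*\).*/\1/p' |
    LC_ALL=C sort -u   # bytewise order: uppercase names sort first
}

# Usage (needs pandoc and a local copy of mathematics.txt):
# pandoc --mathml -f rst mathematics.txt 2>&1 >/dev/null | unsupported_macros
```

Because the parser stops at the first unknown macro in each formula, a later unsupported macro in the same formula (for example \Bigr after \Bigl) only shows up once the earlier one is fixed.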

infinity0 commented 1 year ago

I'm using Debian's pandoc package, which is a little behind this repo:

$ pandoc --version
pandoc 2.17.1.1
Compiled with pandoc-types 1.22.2, texmath 0.12.4, skylighting 0.12.3.1,
citeproc 0.6.0.1, ipynb 0.2
[..]

jgm commented 1 year ago

Clarification: texmath can handle \surd{3}{4} but not plain \surd.

infinity0 commented 1 year ago

Forwarding some extra information from Günter Milde, the docutils developer who also originally created unimathsymbols.txt:

The database and related work are available at https://milde.users.sourceforge.net/LUCR/Math/ (the latest revision is used in latex2mathml but has not been published yet).

The "unimathsymbols" database only contains LaTeX math macros that map directly to Unicode code points. (\underleftrightarrow is implemented using ↔ (\leftrightarrow) in a <munder> element.)
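To illustrate the distinction: \underleftrightarrow has no code point of its own, so a converter has to build markup around the ↔ character rather than look the macro up in the table. The hypothetical shell helper here emits that kind of MathML for an already-converted base; the mrow wrapper and the stretchy attribute are assumptions, not latex2mathml's actual output.

```shell
# Hypothetical helper: wrap an already-converted MathML base in <munder> with a
# stretchy U+2194 LEFT RIGHT ARROW (the \leftrightarrow glyph) underneath.
underleftrightarrow_mathml() {
  printf '<munder><mrow>%s</mrow><mo stretchy="true">&#x2194;</mo></munder>\n' "$1"
}

# Example: the \underleftrightarrow{gbi} case from the warnings earlier in the thread.
underleftrightarrow_mathml '<mi>g</mi><mi>b</mi><mi>i</mi>'
```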

jgm commented 1 year ago

We do in fact generate our list of Unicode-to-TeX mappings from Milde's 2011 database. If a newer revision is available, we could use it, but I couldn't find anything more recent than the 2011 version.