jgm / pandoc

Universal markup converter
https://pandoc.org

Suggestion: When MathJax Is Used, the LaTeX Environments Included Should Be More Selective (and Put in Math Class) #2758

Closed ickc closed 8 years ago

ickc commented 8 years ago

Hi, since MathJax supports some LaTeX environments, how should these environments be used in pandoc? I didn't find this mentioned in the documentation.

One example is

\begin{align}
...
\end{align}

I found that currently, if I put no delimiters around the code above, both the HTML and the LaTeX output are visually correct: MathJax can process a bare LaTeX environment like this (related to mathjax/MathJax-docs#146), and pandoc passes it through to both the HTML and the LaTeX output.

But looking at the HTML output, pandoc neither gives it a math class nor wraps it in <p>.

For an MWE, I opened a repository to test these things: markdown-variants.

By the way, if we write it like

$$\begin{align}
...
\end{align}$$

The LaTeX output will be incorrect and cause errors.

So is there a recommended way to use such environments in pandoc, so that both the HTML and LaTeX output are correct and pandoc recognizes the environment and parses it as math rather than raw LaTeX?

Thanks.

ickc commented 8 years ago

The original text of this comment has been moved to #2762: although what it said is closely related, the solution is completely different.

jgm commented 8 years ago

If you want it to be recognized as math (and not passed as raw), it needs to be between $..$ or $$..$$.

Try

$$
\begin{aligned}
...
\end{aligned}
$$
ickc commented 8 years ago

How about

\begin{equation}
...
\end{equation}

In this case there is no equationed environment to switch to.

If you look at markdown-variants, you can see a number of such examples where an extra math delimiter causes an error in the LaTeX output.

jgm commented 8 years ago

You can just use $$..$$. You will not get an equation number, but that's because pandoc doesn't currently have support for equation numbering. A workaround is to use "example lists":

(@myeqn)  $$..$$

which you can refer to later as (@myeqn).

ickc commented 8 years ago

To put it another way, could we make an exception for the following cases

so that when the --mathjax option is enabled and one of the above pairs of delimiters appears, it is treated as math (e.g. <span class="math display">, with \newcommand and \renewcommand possibly handled depending on how we treat #2762), and is otherwise left verbatim?

If this is done, compatibility between LaTeX and HTML+MathJax output is maximized (and possibly ePub and other HTML-related formats too; I have heard of people embedding MathJax in ePub, but I have never tried it and don't know the details).

ickc commented 8 years ago

Just to add some more info:

The Issue: How Latex Environments Should Be Used In Markdown

The whole issue is really related to how LaTeX environments should be used in markdown (see the original title).

MultiMarkdown treats this differently, though not without issues: see begin equation in $$ / Problems / Discussion Area - MultiMarkdown Support. If MultiMarkdown detects $\begin...$, $$\begin...$$, \\(\begin...\\) or \\[\begin...\\], it passes the content verbatim to HTML (apart from adding classes) but trims the outer delimiter in LaTeX.

In pandoc, such an environment is always passed through verbatim (except for adding classes in HTML, and when macros such as \newcommand and \renewcommand are used).

References

Both approaches have issues. They are discussed at length in:

  1. ickc/markdown-variants.
  2. MathJax Documentation Is Ambiguous on How Environments Should Be Used · Issue #146 · mathjax/MathJax-docs
  3. begin equation in $$ / Problems / Discussion Area - MultiMarkdown Support
  4. Suggestion: Adding More Alternative Math Delimiters When --mathjax Option is Used, to Maximize Compatibility Between LaTeX and HTML+MathJax Outputs · Issue #2758 · jgm/pandoc (here)

The Goal: Maximizing LaTeX and HTML+MathJax Compatibility

What I have said and suggested makes sense if we are talking about LaTeX and HTML+MathJax output (from a single Markdown source). But I understand that pandoc is about multiple output formats, not just these two (and in fact pandoc has many ways of producing HTML with math). But given that

  1. MathJax is now the best and dominant effort to bring LaTeX math to HTML, and it can be used in HTML-like output as well (e.g. ePub, although I have no idea whether and how pandoc handles ePub+MathJax);
  2. the exception proposed here for the --mathjax option is not too complicated (or is it? I have no idea how difficult it would be to implement, or how much harder or slower it would make the parser... sorry);
  3. it is only activated when the --mathjax option is used, so it won't do any damage as far as backward compatibility is concerned. And where this syntax is already used for HTML+MathJax generation, it even improves the output semantically.
ickc commented 8 years ago

Hi, @jgm,

I am thinking about a different way to solve the problem I mentioned. According to my tests, the following LaTeX environments are supported by MathJax and should not be put inside math delimiters ($...$ or $$...$$):

\begin{align}
...
\end{align}

\begin{align*}
...
\end{align*}

\begin{alignat}
...
\end{alignat}

\begin{alignat*}
...
\end{alignat*}

\begin{eqnarray}
...
\end{eqnarray}

\begin{eqnarray*}
...
\end{eqnarray*}

\begin{equation}
...
\end{equation}

\begin{equation*}
...
\end{equation*}

\begin{gather}
...
\end{gather}

\begin{gather*}
...
\end{gather*}

\begin{multline}
...
\end{multline}

\begin{multline*}
...
\end{multline*}

I understand from your previous comments that this is not really something pandoc supports. pandoc supports math in many output formats, while the support I'm asking for above concerns only dual compatibility between HTML+MathJax and LaTeX output.

I have two different solutions in mind. I previously suggested that pandoc regard as math not only the dollar-sign delimiters but also any of the above \begin{...}...\end{...} pairs (and only those). It seems you don't like that direction, since it would make the pandoc syntax more complicated (and the parser more complicated and slower?).

The other possible solution is this: any of the above pairs would require an extra display math delimiter in pandoc, like this:

$$\begin{align}
...
\end{align}$$

$$\begin{align*}
...
\end{align*}$$

$$\begin{alignat}
...
\end{alignat}$$

$$\begin{alignat*}
...
\end{alignat*}$$

$$\begin{eqnarray}
...
\end{eqnarray}$$

$$\begin{eqnarray*}
...
\end{eqnarray*}$$

$$\begin{equation}
...
\end{equation}$$

$$\begin{equation*}
...
\end{equation*}$$

$$\begin{gather}
...
\end{gather}$$

$$\begin{gather*}
...
\end{gather*}$$

$$\begin{multline}
...
\end{multline}$$

$$\begin{multline*}
...
\end{multline*}$$

Then pandoc will regard them as math automatically. However, the LaTeX output will then fail to compile (because of the extra math delimiter). And although MathJax produces the correct result from HTML output containing the extra math delimiter, it is better practice not to have those extra delimiters.

The solution is then to create a filter that detects the above patterns and, once matched, removes the extra $$ in LaTeX generation and in HTML generation with the --mathjax option only. Since the other output formats probably do not support these environments anyway, the restriction to LaTeX and HTML+MathJax output might not be necessary (i.e. the replacement could be applied regardless of the output format).

So if such a filter is created, would you be interested in including it in official pandoc?
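A minimal sketch of the kind of filter proposed above, written as a stdlib-only Python pandoc JSON filter. It assumes the JSON AST layout of pandoc 1.18 and later, where display math is serialized as {"t": "Math", "c": [{"t": "DisplayMath"}, "<TeX>"]}; the file name unwrap_envs.py, the function name unwrap_environments and the choice to act only on latex/beamer output are illustrative assumptions, not part of pandoc. The environment list is the one from the comment above; aligned and friends are deliberately left alone.

```python
#!/usr/bin/env python3
# unwrap_envs.py: hypothetical pandoc filter (sketch, not part of pandoc).
# Assumes the pandoc >= 1.18 JSON AST, where display math is
# {"t": "Math", "c": [{"t": "DisplayMath"}, "<TeX source>"]}.
import json
import re
import sys

# Environments listed in this thread that must not sit inside $$...$$ in LaTeX.
ENVS = ["align", "alignat", "eqnarray", "equation", "gather", "multline"]

# Whole-string match of \begin{env}...\end{env}, with an optional "*".
ENV_RE = re.compile(
    r"^\s*\\begin\{((?:%s)[*]?)\}.*\\end\{\1\}\s*$" % "|".join(ENVS),
    re.DOTALL,
)

LATEX_FORMATS = ("latex", "beamer")


def unwrap_environments(node, fmt):
    """Recursively turn matching DisplayMath nodes into raw LaTeX for LaTeX output."""
    if isinstance(node, list):
        return [unwrap_environments(item, fmt) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Math" and fmt in LATEX_FORMATS:
            kind, tex = node["c"]
            if kind.get("t") == "DisplayMath" and ENV_RE.match(tex):
                # Drop the $$: emit the environment itself as raw LaTeX.
                return {"t": "RawInline", "c": ["latex", tex.strip()]}
        return {key: unwrap_environments(value, fmt) for key, value in node.items()}
    return node


# pandoc passes the target format as argv[1]; do nothing without it.
if __name__ == "__main__" and len(sys.argv) > 1:
    document = json.load(sys.stdin)
    json.dump(unwrap_environments(document, sys.argv[1]), sys.stdout)
```

It would be run as something like pandoc -f markdown -t latex --filter ./unwrap_envs.py input.md. For HTML output the Math nodes are left untouched, since MathJax already copes with the extra delimiter.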

ickc commented 8 years ago

I just found out that pandoc's LaTeX-to-Markdown conversion turns the align environment into aligned. I guess the others behave similarly (equation?).

  1. Could these be put in the documentation?
  2. Could you consider giving a command line option to disable this?
jgm commented 8 years ago

The reason for this is that \begin{align} is not allowed inside math mode in LaTeX. That is,

$$
\begin{aligned}
...
\end{aligned}
$$

is equivalent to

\begin{align}
...
\end{align}
ickc commented 8 years ago

That is,

$$
\begin{aligned}
...
\end{aligned}
$$

is equivalent to

\begin{align}
...
\end{align}

No, it is not: the former is not numbered. I know you commented earlier that pandoc does not support equation numbering. But MathJax and TeX both support it, and at least for cross-compatibility between the two, TeX and HTML+MathJax, keeping align as align rather than aligned (and similarly for all the other cases) makes more sense.

In other words, there is currently no way to really express \begin{align}...\end{align} and the like in pandoc. One could just type it bare, and the LaTeX output would be correct; in HTML generation MathJax can also pick it up, since it is smart enough to parse bare \begin...\end pairs. But then it is not valid syntax for pandoc's math, and pandoc will not recognize it as math.

The solution I was suggesting is that an extra pair of $$ be required for those environments, i.e. one types $$\begin{align}...\end{align}$$ so that pandoc understands it is math. But after pandoc parses it, the HTML+MathJax and TeX output becomes \begin{align}...\end{align}, the correct syntax in both LaTeX and MathJax. In fact, this is the current behavior of MultiMarkdown.

And since this deviates from the current behavior (where pandoc does some sophisticated work to turn one environment into the closest supported one, e.g. align to aligned, or eqnarray to aligned), I suggest a command-line option (or a filter) to turn these features on and off.

ickc commented 8 years ago

Another test: converting TeX to Markdown with pandoc:

\begin{align}
...
\end{align}

\begin{align*}
...
\end{align*}

\begin{alignat}{2}
...
\end{alignat}

\begin{alignat*}{2}
...
\end{alignat*}

\[
\begin{aligned}
...
\end{aligned}
\]

\[
\begin{alignedat}{2}
...
\end{alignedat}
\]

\[
\begin{array}{lcl}
...
\end{array}
\]

\[
\begin{Bmatrix}
...
\end{Bmatrix}
\]

\[
\begin{bmatrix}
...
\end{bmatrix}
\]

\[
\begin{cases}
...
\end{cases}
\]

\[
\begin{CD}
...
\end{CD}
\]

\begin{eqnarray}
...
\end{eqnarray}

\begin{eqnarray*}
...
\end{eqnarray*}

\begin{equation}
...
\end{equation}

\begin{equation*}
...
\end{equation*}

\begin{gather}
...
\end{gather}

\begin{gather*}
...
\end{gather*}

\[
\begin{gathered}
...
\end{gathered}
\]

\[
\begin{matrix}
...
\end{matrix}
\]

\begin{multline}
...
\end{multline}

\begin{multline*}
...
\end{multline*}

\[
\begin{pmatrix}
...
\end{pmatrix}
\]

\[
\begin{smallmatrix}
...
\end{smallmatrix}
\]

\[
\begin{split}
...
\end{split}
\]

\[
\begin{subarray}
...
\end{subarray}
\]

\[
\begin{Vmatrix}
...
\end{Vmatrix}
\]

\[
\begin{vmatrix}
...
\end{vmatrix}
\]

becomes

$$\begin{aligned}
...\end{aligned}$$

$$\begin{aligned}
...\end{aligned}$$

$$\begin{aligned}
{2}
...\end{aligned}$$

$$\begin{aligned}
{2}
...\end{aligned}$$

$$\begin{aligned}
...
\end{aligned}$$

$$\begin{alignedat}{2}
...
\end{alignedat}$$

$$\begin{array}{lcl}
...
\end{array}$$

$$\begin{Bmatrix}
...
\end{Bmatrix}$$

$$\begin{bmatrix}
...
\end{bmatrix}$$

$$\begin{cases}
...
\end{cases}$$

$$\begin{CD}
...
\end{CD}$$

$$\begin{aligned}
...\end{aligned}$$

$$\begin{aligned}
...\end{aligned}$$

$$...$$

$$...$$

$$\begin{gathered}
...\end{gathered}$$

$$\begin{gathered}
...\end{gathered}$$

$$\begin{gathered}
...
\end{gathered}$$

$$\begin{matrix}
...
\end{matrix}$$

$$\begin{gathered}
...\end{gathered}$$

$$\begin{gathered}
...\end{gathered}$$

$$\begin{pmatrix}
...
\end{pmatrix}$$

$$\begin{smallmatrix}
...
\end{smallmatrix}$$

$$\begin{split}
...
\end{split}$$

$$\begin{subarray}
...
\end{subarray}$$

$$\begin{Vmatrix}
...
\end{Vmatrix}$$

$$\begin{vmatrix}
...
\end{vmatrix}$$

In short, the align-related environments (align, alignat, eqnarray and their starred variants) are converted into aligned, while alignedat is kept as is; equation and equation* become plain $$...$$; and the gather-related environments, including multline and multline*, become gathered.

So clearly pandoc is very sophisticated and can translate some TeX environments to something pandoc prefers.

Another way of looking at my feature request is as the reverse of the translations above, e.g. Markdown's $$\begin{equation}...\end{equation}$$ and $$\begin{align}...\end{align}$$ translated into LaTeX as \begin{equation}...\end{equation} and \begin{align}...\end{align}, etc.

I understand that $$\begin{equation}...\end{equation}$$, $$\begin{align}...\end{align}$$, etc. are not what pandoc prefers, but there is a legitimate reason to use them in a Markdown source: HTML+MathJax and LaTeX output. So in case a user does use them in the Markdown source, pandoc could use an approach as sophisticated as the one in its LaTeX-to-Markdown conversion, but "run it backward" (it's not exactly like that, but it illustrates the idea).

yihui commented 8 years ago

I guess I just ran into this issue as well. I wrote an equation in Markdown:

\begin{equation}
  f\left(k\right) = \binom{n}{k} p^k\left(1-p\right)^{n-k}
  \label{eq:binom}
\end{equation}

This renders correctly in both PDF and HTML output, but it disappears in EPUB(3). It would be nice if the equation environment could be preserved for EPUB output.

ickc commented 8 years ago

Hi, @yihui

I assume you use MathJax in HTML? How about ePub? Which option did you use? I have heard of people embedding MathJax in ePub. With mathjax/MathJax-grunt-cleaner (a grunt file to reduce the footprint of a MathJax installation), it can be made much smaller than the original (mine is 1.7MB).

I created a script as a temporary fix, so that these extra environments pandoc doesn't support can be used: ickc/pandoc-mathjax-extended: Using the LaTeX environments supported by MathJax in pandoc. I described the problem in Testing LaTeX Environments Usage in MathJax From Markdown Convertion.

Basically, the script assumes that \begin{equation}... is surrounded by $$, like this:

$$\begin{equation}
  f\left(k\right) = \binom{n}{k} p^k\left(1-p\right)^{n-k}
  \label{eq:binom}
\end{equation}$$

and the script does a very simple regex substitution to transform it into

\begin{equation}
  f\left(k\right) = \binom{n}{k} p^k\left(1-p\right)^{n-k}
  \label{eq:binom}
\end{equation}

Since MathJax accepts either form (with or without the extra $$), the script need not be used for output that relies on MathJax. And with the extra $$, pandoc recognizes the block as math.

Only for LaTeX output is the script applied, as a preprocessor that writes its result to stdout for use in conjunction with pandoc. pandoc will then recognize the environment as raw LaTeX.

If you do use this script, watch out for how I did it: it creates a new file from the original source by appending -temp to its name (e.g. test.md becomes test-temp.md). If you already have a test-temp.md file, it will be overwritten. I also move this file to ~/.Trash/ on Mac. I will probably clean this up later. (I am migrating to pandoc and am trying to create a few pandoc tools and scripts for my personal use, but right now they are scattered across different repos.)
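For comparison, a rough Python sketch of the regex-preprocessor approach described above. This is not the code of ickc/pandoc-mathjax-extended; the file name strip_dollars.py and the function name strip_extra_dollars are made up, and instead of writing a -temp file it reads the path given as the first argument and prints to stdout, so the result can be piped straight into pandoc.

```python
#!/usr/bin/env python3
# strip_dollars.py: hypothetical preprocessor (not the pandoc-mathjax-extended code).
import re
import sys

# Environments that MathJax and LaTeX both accept bare (optionally starred).
ENVS = r"(?:align|alignat|eqnarray|equation|gather|multline)[*]?"

# $$ \begin{env} ... \end{env} $$  ->  group 1 is the bare environment.
PATTERN = re.compile(
    r"\$\$\s*(\\begin\{(" + ENVS + r")\}.*?\\end\{\2\})\s*\$\$",
    re.DOTALL,
)


def strip_extra_dollars(markdown):
    """Remove the $$ wrapped around the environments above; leave all else alone."""
    return PATTERN.sub(r"\1", markdown)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as source:
        sys.stdout.write(strip_extra_dollars(source.read()))
```

A possible invocation is python3 strip_dollars.py test.md | pandoc -f markdown -t latex -o test.tex, since pandoc reads stdin when no input file is given. A plain regex like this cannot tell a literal $$\begin{equation}$$ inside a code block from real math, which is one of the limitations yihui raises below.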

yihui commented 8 years ago

@ickc LaTeX math written in $ $ or $$ $$ works well in EPUB (with MathJax). My problem was that equation and other math environments are discarded by Pandoc when converting to EPUB. Protecting these environments with $$ is an incomplete solution, since 1) you only have to do this when the output is EPUB (no hack is needed for PDF or HTML output), and 2) you cannot substitute \begin{equation} with $$\begin{equation} unconditionally, e.g. you may want to show it literally

\begin{equation}

I think the easiest solution is that Pandoc does not simply throw away these math environments when rendering EPUB. Thanks for your idea anyway!

ickc commented 8 years ago

hi, @yihui

Yes, it is the same problem I originally described. Basically, pandoc only treats content inside math delimiters as math. But environments like equation/align are not supposed to be used with an extra math delimiter, and if the delimiter is not used, pandoc treats the environment as raw LaTeX. This is the current problem.

The potential solutions are:

  1. Ideally, since pandoc already parses those begin/end pairs and treats them as raw LaTeX, it could be improved to detect whether the environment is equation/align/... (the list of environments supported by MathJax is given above). In that case, the environment would not be discarded in HTML and related output (ePub), i.e. it would be treated as both raw LaTeX and raw HTML (MathJax understands it fine in raw HTML, so to MathJax it is raw HTML).
  2. Less ideally, but simpler in terms of syntax: all equation environments would require an extra pair of $$. pandoc would then detect $$\begin{... and, only for the environments mentioned above (but not aligned, for example), remove the extra $$.

My temporary solution is the second one, since it guarantees that pandoc currently regards the content as math. I am hoping an official solution will appear, and that it will take the first approach.

By the way, you are probably not supposed to get these environments in the HTML output at all; it is a bug that lets raw LaTeX slip through into the HTML output even when -R is not used. See #2860.

ickc commented 8 years ago

@jgm said in #2860:

With --mathjax we include raw LaTeX blocks.

which means it is already doing part of what I was suggesting.

It also suggests that the correct way of writing those environments in pandoc syntax is as raw LaTeX (i.e. \begin{align}...\end{align}, not $$\begin{align}...\end{align}$$). Some might think this is trivial, but the correct syntax for these environments does depend on the implementation, e.g. MultiMarkdown's is the exact opposite of pandoc's. See ickc/markdown-variants.

So the issues are then:

Not Mentioned in the User Guide

Quoting from the pandoc user guide:

If the --mathjax option is used, TeX math will be displayed between \(...\) (for inline math) or \[...\] (for display math) and put in <span> tags with class math. The MathJax script will be used to render it as formulas.

Not Putting Those LaTeX Environments in Math Class

Related to the quote in the user guide above, those LaTeX environments are not being

put in <span> tags with class math

Including Too Many LaTeX Environments

Quoting from MathJax documentation:

LaTeX environments of the form \begin{XXX} ... \end{XXX} are provided where XXX is one of the following:

align [AMSmath]
align* [AMSmath]
alignat [AMSmath]
alignat* [AMSmath]
aligned [AMSmath]
alignedat [AMSmath]
array
Bmatrix
bmatrix
cases
CD AMSmath
eqnarray
eqnarray*
equation
equation*
gather [AMSmath]
gather* [AMSmath]
gathered [AMSmath]
matrix
multline [AMSmath]
multline* [AMSmath]
pmatrix
smallmatrix AMSmath
split [AMSmath]
subarray AMSmath
Vmatrix
vmatrix

Including all raw LaTeX in HTML output when the MathJax option is used means that other environments, like

\begin{flushright}
text flushed to the right
\end{flushright}

will also be included in HTML output with MathJax option. See #2860.
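One way to make this selection concrete is to check the environment name of each raw LaTeX block against the MathJax list quoted above. A minimal sketch, assuming the pandoc 1.18+ JSON AST (raw LaTeX blocks as {"t": "RawBlock", "c": ["latex" or "tex", "..."]}) and, for brevity, only looking at top-level blocks: for HTML output it wraps MathJax-supported environments in display Math, so they get the math class, and drops any other raw LaTeX such as flushright. The file and function names are hypothetical.

```python
#!/usr/bin/env python3
# select_envs.py: hypothetical pandoc filter sketch for HTML + MathJax output.
import json
import re
import sys

# Environment list quoted from the MathJax documentation above.
MATHJAX_ENVS = {
    "align", "align*", "alignat", "alignat*", "aligned", "alignedat",
    "array", "Bmatrix", "bmatrix", "cases", "CD", "eqnarray", "eqnarray*",
    "equation", "equation*", "gather", "gather*", "gathered", "matrix",
    "multline", "multline*", "pmatrix", "smallmatrix", "split", "subarray",
    "Vmatrix", "vmatrix",
}

BEGIN_RE = re.compile(r"\s*\\begin\{([A-Za-z]+[*]?)\}")


def environment_name(tex):
    """Return the name of the environment the raw LaTeX starts with, or None."""
    match = BEGIN_RE.match(tex)
    return match.group(1) if match else None


def select_blocks(blocks):
    """Keep MathJax environments as display math; drop any other raw LaTeX block."""
    kept = []
    for block in blocks:
        if block.get("t") == "RawBlock" and block["c"][0] in ("latex", "tex"):
            tex = block["c"][1]
            if environment_name(tex) in MATHJAX_ENVS:
                math = {"t": "Math", "c": [{"t": "DisplayMath"}, tex]}
                kept.append({"t": "Para", "c": [math]})
            # Anything else (e.g. \begin{flushright}) is not emitted at all.
            continue
        kept.append(block)
    return kept


if __name__ == "__main__" and len(sys.argv) > 1:
    document = json.load(sys.stdin)
    if sys.argv[1].startswith("html"):
        document["blocks"] = select_blocks(document["blocks"])
    json.dump(document, sys.stdout)
```

As written, this would also drop raw \newcommand definitions that MathJax could otherwise use, so a real version would need to treat those separately (see #2762).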

ickc commented 8 years ago

@yihui

I think the easiest solution is that Pandoc does not simply throw away these math environments when rendering EPUB.

@jgm commented in #2860:

I'm not sure what's going on there. I just tried and the 'equation' environment did get transferred over to epub with --mathjax. So I can't reproduce what he's describing.

Can you provide a more detailed MWE and commands used?

yihui commented 8 years ago

Sure. Here is an MWE:

test.md

$\alpha + \beta$

\begin{equation}
  f\left(k\right) = \binom{n}{k} p^k\left(1-p\right)^{n-k}
  \label{eq:binom}
\end{equation}

I used this command:

pandoc -f markdown -t epub3 --mathjax -o test.epub test.md

Output: test.zip

When I open test.epub, only the math expression in $ $ is displayed, and the equation is gone.

$ pandoc --version
pandoc 1.17.0.2
Compiled with texmath 0.8.5, highlighting-kate 0.6.2.
Syntax highlighting is supported for the following languages:
    abc, actionscript, ada, agda, apache, asn1, asp, awk, bash, bibtex, boo, c,
    changelog, clojure, cmake, coffee, coldfusion, commonlisp, cpp, cs, css,
    curry, d, diff, djangotemplate, dockerfile, dot, doxygen, doxygenlua, dtd,
    eiffel, elixir, email, erlang, fasm, fortran, fsharp, gcc, glsl,
    gnuassembler, go, hamlet, haskell, haxe, html, idris, ini, isocpp, java,
    javadoc, javascript, json, jsp, julia, kotlin, latex, lex, lilypond,
    literatecurry, literatehaskell, llvm, lua, m4, makefile, mandoc, markdown,
    mathematica, matlab, maxima, mediawiki, metafont, mips, modelines, modula2,
    modula3, monobasic, nasm, noweb, objectivec, objectivecpp, ocaml, octave,
    opencl, pascal, perl, php, pike, postscript, prolog, pure, python, r,
    relaxng, relaxngcompact, rest, rhtml, roff, ruby, rust, scala, scheme, sci,
    sed, sgml, sql, sqlmysql, sqlpostgresql, tcl, tcsh, texinfo, verilog, vhdl,
    xml, xorg, xslt, xul, yacc, yaml, zsh
Default user data directory: /Users/yihui/.pandoc
Copyright (C) 2006-2016 John MacFarlane
Web:  http://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.
jgm commented 8 years ago

With epub3, pandoc uses mathml for math (even if you specify --mathjax). That's because mathml is part of the epub3 spec.

That explains the difference between -t epub3 and -t epub with your example.

I'm open to suggestions as to how this should be handled.

If we parsed an "equation" environment as Math, then we'd get an error in LaTeX/PDF output, because we'd have

\[
\begin{equation}
..
\end{equation}
\]

which isn't legal. We could strip off the \begin{equation} and \end{equation} and just include the contents of the environment as a Math element. Then it would work in both HTML/EPUB and LaTeX/PDF, but the equation would not be numbered.

Neither seems ideal.
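The second option jgm describes (strip the equation wrapper and keep only its body as display math) can be sketched as a small Python helper over the pandoc 1.18+ JSON AST. The function name equation_to_math is hypothetical, and only equation/equation* are handled. As noted above, the price is that the equation loses its number.

```python
import re

# Whole-string match of an equation or equation* environment; group 2 is the body.
EQUATION_RE = re.compile(
    r"^\s*\\begin\{(equation[*]?)\}(.*)\\end\{\1\}\s*$", re.DOTALL
)


def equation_to_math(raw_tex):
    """Return a DisplayMath node holding the environment's body, or None.

    Any \\label inside the body is kept as-is; the equation number is lost.
    """
    match = EQUATION_RE.match(raw_tex)
    if match is None:
        return None
    body = match.group(2).strip()
    return {"t": "Math", "c": [{"t": "DisplayMath"}, body]}
```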

yihui commented 8 years ago

My initial request was not to drop the equation environments altogether when writing EPUB output, i.e. to write them verbatim in the output, just as Pandoc does for HTML and LaTeX. It is up to the EPUB reader whether the equations can be rendered, e.g. iBooks cannot render equations, but Calibre can (using MathJax). I think it is better to have equations in EPUB that cannot be rendered (readers can still see the LaTeX source) than to drop them silently.

jgm commented 8 years ago

Yes, I understand the request. This is exactly what pandoc does with epub (v2) output. With epub3 output, we use MathML for math (and iBooks does render this if I recall correctly). But to make this work, you need to make sure that pandoc parses your math as math, not as raw HTML. So, use $$..$$ instead of begin/end{equation}.

yihui commented 8 years ago

Oh okay, I see. I don't have a suggestion on what to do then. Thanks!

ickc commented 8 years ago

Hi, @jgm,

Could you summarize the math that pandoc "natively" supports? (E.g. it seems that pandoc supports not only $...$ and $$...$$ but also $$\begin{aligned}...\end{aligned}$$; the LaTeX reader turns eqnarray into aligned, etc.; and another issue mentioned aligned in docx output.) Or is there a place where I can find such information (other than the source code)?

Thanks.

Edit: from lines 444-460 and 1115-1127 of pandoc/src/Text/Pandoc/Readers/LaTeX.hs, it seems there are two native environments, gathered and aligned.

ickc commented 8 years ago

@yihui

Sorry, a sidetrack:

iBooks cannot render equations, but Calibre can (using MathJax).

I thought iBooks could use MathJax too? (I don't use ePub myself, but I saw it in How to include MathJax in an epub3 file to work with iBooks (and possibly others) · Peter Krautzberger, which is quite an old post.)

ickc commented 8 years ago

@jgm said,

With epub3, pandoc uses mathml for math (even if you specify --mathjax). That's because mathml is part of the epub3 spec.

I'm brainstorming an alternative approach to this case: could we use mathjax/MathJax-node: Mathjax for Node to convert the math to CommonHTML/SVG when --mathjax is used with ePub3 output?

The reasons are:

  1. MathML support is incomplete.
  2. MathJax supports more LaTeX syntax (e.g. equation/align with numbering).
  3. MathJax-node's rendered results are "static" (they do not require JS) and are (probably) plain HTML that should be supported by every ePub reader (not certain on this).

So when the --mathjax option (or possibly a new command-line option to distinguish it from the existing behavior of --mathjax, perhaps --mathjaxnode) is used, this more sophisticated approach would be used instead of MathML output.

jgm commented 8 years ago

If the mathjax-node rendered HTML really does work on all epub3 readers (do they all support SVG?), then it sounds like a good idea to provide a way to use this.

Perhaps it would make sense to add a generic math option that simply pipes each bit of math through mathjax-node and inserts the output as raw HTML. (This could be implemented first as a filter, and then, if desired, made a standard part of pandoc.)

This could provide a good path for portable math in EPUB.
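The generic option jgm suggests could start life as a filter along these lines. This is only a sketch: it assumes the pandoc 1.18+ JSON AST, and MATH_RENDER_CMD is a made-up environment variable naming any command that reads TeX on stdin and prints HTML or SVG on stdout; no particular mathjax-node command-line interface is assumed. The span classes mirror pandoc's own "math inline" / "math display" classes.

```python
#!/usr/bin/env python3
# render_math.py: hypothetical filter that pre-renders math as raw HTML.
import json
import os
import shlex
import subprocess
import sys


def replace_math(node, render):
    """Replace each Math node with raw HTML produced by render(tex)."""
    if isinstance(node, list):
        return [replace_math(item, render) for item in node]
    if isinstance(node, dict):
        if node.get("t") == "Math":
            kind, tex = node["c"]
            cls = "display" if kind.get("t") == "DisplayMath" else "inline"
            html = '<span class="math %s">%s</span>' % (cls, render(tex))
            return {"t": "RawInline", "c": ["html", html]}
        return {key: replace_math(value, render) for key, value in node.items()}
    return node


def external_renderer(command):
    """Build render(tex) that pipes TeX through `command` (assumed stdin/stdout contract)."""
    argv = shlex.split(command)

    def render(tex):
        result = subprocess.run(
            argv, input=tex, capture_output=True, text=True, check=True
        )
        return result.stdout.strip()

    return render


if __name__ == "__main__" and len(sys.argv) > 1:
    # MATH_RENDER_CMD is a hypothetical setting, e.g. a wrapper around mathjax-node.
    render = external_renderer(os.environ["MATH_RENDER_CMD"])
    document = json.load(sys.stdin)
    json.dump(replace_math(document, render), sys.stdout)
```

Starting one process per formula keeps the sketch simple but would be slow on math-heavy documents; batching all formulas into a single renderer call would be the obvious next step.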
ickc commented 8 years ago

@jgm said,

If the mathjax-node rendered HTML really does work on all epub3 readers (do they all support SVG?), then it sounds like a good idea to provide a way to use this.

Perhaps it would make sense to add a generic math option that simply pipes each bit of math through mathjax-node and inserts the output as raw HTML. (This could be implemented first as a filter, and then, if desired, made a standard part of pandoc.)

This could provide a good path for portable math in EPUB.

I ran some tests of MathJax-node in ickc/MathJax-node-test: Testing the output of MathJax Node. I think the contenders are the CommonHTML and SVG outputs (since MathML support across browsers is poor). Here are a few points I noticed:

CommonHTML vs SVG Output

The MathJax-node SVG output looks like the winner. Probably the only remaining question is how well ePub readers support it. But it will most definitely be very useful in other output formats (HTML), especially since it can have equation numbering and referencing. The MathJax-node MML output could probably replace the current MML output algorithm too (or be offered as an option).

File Size

File sizes of all 4 test files, in KB:

original  page2mml  page2svg  page2html
17        45        232       359

To put this into perspective, loading MathJax.js alone takes about 400KB.

Time

Another point perhaps worth testing is rendering time. I didn't time it, but my impression is that MML output took the longest, while CommonHTML and SVG were comparable, with SVG seemingly faster. I didn't test how long pandoc takes to generate other output such as PNG or MML, though.

Edit: I should have mentioned this: MathJax-node's output is rendered by the browser almost instantly (whereas, as we all know from experience, a page rendered by the usual client-side MathJax takes a hit in render time, typically on the order of seconds).

Edit2: I forgot to mention an obvious fact: it requires a node.js installation. Hopefully that won't matter too much.

ickc commented 8 years ago

Thanks!