Closed GongYiLiao closed 9 years ago
I'm a bit confused as to what you're asking for. Are you asking for \mathcal{H} to be rendered to the unicode character ℋ? I'm unsure this will produce better results than mathjax's native rendering, but I don't have much practical experience with mathjax.
I think it's possible to use MathJax with options to render mathml in the document. So, you can use `--mathml` and include a link to the mathjax script with appropriate options.
I don't want to mess with the latex we're passing to MathJax's latex math renderer.
From my experience, mathjax doesn't make a distinction between unicode characters and tex ones, typesetting both with its math fonts anyway. So I suppose this request is more about source readability than anything else. That said, I don't think it's a good idea.
— Reply to this email directly or view it on GitHub https://github.com/jgm/pandoc/issues/2137#issuecomment-99656154.
In the case that the output format is `docx`, Microsoft Word definitely won't process those LaTeX commands well, so converting those LaTeX commands into Unicode symbols would be a better solution.
Actually, in `Text.TeXMath.Writers.Pandoc`, the function `renderStr` already does this kind of work in some cases, such as `TextBoldFraktur`:
```haskell
renderStr :: TextType -> String -> Inline
renderStr tt s =
  case tt of
       TextNormal              -> Str s
       TextBold                -> Strong [Str s]
       TextItalic              -> Emph [Str s]
       TextMonospace           -> Code nullAttr s
       TextSansSerif           -> Str s
       TextDoubleStruck        -> Str $ toUnicode tt s
       TextScript              -> Str $ toUnicode tt s
       TextFraktur             -> Str $ toUnicode tt s
       TextBoldItalic          -> Strong [Emph [Str s]]
       TextSansSerifBold       -> Strong [Str s]
       TextBoldScript          -> Strong [Str $ toUnicode tt s]
       TextBoldFraktur         -> Strong [Str $ toUnicode tt s]
       TextSansSerifItalic     -> Emph [Str s]
       TextSansSerifBoldItalic -> Strong [Emph [Str s]]
```
A customized filter may work for this, but I think a solution that converts all LaTeX symbol commands (not operator commands like `\int`) into Unicode symbols can be useful.
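To make the idea concrete, here is a minimal Python sketch of the kind of letter-to-symbol mapping being proposed. It is not TeXMath's `toUnicode`; the function name `to_bold_italic` and the constants are made up for illustration. It covers only the bold italic style for ASCII letters and lowercase Greek, and relies on those ranges being contiguous in Unicode's Mathematical Alphanumeric Symbols block.

```python
import unicodedata

# First code points of the bold italic runs in the Unicode
# "Mathematical Alphanumeric Symbols" block.
BOLD_ITALIC_CAPITAL_A = 0x1D468
BOLD_ITALIC_SMALL_A = 0x1D482
BOLD_ITALIC_SMALL_ALPHA = 0x1D736


def to_bold_italic(ch: str) -> str:
    """Map a single letter to its bold italic math symbol (hypothetical helper)."""
    if "A" <= ch <= "Z":
        return chr(BOLD_ITALIC_CAPITAL_A + ord(ch) - ord("A"))
    if "a" <= ch <= "z":
        return chr(BOLD_ITALIC_SMALL_A + ord(ch) - ord("a"))
    if "\u03b1" <= ch <= "\u03c9":  # Greek small alpha .. small omega
        return chr(BOLD_ITALIC_SMALL_ALPHA + ord(ch) - 0x03B1)
    return ch  # anything else is passed through unchanged


if __name__ == "__main__":
    # Print the code point and Unicode name for "z" and "η".
    for letter in "z\u03b7":
        symbol = to_bold_italic(letter)
        print(f"U+{ord(symbol):05X}", unicodedata.name(symbol))
```

Other styles would need their own tables. Some of them (script, Fraktur, double-struck) have gaps in this block because a few letters, such as ℋ and ℝ, were already encoded elsewhere, so a plain offset is not enough for them.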
TeXMath has a separate output format for docx. `--mathjax` and other math options have no effect when converting to docx. Math options are only relevant for HTML-based output, namely Docbook, EPUB, FB2 and HTML.
Finally, mathjax only makes sense in HTML output, since it needs javascript to work. I don't think I understand what you are asking for and why you need it.
If you want us to fix a particular problem, please describe this problem and provide a minimal example showing it. If you lack particular functionality, please describe your use-case. Forgive me if this sounds rude, but at the moment it seems like you are tilting at windmills. No offence.
Here's the markdown clip for testing:
# Math Tests
## Test 1
### Without any ```LaTeX``` commands converted to Unicode symbols
$$ e = \int_\mathbb{R} f(x | \theta) \circ g(\mathbfit{z} |\mathbfit{\eta}) dx $$
## With ```\mathbfit{*}``` commands converted to Unicode symbols before fed to ```Pandoc```
$$ e = \int_\mathbb{R} f(x | \theta) \circ g(𝒛|𝜼) dx $$
## Test 2
### Without any ```LaTeX``` commands converted to Unicode symbols
$$ \mathsf{e} = \mathbffrak{z} $$
## With ```\mathbffrak{z}``` command converted to Unicode symbols before fed to ```Pandoc```
$$ \mathsf{e} = 𝖟 $$
Below is the screenshot of the output generated with the `-s -S --mathjax -t html5` options (whose LaTeX code will not be processed by Pandoc and TeXMath):
[image: pandoc_mathjax_screenshot] https://cloud.githubusercontent.com/assets/129343/7517553/531cda94-f49b-11e4-8f6e-fce41841d72f.png
In Test 1's case 1 (without any conversion), MathJax does not recognize the `\mathbfit` command, so it shows warnings in red and displays "z" and "η" incorrectly (they should be in boldface). In both tests' case 2 (converted to Unicode before being fed to Pandoc), there is no problem at all.
Another screenshot is the output generated with the `-s -S -t docx` options (whose LaTeX code is partially processed by Pandoc and TeXMath):
[image: pandoc_docx_screenshot] https://cloud.githubusercontent.com/assets/129343/7517750/7902a1a2-f49c-11e4-87e7-c7510d227fb8.png
In Test 1's case 1, the `\mathbfit` command is just ignored (or does not function): "z" and "η" are not in boldface. In Test 2, both cases are displayed correctly, as `\mathbffrak` is handled well by TeXMath's `renderStr` function.
From the two output examples above, we can see that there is an inconsistency in how mathematical letter symbols (not operator symbols) are displayed across output targets, even though they have exactly the same markdown source code.
Thus, it seems helpful to have a unified solution to process those LaTeX commands that have corresponding standardized Unicode letter symbols. For me, a customized filter should work, but I think a unified solution in Pandoc may benefit all the users.
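The "convert to Unicode before feeding to Pandoc" step used in the tests can be sketched as a small preprocessing function in Python. This is only an illustration under assumptions: `preprocess`, `GREEK_MACROS` and `STYLE_BASES` are hypothetical names, the Greek macro table is deliberately tiny, arguments with nested braces are not handled, and any command that cannot be fully converted is left untouched.

```python
import re

# Hypothetical, deliberately tiny macro table; a real filter needs the full set.
GREEK_MACROS = {r"\eta": "\u03b7", r"\theta": "\u03b8"}

# Command name -> (capital Latin base, small Latin base, small Greek base or None),
# taken from the Unicode "Mathematical Alphanumeric Symbols" block.
STYLE_BASES = {
    "mathbfit": (0x1D468, 0x1D482, 0x1D736),  # bold italic
    "mathbffrak": (0x1D56C, 0x1D586, None),   # bold Fraktur has no Greek letters
}

# Matches e.g. \mathbfit{z} or \mathbffrak{z}; the argument may not contain braces.
COMMAND = re.compile(r"\\(mathbfit|mathbffrak)\{([^{}]*)\}")


def style_char(ch, bases):
    """Return the styled symbol for one character, or None if there is none."""
    upper, lower, greek = bases
    if "A" <= ch <= "Z":
        return chr(upper + ord(ch) - ord("A"))
    if "a" <= ch <= "z":
        return chr(lower + ord(ch) - ord("a"))
    if greek is not None and "\u03b1" <= ch <= "\u03c9":
        return chr(greek + ord(ch) - 0x03B1)
    return None


def replace_command(match):
    name, arg = match.group(1), match.group(2).strip()
    arg = GREEK_MACROS.get(arg, arg)  # turn \eta into η before styling
    styled = [style_char(ch, STYLE_BASES[name]) for ch in arg]
    if not styled or None in styled:
        return match.group(0)  # leave the command as written
    return "".join(styled)


def preprocess(tex):
    """Replace the supported letter-style commands; leave everything else alone."""
    return COMMAND.sub(replace_command, tex)


if __name__ == "__main__":
    print(preprocess(r"$$ \mathsf{e} = \mathbffrak{z} $$"))
```

Operator commands such as `\int` and styles not listed in `STYLE_BASES` (for example `\mathbb` and `\mathsf`) pass through unchanged, which matches the "letter symbols only" scope described above.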
@GongYiLiao Pardon me if I didn't understand the issue properly; I was going to try what @jgm suggested (and http://docs.mathjax.org/en/latest/mathml.html#mathjax-mathml-support) so I started with your example as 2137.txt:
pandoc 2137.txt -s -S --mathjax=http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML -t html5 --mathml -o 2137.html
This is what I get:
Am I missing something?
Thank you. Now it's much more clear.
From what I gather, these are two separate problems.
The first is that mathjax doesn't support `\mathbfit`, `\mathbffrak` and possibly other commands that are supported by TeXMath, at least partially. I don't think this is TeXMath's problem, strictly speaking, since it's MathJax that lacks support for those, but we certainly could implement some sort of workaround for this, or at least document this somewhere.
The second is that `\mathbfit` doesn't work as expected in docx output. This looks like a genuine bug in TeXMath.
Your proposed solution to both these problems is to preprocess such macros into corresponding unicode symbols. I don't think this is the best possible solution, but it's certainly on the easier side. Personally, I would prefer to address these separately, if possible.
@nkalvi if you use the `--mathjax` option, pandoc will include tex math. What I was suggesting is using the `--mathml` option, and manually including the relevant mathjax link in your HTML header. (Or pass in the entire HTML link element using `-V math=...`.)
@jgm I think that's what I did, and the result seems to be what @GongYiLiao expected (without specifying additional configurations). Could you please check the command line I had?
I think this issue should be migrated over to jgm/texmath. @GongYiLiao - can you open an issue there with your test case and examples?
The problem with `\mathbfit` in OMML is clear enough: OMML writer, line 99:
TextBoldItalic -> [sty "i"]
I can't recall the details of OMML right now, but either this is just a typo on my part, or there's some kind of limitation that prevents you from having bold italics.
@nkalvi you have both `--mathjax` and `--mathml`. These are not to be used together (pandoc should probably issue a warning or error if you try). Remove `--mathml` from the command line and include an appropriate link element in your header.
Thanks @jgm, since I didn't get any errors/warnings and the output looked 'acceptable' I thought it was allowed.
Moved to jgm/texmath#76
The styles in `TeXMath`'s `styleOps` are applied only if `Pandoc`'s `--mathml` output option is used; when the `--mathjax` option is specified, those `styleOps` are not applied.
Since the conversion between `LaTeX` commands and Unicode mathematical symbols is already supported in `TeXMath`, it would be a plus for `Pandoc` if it could convert `LaTeX` commands to Unicode symbols when the `--mathjax` or `--katex` options are specified for `html5` output.