jgm / pandoc

Universal markup converter
https://pandoc.org

Export MathML in ODT as LaTeX #5602

Open: memeplex opened this issue 5 years ago

memeplex commented 5 years ago

I created the attached (zipped) ODT by exporting it from Google Docs. The text is just "Hi x^2", where x^2 is a squared x entered as an equation. Converting it to Markdown with pandoc gives:

pandoc -t markdown Prueba.odt 
Hi ![](./ObjectReplacements/Object 2){width="0.1417in"

Inside the ODT, Object 2/content.xml contains:

cat Object\ 2/content.xml 
<?xml version="1.0" encoding="UTF-8"?>
<math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><msup><mi>x</mi><mn>2</mn></msup><annotation encoding="StarMath 5.0">{x} ^ {2}</annotation></semantics></math>

That is, the equation was exported as MathML, not as an image, so I would have expected it to come out as LaTeX math in the Markdown output.
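To confirm that the formula really is stored as MathML rather than as an image, the embedded object can be read straight out of the ODT archive. The Python sketch below is illustrative only and is not part of pandoc; the function name `read_embedded_math` is hypothetical, and the default member path `Object 2/content.xml` is taken from the file in this report (other documents may number their objects differently).

```python
import zipfile
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

def read_embedded_math(odt_path, member="Object 2/content.xml"):
    """Return (mathml_root, starmath_source) for one embedded formula object.

    Hypothetical helper: `member` is the path inside the ODT zip archive.
    """
    with zipfile.ZipFile(odt_path) as odt:
        root = ET.fromstring(odt.read(member))  # parse the raw XML bytes
    # The <annotation> inside <semantics> carries the original StarMath source.
    ann = root.find(f".//{{{MATHML_NS}}}annotation")
    if ann is not None and ann.get("encoding") == "StarMath 5.0":
        return root, ann.text
    return root, None
```

For the attached file, `read_embedded_math("Prueba.odt")` should return the namespaced `<math>` element together with the StarMath string `{x} ^ {2}` shown in the XML above.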

Am I missing something?

Pandoc version: 2.2.1

Prueba.zip

memeplex commented 5 years ago

BTW, this is the output when I export the same document as DOCX instead:

pandoc -t markdown Prueba.docx 
Hi $x^{2}$.

Much better!

jgm commented 5 years ago

I don't think the ODT reader currently parses math elements. This would be a very useful addition, and it shouldn't be hard; we already convert MathML to TeX when reading DocBook, for example.

Note, however, that we'd only be able to parse presentation MathML, not semantic MathML (your file uses the `semantics` tag here).
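As a rough illustration of what parsing presentation MathML into TeX involves, here is a minimal Python sketch. It is not pandoc's implementation (pandoc does this in Haskell via texmath), and the function `to_tex` is hypothetical. It handles only a small subset of elements and assumes, as in the file above, that the `semantics` wrapper's first child is presentation markup (`msup`, `mi`, `mn`) and that any later children are annotations to skip.

```python
import xml.etree.ElementTree as ET

def _local(tag):
    # Strip the "{namespace}" prefix that ElementTree adds to element tags.
    return tag.split("}", 1)[-1]

def to_tex(el):
    """Convert a small subset of presentation MathML to TeX (sketch only)."""
    name = _local(el.tag)
    kids = list(el)
    if name == "semantics":
        # Use the first child; ignore <annotation>/<annotation-xml> siblings.
        return to_tex(kids[0]) if kids else ""
    if name in ("math", "mrow"):
        return "".join(to_tex(k) for k in kids)
    if name in ("mi", "mn", "mo"):
        return (el.text or "").strip()
    if name == "msup":
        return f"{to_tex(kids[0])}^{{{to_tex(kids[1])}}}"
    if name == "msub":
        return f"{to_tex(kids[0])}_{{{to_tex(kids[1])}}}"
    if name == "mfrac":
        return f"\\frac{{{to_tex(kids[0])}}}{{{to_tex(kids[1])}}}"
    if name == "msqrt":
        return f"\\sqrt{{{''.join(to_tex(k) for k in kids)}}}"
    raise NotImplementedError(f"unsupported MathML element: {name}")

sample = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">'
    "<semantics><msup><mi>x</mi><mn>2</mn></msup>"
    '<annotation encoding="StarMath 5.0">{x} ^ {2}</annotation>'
    "</semantics></math>"
)
print(to_tex(ET.fromstring(sample)))  # x^{2}
```

Raising on unknown elements, rather than silently dropping them, keeps the sketch honest about its limited coverage; a real reader would need to handle many more elements and attributes.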

memeplex commented 5 years ago

I barely understand MathML, let alone the difference between semantic and presentation tags. I can say that it works fine with DOCX, though, so if you're handling the "presentation" markup there, I assume it should work here too.

memeplex commented 5 years ago

Maybe I should have mentioned that equations are represented as MathML in DOCX too; I have verified this. I don't know about the semantic/presentation distinction, though.

mb21 commented 5 years ago

Might work with #5606...

jgm commented 5 years ago

@memeplex No, DOCX doesn't use MathML. It uses a different XML-based math format (Office Math Markup Language, OMML). texmath and pandoc can convert between these.

memeplex commented 5 years ago

Thanks!

jgm commented 5 years ago

Sorry, it was claimed in the PR that it fixed this issue, but it doesn't. @bImage, was your PR supposed to handle this "object replacement" stuff? The other issue is math parsing; I don't believe we currently parse MathML in the ODT reader, though we could.