tomduck / pandoc-eqnos

A pandoc filter for numbering equations and equation references.
GNU General Public License v3.0

Badly formatted XML for docx output #60

Open SG-phimeca opened 3 years ago

SG-phimeca commented 3 years ago

Eqnos produces badly formatted XML when used for docx output. I compile a file demo.md containing only

$$ y = mx + b $$ {#eq:line}

or alternatively

$$ y = mx + b $${#eq:line}

with the following command

pandoc demo.md -o demo.docx --filter pandoc-eqnos

The output file cannot be opened on Microsoft Windows (in a Windows VirtualBox VM) or in LibreOffice.

I am running Ubuntu 16.04.

No error is reported during conversion. The output of pandoc -v is

pandoc 2.11.3.2
Compiled with pandoc-types 1.22, texmath 0.12.1, skylighting 0.10.2,
citeproc 0.3.0.3, ipynb 0.1.0.1

I use version 2.5.0 of pandoc-eqnos, installed with Anaconda.

nialov commented 3 years ago

Same issue: the {#eq:eq_label} label stops Word from opening the compiled docx.

➜ pandoc -v
pandoc 2.11.4
Compiled with pandoc-types 1.22, texmath 0.12.1, skylighting 0.10.2,
citeproc 0.3.0.5, ipynb 0.1.0.1
➜ pandoc-eqnos --version
pandoc-eqnos 2.5.0
pfeffer90 commented 3 years ago

Hi, I have the same issue. The error reported by LibreOffice (v7.0) is

image

Earlier googling led me to #16, so it might be a related problem. Indeed, when I looked at the generated document.xml file, there seem to be mismatched <w:p> tags around the <w:bookmarkStart w:id="0" w:name="eq:eq1" /> element. When I removed the two problematic tags, Firefox displayed the document.xml file correctly, but rezipping it into a docx still led to a corrupted file.

Here is the document.xml

<?xml version="1.0" encoding="UTF-8"?><w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"><w:body><w:p><w:pPr><w:pStyle w:val="FirstParagraph" /></w:pPr><w:r><w:t xml:space="preserve">An equation</w:t></w:r><w:r><w:t xml:space="preserve"> </w:t></w:r><w:bookmarkStart w:id="0" w:name="eq:eq1"/><w:r><w:t></w:p><w:p><w:pPr><w:pStyle w:val="BodyText" /></w:pPr><m:oMathPara><m:oMathParaPr><m:jc m:val="center" /></m:oMathParaPr><m:oMath><m:r><m:t>x</m:t></m:r><m:r><m:t>  </m:t></m:r><m:d><m:dPr><m:begChr m:val="(" /><m:endChr m:val=")" /><m:grow /></m:dPr><m:e><m:r><m:t>1</m:t></m:r></m:e></m:d></m:oMath></m:oMathPara></w:p><w:p><w:pPr><w:pStyle w:val="FirstParagraph" /></w:pPr></w:t></w:r><w:bookmarkEnd w:id="0"/></w:p><w:sectPr /></w:body></w:document>
pandoc -v  

pandoc 2.13
Compiled with pandoc-types 1.22, texmath 0.12.2, skylighting 0.10.5,
citeproc 0.3.0.9, ipynb 0.1.0.1
pandoc-eqnos --version

pandoc-eqnos 2.5.0
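
Since pandoc reports no error, one way to confirm this kind of breakage without opening Word or LibreOffice is to parse word/document.xml straight out of the docx archive. The following is a minimal sketch using only the Python standard library; the names xml_error_in_docx and fake_docx are made up for illustration, and the in-memory archive is only a stand-in for a real docx file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def xml_error_in_docx(docx_file, member="word/document.xml"):
    """Return None if `member` parses as XML, otherwise the parser's message.

    `docx_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        data = archive.read(member)
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        return str(err)
    return None


def fake_docx(document_xml):
    """Hypothetical stand-in for a real docx: an in-memory zip with one member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    buffer.seek(0)
    return buffer


# Same kind of breakage as in the document.xml above: <w:t> is still open
# when </w:p> arrives.
broken = '<w:p xmlns:w="urn:example"><w:r><w:t></w:p>'
print(xml_error_in_docx(fake_docx(broken)))
# For a real file: print(xml_error_in_docx("demo.docx"))
```

A None result only means the XML is well-formed; it does not guarantee that Word or LibreOffice will accept the file.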
johnallison0 commented 3 years ago

Same issue for me. I believe the issue was introduced by a change in pandoc 2.11.3. I have reverted to pandoc 2.11.2 and Word no longer complains. All pandoc releases after 2.11.2 result in the badly formatted XML output.
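
If a build script should warn about this, the version comparison can be sketched as below. This assumes the first line of the pandoc -v output has the form shown in this thread; the 2.11.2 threshold is taken from the report above and has not been independently verified, and the function names are made up for illustration.

```python
import re

# Last release reported as working in this thread (an assumption, not verified).
LAST_REPORTED_GOOD = (2, 11, 2)


def parse_pandoc_version(version_output):
    """Extract the version tuple from the first line of `pandoc -v` output."""
    match = re.match(r"pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)", version_output.strip())
    if match is None:
        raise ValueError("unrecognised pandoc version output")
    return tuple(int(part) for part in match.group(1).split("."))


def may_be_affected(version_output):
    """True if the version is newer than the last one reported as working."""
    return parse_pandoc_version(version_output) > LAST_REPORTED_GOOD
```

The text to pass in could come from subprocess.run(["pandoc", "-v"], capture_output=True, text=True).stdout.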

BRainynight commented 2 years ago

I got an "XML parsing error" when I converted Markdown to docx. After I removed <w:r><w:t> from the variable bookmarkstart and </w:t></w:r> from bookmarkend (pandoc_eqnos.py L215), my Markdown file could be converted to docx successfully.

I found this solution by comparing with pandoc-fignos; the code in the two projects differs slightly:

This is in fignos:

        bookmarkstart = \
          RawBlock('openxml',
                   '<w:bookmarkStart w:id="0" w:name="%s"/>'
                   %attrs.id)
        bookmarkend = \
          RawBlock('openxml', '<w:bookmarkEnd w:id="0"/>')

But this is in eqnos:

        bookmarkstart = \
          RawInline('openxml',
                    '<w:bookmarkStart w:id="0" w:name="%s"/><w:r><w:t>'
                    %attrs.id)
        bookmarkend = \
          RawInline('openxml',
                    '</w:t></w:r><w:bookmarkEnd w:id="0"/>')
        ret = [bookmarkstart, AttrMath(*value), bookmarkend]

I'm not really sure what side effects removing them will have. It seems the bookmark fragments break the pairing of the <w:p> and </w:p> tags?
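
To illustrate the suspected mismatch, here is a simplified model using only the Python standard library. It assumes, based on the document.xml posted earlier in this thread, that the display equation ends up in its own paragraph, so the start and end fragments land in different <w:p> elements. The fragment strings are the ones quoted above; the function names and the reduced body structure are hypothetical, and this is not pandoc's actual writer logic.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"

# Fragments as quoted from pandoc-eqnos 2.5.0 in this thread.
EQNOS_START = '<w:bookmarkStart w:id="0" w:name="%s"/><w:r><w:t>'
EQNOS_END = '</w:t></w:r><w:bookmarkEnd w:id="0"/>'
# The same fragments without the <w:r><w:t> wrappers (the local change above).
PLAIN_START = '<w:bookmarkStart w:id="0" w:name="%s"/>'
PLAIN_END = '<w:bookmarkEnd w:id="0"/>'


def body_with_display_math(start, end, label="eq:line"):
    """Simplified model: each fragment sits in a different paragraph."""
    return (
        '<w:body xmlns:w="%s" xmlns:m="%s">' % (W, M)
        + "<w:p>" + start % label + "</w:p>"
        + "<w:p><m:oMathPara><m:oMath/></m:oMathPara></w:p>"
        + "<w:p>" + end + "</w:p>"
        + "</w:body>"
    )


def is_well_formed(xml_text):
    """True if the text parses as XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(body_with_display_math(EQNOS_START, EQNOS_END)))
print(is_well_formed(body_with_display_math(PLAIN_START, PLAIN_END)))
```

In this model the wrapped fragments leave <w:t> open across a paragraph boundary, while the unwrapped ones do not. Whether the bookmark still works as a cross-reference target in Word after the change is not checked here.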