jgm / pandoc

Universal markup converter
https://pandoc.org

Support Word 2016 numbered equations #2851

Open davidanthoff opened 8 years ago

davidanthoff commented 8 years ago

Word 2016 adds support for equation numbers. After conversion with pandoc, the number ends up attached to the equation, separated by a #. For example, converting this file repo.docx with

pandoc repo.docx -o repo.md

I get this markdown

$$\begin{matrix}
x = x\#\left( 1 \right) \\
\end{matrix}$$

which is probably already not ideal. Converting that to a pdf via

pandoc repo.md -o repo-pandoc.pdf

results in this PDF, repo-pandoc.pdf, which clearly is not how it should look. When I export the same Word file to a PDF directly with Word's built-in export, it looks like this: repo-word.pdf.

Any chance that pandoc could support these native Word equation numbers?

ickc commented 8 years ago

You might want to take a look at a comment by @jgm in #2758:

...You will not get an equation number, but that's because pandoc doesn't currently have support for equation numbering...

I want that to be supported too but have no idea how it should be handled or how complicated it will be to handle it.

Maybe we should at least make a list of which formats support equation numbering first?

davidanthoff commented 8 years ago

I feel that if pandoc doesn't support equation numbers at this point, it should maybe just strip them out in the generated markdown? Or it could for example replace the #(1) part with a long space and then the equation number or something, so that at least visually it looks somewhat like the original?
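The second suggestion could be pictured as a post-processing pass over the generated markdown. The sketch below is only an illustration in Python, not something pandoc does: the function name `soften_equation_number` and its `mode` parameter are hypothetical, and the regular expression assumes the number always appears in the form `\#\left( N \right)` seen in the output above.

```python
import re

# Matches the "\#\left( 1 \right)" residue seen in the pandoc output above.
# Assumption: the number is always wrapped in \left( ... \right).
NUMBER_RESIDUE = re.compile(r"\s*\\#\s*\\left\(\s*(.*?)\s*\\right\)")


def soften_equation_number(tex, mode="space"):
    """Rewrite the '#(N)' residue in a TeX math string.

    mode="space": replace it with a wide space followed by "(N)".
    mode="strip": drop the number entirely.
    """
    def replace(match):
        if mode == "strip":
            return ""
        return r" \qquad (" + match.group(1) + ")"

    return NUMBER_RESIDUE.sub(replace, tex)


line = r"x = x\#\left( 1 \right) \\"
print(soften_equation_number(line))           # x = x \qquad (1) \\
print(soften_equation_number(line, "strip"))  # x = x \\
```

Either mode only changes how the line looks; it does not turn the number into a real, referenceable equation number.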

jgm commented 8 years ago

Note: I have Word 2011 on my system. When I open your repo.docx, it looks exactly like repo-pandoc.pdf: you see a #(1). Here's what the internal xml looks like:

        <m:oMath>
          <m:eqArr>
            <m:eqArrPr>
              <m:maxDist m:val="1" />
              <m:ctrlPr>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math"
                  w:hAnsi="Cambria Math" />
                  <w:i />
                </w:rPr>
              </m:ctrlPr>
            </m:eqArrPr>
            <m:e>
              <m:r>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math"
                  w:hAnsi="Cambria Math" />
                </w:rPr>
                <m:t>x=x#</m:t>
              </m:r>
              <m:d>
                <m:dPr>
                  <m:ctrlPr>
                    <w:rPr>
                      <w:rFonts w:ascii="Cambria Math"
                      w:hAnsi="Cambria Math" />
                      <w:i />
                    </w:rPr>
                  </m:ctrlPr>
                </m:dPr>
                <m:e>
                  <m:r>
                    <w:rPr>
                      <w:rFonts w:ascii="Cambria Math"
                      w:hAnsi="Cambria Math" />
                    </w:rPr>
                    <m:t>1</m:t>
                  </m:r>
                </m:e>
              </m:d>
            </m:e>
          </m:eqArr>
        </m:oMath>

I don't see anything in there that looks like code for an equation number, but maybe they designed it so that earlier Word versions would just do the ugly thing you see. Well, pandoc behaves just like those earlier Word versions.
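If the `#` inside an `m:eqArr` row really is the whole convention, a reader could in principle detect it by splitting the row's text at the `#`. The following Python sketch illustrates that idea under several assumptions: the function name `split_equation_number` is hypothetical, the sample is a trimmed copy of the XML above with the standard Office Math namespace declared (the excerpt omits it), and flattening the row to plain text is only good enough for detection, not for preserving fractions or other structure.

```python
import xml.etree.ElementTree as ET

# Standard Office Math namespace URI; the excerpt above does not show it.
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def split_equation_number(omath_xml):
    """Return (equation_text, number_text) if an eqArr row contains '#',
    otherwise None. Text is flattened, so structure is lost."""
    root = ET.fromstring(omath_xml)
    for arr in root.iter(f"{{{M}}}eqArr"):
        # Only the direct m:e children of m:eqArr are rows.
        for row in arr.findall(f"{{{M}}}e"):
            text = "".join(t.text or "" for t in row.iter(f"{{{M}}}t"))
            body, sep, number = text.rpartition("#")
            if sep:
                return body, number
    return None


SAMPLE = f"""<m:oMath xmlns:m="{M}">
  <m:eqArr>
    <m:eqArrPr><m:maxDist m:val="1"/></m:eqArrPr>
    <m:e>
      <m:r><m:t>x=x#</m:t></m:r>
      <m:d><m:e><m:r><m:t>1</m:t></m:r></m:e></m:d>
    </m:e>
  </m:eqArr>
</m:oMath>"""

print(split_equation_number(SAMPLE))  # ('x=x', '1')
```

The parentheses around the number do not appear in the result because they come from the `m:d` delimiter element rather than from any `m:t` text.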

davidanthoff commented 8 years ago

I think this whole thing might only be supported in the Windows versions of Word, not the Mac versions. I believe Word 2013 displays equation numbers correctly as well. This article describes some of how this seems to be handled internally in Word.

kendonB commented 3 years ago

This Super User answer also offers a way to place Word's caption numbering in Word 2016 equations: https://superuser.com/a/1121557/629656

gerritvreeman commented 1 year ago

I have Word 2021 (macOS) and was trying to get equation numbering like this when converting from md to docx, using a Lua filter (below) to modify the XML.

local equation_counter = 0
local function number_equations(p)
  local hasDisplayMath = false
  -- Check if the paragraph has DisplayMath
  for _, el in ipairs(p.content) do
    if el.t == "Math" and el.mathtype == "DisplayMath" then
      hasDisplayMath = true
      equation_counter = equation_counter + 1
      break
    end
  end
  -- If the paragraph contains DisplayMath, replace it with the XML content
  if hasDisplayMath then
    local xml_content = [[
    <w:body>
      <w:p w:rsidR="004C5F70" w:rsidRPr="001877B8" w:rsidRDefault="001877B8">
        <m:oMathPara>
          <m:oMath>
            <m:eqArr>
              <m:eqArrPr>
                <m:maxDist m:val="1"/>
              </m:eqArrPr>
              <m:e>
                <m:r>
                  <m:t>]] .. p.content[1].text .. [[#</m:t>
                </m:r>
                <m:d>
                  <m:e>
                    <m:r>
                      <m:t>]] .. equation_counter .. [[</m:t>
                    </m:r>
                  </m:e>
                </m:d>
              </m:e>
            </m:eqArr>
          </m:oMath>
        </m:oMathPara>
      </w:p>
    </w:body>
    ]]
    return pandoc.RawBlock("openxml", xml_content)
  end
  return p
end

return {
    { Para = number_equations },
}

This seems to work great for numbering simple functions like $$y=mx+b$$, but for anything more complicated with fractions or symbols, like $$\sigma=\frac{1}{2}$$, it doesn't seem to work (see screenshot below). Returning a math element like pandoc.Math("DisplayMath", p.content[1].text) expands properly (though obviously unnumbered).

Screenshot 2023-09-13 at 9 46 58 AM

I imagine this is because I'm just dropping text into an XML block, when what I actually need is a math XML block where p.content[1].text is currently being placed. However, I don't know how to generate that kind of XML from the math text. Is that even possible?

jgm commented 1 year ago

Yes, the texmath library (or executable) can be used to convert from TeX to OMML (Word's XML format for math). jgm/texmath
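As a rough sketch of the executable route, the Python snippet below pipes a TeX string through `texmath`. It assumes a `texmath` binary is on the PATH and accepts the `-f tex -t omml` options; the helper name `tex_to_omml` and its `runner` parameter are hypothetical. Nothing is assumed here about the exact shape of the returned XML, so splicing it into an `m:eqArr` row is left open.

```python
import shutil
import subprocess


def tex_to_omml(tex, runner=subprocess.run):
    """Convert a TeX math string to OMML by piping it through texmath.

    `runner` is injectable so the call can be exercised without the binary.
    """
    result = runner(
        ["texmath", "-f", "tex", "-t", "omml"],
        input=tex,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Only attempt a real conversion when the executable is available.
if __name__ == "__main__" and shutil.which("texmath"):
    print(tex_to_omml(r"\sigma=\frac{1}{2}"))
```

The returned markup would take the place of the plain text that the filter above drops into `m:t`, so that fractions and symbols arrive as real math structure.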