jgm / texmath

A Haskell library for converting LaTeX math to MathML.
GNU General Public License v2.0

Big Brackets in LaTeX (in Markdown, .md) are Converted to Small Brackets in Microsoft Word (.docx) #140

Closed: newzealandpaul closed this issue 2 years ago

newzealandpaul commented 5 years ago

Larger LaTeX math brackets, such as \bigg, are converted into regular-sized brackets when they are embedded in Markdown and the conversion target is MS Word (.docx). This does not happen with PDF output, which renders them correctly.

Microsoft Word does support large brackets, so I am unsure whether this is expected behavior or a bug. It seems like a bug from a user's perspective, but I don't know what hoops pandoc has to jump through when creating .docx documents.

Here is an example Markdown file, hello.md:

hello world

$\bigg( (\sigma^2+1)^3 \bigg)$

Converted using:

pandoc -s hello.md -o hello.docx
pandoc -s hello.md -o hello.pdf

The equation renders correctly in PDF:

[Image: pdfeq]

But renders all the brackets the same size in Microsoft Word:

[Image: wordeq]

This is what I would have expected (I edited the .docx produced by pandoc and increased the font size of the outer brackets so they are larger):

[Image: Screen Shot 2019-05-20 at 5 36 20 PM]

My pandoc version:

$ pandoc --version
pandoc 2.7.2
Compiled with pandoc-types 1.17.5.4, texmath 0.11.2.2, skylighting 0.7.7
Default user data directory: /Users/paul/.local/share/pandoc or /Users/paul/.pandoc
Copyright (C) 2006-2019 John MacFarlane
Web:  http://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.

Edit: I originally tried this by mistake with pandoc 1.19.2.1 (which happened to be on my PATH through Anaconda). The same problem occurs with the latest release, pandoc 2.7.2.

mb21 commented 5 years ago

Can you post a minimal expected docx file, or its XML contents?

newzealandpaul commented 5 years ago

@mb21 Sure, this is roughly what I would have expected:

Expected output:

hello-expected.docx

Actual output:

hello-pandoc output.docx

Source file (extension changed to .txt so I could upload it to GitHub):

hello.md.txt

Even if there were just two or three levels of "big" parentheses, it would be an improvement over a single level. I am not a LaTeX expert, so I am not sure how many levels there are.

mb21 commented 5 years ago

Thanks. It seems Word simply increases the font size of those braces. Looks like we could do the same:

    <w:p w14:paraId="1891ECC9" w14:textId="0C56A3EF" w:rsidR="005F1858" w:rsidRDefault="00AC0689">
      <w:pPr>
        <w:pStyle w:val="BodyText"/>
      </w:pPr>
      <m:oMathPara>
        <m:oMath>
          <m:r>
            <w:rPr>
              <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
              <w:sz w:val="36"/>
              <w:szCs w:val="36"/>
            </w:rPr>
            <m:t>(</m:t>
          </m:r>
          <m:r>
            <w:rPr>
              <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
            </w:rPr>
            <m:t xml:space="preserve"/>
          </m:r>
          <w:bookmarkStart w:id="0" w:name="_GoBack"/>
          <w:bookmarkEnd w:id="0"/>
          <m:r>
            <w:rPr>
              <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
            </w:rPr>
            <m:t>(</m:t>
          </m:r>
          <m:sSup>
            <m:sSupPr>
              <m:ctrlPr>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
                </w:rPr>
              </m:ctrlPr>
            </m:sSupPr>
            <m:e>
              <m:r>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
                </w:rPr>
                <m:t>σ</m:t>
              </m:r>
            </m:e>
            <m:sup>
              <m:r>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
                </w:rPr>
                <m:t>2</m:t>
              </m:r>
            </m:sup>
          </m:sSup>
          <m:r>
            <w:rPr>
              <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
            </w:rPr>
            <m:t>+1</m:t>
          </m:r>
          <m:sSup>
            <m:sSupPr>
              <m:ctrlPr>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
                </w:rPr>
              </m:ctrlPr>
            </m:sSupPr>
            <m:e>
              <m:r>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
                </w:rPr>
                <m:t>)</m:t>
              </m:r>
            </m:e>
            <m:sup>
              <m:r>
                <w:rPr>
                  <w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
                </w:rPr>
                <m:t>3</m:t>
              </m:r>
            </m:sup>
          </m:sSup>
          <m:r>
            <m:rPr>
              <m:sty m:val="p"/>
            </m:rPr>
            <w:rPr>
              <w:rFonts w:ascii="Cambria Math" w:eastAsiaTheme="minorEastAsia" w:hAnsi="Cambria Math"/>
              <w:sz w:val="36"/>
              <w:szCs w:val="36"/>
            </w:rPr>
            <m:t>)</m:t>
          </m:r>
        </m:oMath>
      </m:oMathPara>
    </w:p>
newzealandpaul commented 5 years ago

Yes, that is what I did manually, and it's a huge improvement.

.docx is terrible, and I wish I could use PDFs in practice, but for those of us more or less forced to exchange Word documents containing math, this makes a huge difference for non-trivial equations.

But regardless, thanks to everyone who works on Pandoc, it is such an amazing project.

mb21 commented 5 years ago

Ah, so there is no dedicated GUI button for creating big braces?

By the way, @jgm should probably move this issue to https://github.com/jgm/texmath.

newzealandpaul commented 5 years ago

Not that I know of. It seems the recommended method is changing the font size.

jgm commented 5 years ago

The issue (in texmath, not pandoc proper) is that the OMML writer does not yet support texmath's EScaled element:

   EScaled _ x      -> showExp props x -- no support for scaler?

But it's not clear how to do this. We could set properties for a particular font size, but which size? We don't know what the default stylesheet for the document will be, so choosing, say, 36 as in the example above (w:sz is measured in half-points, so that is 18 pt) wouldn't be right. What we need is a generic way to specify "larger" without specifying the point size.
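To make the sizing problem concrete: w:sz is measured in half-points, so a writer that turned EScaled into a fixed run property would have to multiply the scale factor by a base size it does not actually know. The sketch below uses a hypothetical helper, `scaledHalfPoints`, and assumed body-text sizes of 10, 11 and 12 pt; it is an illustration, not texmath code.

```haskell
import Data.Ratio ((%))

-- Hypothetical helper, not part of texmath: turn an EScaled factor into an
-- absolute w:sz value. OOXML measures w:sz/w:szCs in half-points (24 = 12 pt),
-- so the result depends entirely on the base size passed in.
scaledHalfPoints :: Rational -> Int -> Int
scaledHalfPoints factor baseHalfPoints =
  round (factor * fromIntegral baseHalfPoints)

main :: IO ()
main = mapM_ report [20, 22, 24] -- assumed 10, 11 and 12 pt body text
  where
    -- texmath parses \bigg as EScaled (12 % 5), as shown later in this thread.
    report base =
      putStrLn (show base ++ " half-points -> w:sz " ++ show (scaledHalfPoints (12 % 5) base))
```

It prints 48, 53 and 58 for the three assumed base sizes, so no single hard-coded value is right for every stylesheet.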

Meimax commented 2 years ago

In Word there is a GUI button for size-changing brackets, but only inside equations. In the Equation tab, under Structures, there is a "Brackets" submenu:

[Image: tempsnip]

In the XML, Word represents these GUI brackets with an m:d element:

From ECMA-376 Part 1, §22.1.2.24 d (Delimiter Object): "This element specifies the delimiter object, consisting of opening and closing delimiters (such as parentheses, braces, brackets, and vertical bars), and an element contained inside. The delimiter may have more than one element, with a designated separator character between each element."

<m:d>
<m:dPr>
<m:ctrlPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
<w:i/>
</w:rPr>
</m:ctrlPr>
</m:dPr>
<m:e>
<m:sSup>
<m:sSupPr>
<m:ctrlPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
<w:i/>
</w:rPr>
</m:ctrlPr>
</m:sSupPr>
<m:e>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
</w:rPr>
<m:t>p</m:t>
</m:r>
</m:e>
<m:sup>
<m:sSup>
<m:sSupPr>
<m:ctrlPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
<w:i/>
</w:rPr>
</m:ctrlPr>
</m:sSupPr>
<m:e>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
</w:rPr>
<m:t>x</m:t>
</m:r>
</m:e>
<m:sup>
<m:sSup>
<m:sSupPr>
<m:ctrlPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
<w:i/>
</w:rPr>
</m:ctrlPr>
</m:sSupPr>
<m:e>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
</w:rPr>
<m:t>x</m:t>
</m:r>
</m:e>
<m:sup>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:hAnsi="Cambria Math"/>
</w:rPr>
<m:t>x</m:t>
</m:r>
</m:sup>
</m:sSup>
</m:sup>
</m:sSup>
</m:sup>
</m:sSup>
</m:e>
</m:d>

For comparison, this is the XML for brackets typed on the keyboard:

<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:eastAsiaTheme="minorEastAsia" w:hAnsi="Cambria Math"/>
<w:lang w:val="de-DE"/>
</w:rPr>
<m:t xml:space="preserve">       (</m:t>
</m:r>
<m:sSup>
<m:sSupPr>
<m:ctrlPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:eastAsiaTheme="minorEastAsia" w:hAnsi="Cambria Math"/>
<w:i/>
<w:lang w:val="de-DE"/>
</w:rPr>
</m:ctrlPr>
</m:sSupPr>
<m:e>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:eastAsiaTheme="minorEastAsia" w:hAnsi="Cambria Math"/>
<w:lang w:val="de-DE"/>
</w:rPr>
<m:t>p</m:t>
</m:r>
</m:e>
<m:sup>
<m:sSup>
<m:sSupPr>
<m:ctrlPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:eastAsiaTheme="minorEastAsia" w:hAnsi="Cambria Math"/>
<w:i/>
<w:lang w:val="de-DE"/>
</w:rPr>
</m:ctrlPr>
</m:sSupPr>
<m:e>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:eastAsiaTheme="minorEastAsia" w:hAnsi="Cambria Math"/>
<w:lang w:val="de-DE"/>
</w:rPr>
<m:t>x</m:t>
</m:r>
</m:e>
<m:sup>
<m:sSup>
<m:sSupPr>
<m:ctrlPr>
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:eastAsiaTheme="minorEastAsia" w:hAnsi="Cambria Math"/>
<w:i/>
<w:lang w:val="de-DE"/>
</w:rPr>
</m:ctrlPr>
</m:sSupPr>
<m:e>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:eastAsiaTheme="minorEastAsia" w:hAnsi="Cambria Math"/>
<w:lang w:val="de-DE"/>
</w:rPr>
<m:t>x</m:t>
</m:r>
</m:e>
<m:sup>
<m:r>
<w:rPr>
<w:rFonts w:ascii="Cambria Math" w:eastAsiaTheme="minorEastAsia" w:hAnsi="Cambria Math"/>
<w:lang w:val="de-DE"/>
</w:rPr>
<m:t>x</m:t>
</m:r>
</m:sup>
</m:sSup>
</m:sup>
</m:sSup>
</m:sup>
</m:sSup>
<m:r>

For square brackets (or any other delimiters), the characters are set in the m:dPr element:

<m:d>
 <m:dPr>
 <m:begChr m:val="["/>
 <m:endChr m:val="]"/>
 </m:dPr>
 <m:e>
 <m:r>
 <m:t>a+b</m:t>
 </m:r>
 </m:e>
</m:d>

From ECMA-376 Part 1, §22.1.2.31 dPr (Delimiter Properties): "This element specifies the properties of d, including the enclosing and separating characters, and the properties that affect the shape of the delimiters."

test.docx

jgm commented 2 years ago

Great. Our OMML writer already supports m:d and uses it for the EDelimited type. So, if we parsed $\bigg( (\sigma^2+1)^3 \bigg)$ as

[EDelimited "(" ")" [Right (ESuper (EDelimited "(" ")" [Right (ESuper (EIdentifier "\963") (ENumber "2")),Right (ESymbol Bin "+"),Right (ENumber "1")]) (ENumber "3"))]]

instead of the current

[EScaled (12 % 5) (ESymbol Open "("),ESuper (EDelimited "(" ")" [Right (ESuper (EIdentifier "\963") (ENumber "2")),Right (ESymbol Bin "+"),Right (ENumber "1")]) (ENumber "3"),EScaled (12 % 5) (ESymbol Close ")")]

then it sounds like we'd get the right results. Verifying this:

% texmath -f native -t omml    
[EDelimited "(" ")" [Right (ESuper (EDelimited "(" ")" [Right (ESuper (EIdentifier "\963") (ENumber "2")),Right (ESymbol Bin "+"),Right (ENumber "1")]) (ENumber "3"))]]
^D
<m:oMathPara>
  <m:oMathParaPr>
    <m:jc m:val="center" />
  </m:oMathParaPr>
  <m:oMath>
    <m:d>
      <m:dPr>
        <m:begChr m:val="(" />
        <m:endChr m:val=")" />
        <m:sepChr m:val="" />
        <m:grow />
      </m:dPr>
      <m:e>
        <m:sSup>
          <m:e>
            <m:d>
              <m:dPr>
                <m:begChr m:val="(" />
                <m:endChr m:val=")" />
                <m:sepChr m:val="" />
                <m:grow />
              </m:dPr>
              <m:e>
                <m:sSup>
                  <m:e>
                    <m:r>
                      <m:t>σ</m:t>
                    </m:r>
                  </m:e>
                  <m:sup>
                    <m:r>
                      <m:t>2</m:t>
                    </m:r>
                  </m:sup>
                </m:sSup>
                <m:r>
                  <m:rPr>
                    <m:sty m:val="p" />
                  </m:rPr>
                  <m:t>+</m:t>
                </m:r>
                <m:r>
                  <m:t>1</m:t>
                </m:r>
              </m:e>
            </m:d>
          </m:e>
          <m:sup>
            <m:r>
              <m:t>3</m:t>
            </m:r>
          </m:sup>
        </m:sSup>
      </m:e>
    </m:d>
  </m:oMath>
</m:oMathPara>

However, when I open the result in Word, I see

[Image: Screen Shot 2022-02-23 at 9 37 28 AM]

so this doesn't give us the big parentheses. Nor do I see how we would distinguish between, e.g., \big and \bigg.

Meimax commented 2 years ago

I think that's a Word "feature": if I open this example in Word and add another superscript, the parentheses get bigger.

jgm commented 2 years ago

So, do you think the change outlined above would be a good one? If so, there's the question of how to implement it. I think the best approach might be not to touch the TeX reader, but rather to transform the AST in the OMML writer, looking for the pattern `[EScaled _ (ESymbol Open op), ..., EScaled _ (ESymbol Close cl)]` and replacing it with `EDelimited op cl [...]`?
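The proposed AST rewrite can be sketched as a pass over a flat list of expressions: whenever a scaled opening symbol is found, collect everything up to its matching scaled closing symbol and wrap it in EDelimited. The `Exp` type here is a simplified local stand-in that mirrors only the constructors appearing in this thread (texmath's real type has many more and uses `Text` rather than `String`); `groupScaled` and `splitAtClose` are hypothetical names, and the pass does not recurse into sub-expressions. The scale factor is simply dropped, so \big and \bigg would map to the same m:d, as noted above.

```haskell
import Data.Ratio ((%))

-- Simplified stand-in for texmath's types; only the constructors used in
-- this thread are included, and String is used instead of Text.
data TeXSymbolType = Open | Close | Bin
  deriving (Show, Eq)

data Exp
  = ENumber String
  | EIdentifier String
  | ESymbol TeXSymbolType String
  | ESuper Exp Exp
  | EScaled Rational Exp
  | EDelimited String String [Either String Exp]
  deriving (Show, Eq)

-- Replace each run  EScaled _ (ESymbol Open op) ... EScaled _ (ESymbol Close cl)
-- with  EDelimited op cl [...]. The scale factor is discarded.
groupScaled :: [Exp] -> [Exp]
groupScaled [] = []
groupScaled (open@(EScaled _ (ESymbol Open op)) : rest) =
  case splitAtClose 0 rest of
    Just (inner, cl, rest') ->
      EDelimited op cl (map Right (groupScaled inner)) : groupScaled rest'
    Nothing -> open : groupScaled rest -- unmatched opener: leave it alone
groupScaled (e : rest) = e : groupScaled rest

-- Split the list at the scaled closing symbol that matches at depth 0.
-- Nested scaled openers (e.g. \bigg( \big( ... \big) \bigg)) bump the depth.
splitAtClose :: Int -> [Exp] -> Maybe ([Exp], String, [Exp])
splitAtClose _ [] = Nothing
splitAtClose depth (e : es) = case e of
  EScaled _ (ESymbol Close cl)
    | depth == 0 -> Just ([], cl, es)
    | otherwise -> keep (splitAtClose (depth - 1) es)
  EScaled _ (ESymbol Open _) -> keep (splitAtClose (depth + 1) es)
  _ -> keep (splitAtClose depth es)
  where
    keep = fmap (\(inner, cl', rest) -> (e : inner, cl', rest))

-- (\sigma^2+1)^3, the part between the \bigg delimiters.
body :: Exp
body =
  ESuper
    (EDelimited "(" ")"
      [ Right (ESuper (EIdentifier "\963") (ENumber "2"))
      , Right (ESymbol Bin "+")
      , Right (ENumber "1") ])
    (ENumber "3")

current, expected :: [Exp]
current = [EScaled (12 % 5) (ESymbol Open "("), body, EScaled (12 % 5) (ESymbol Close ")")]
expected = [EDelimited "(" ")" [Right body]]

main :: IO ()
main = do
  print (groupScaled current)
  print (groupScaled current == expected)
```

Running it prints the rewritten list followed by `True`: the same shape that was fed to `texmath -f native -t omml` above.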

Meimax commented 2 years ago

I think the change mirrors exactly how Word would parse it, so it would be a good solution. Since this is only an OMML bug and everything else works fine, changing it in the OMML writer probably has the fewest side effects.