mwilliamson / python-mammoth

Convert Word documents (.docx files) to HTML
BSD 2-Clause "Simplified" License

Cross references getting underlined incorrectly #119

Closed: deltamacht closed this issue 2 years ago

deltamacht commented 2 years ago

I'm reporting what I think is a bug that shows up when a document contains cross-references and is converted with a "u => u" style map. Most of the time the output looks fine, but occasionally I see odd underline-related behavior. For example, see this one-line Word document: unwanted_underline.docx. If you process it with a simple "u => u" style map (screenshot of a notebook attached)

Screen Shot 2022-02-04 at 3 15 58 PM

you'll find that the number 5.1 gets underlined. However, I see no evidence in the actual document that this number should be underlined. Usually in similar situations the link is created, but its content is not underlined unless it was explicitly underlined or underlining is part of its Word style.

Thoughts?

This was run on a Docker image based on python:3.9-buster.

mwilliamson commented 2 years ago

Looking at the XML, there's an underline in the run properties:

      <w:hyperlink w:anchor="_bookmark13" w:history="1">
        <w:r w:rsidR="005F4526" w:rsidRPr="00097E2E">
          <w:rPr>
            <w:i/>
            <w:u w:color="0000FF"/>
          </w:rPr>
          <w:t>5.1</w:t>
        </w:r>
      </w:hyperlink>

so I'm not sure why it wouldn't be underlined in Word (unfortunately, I don't have a copy to hand right now).
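
For anyone who wants to check their own files for the same pattern, here is a minimal sketch using only the Python standard library. It assumes the main document part lives at word/document.xml (the usual location, though not guaranteed), and the function name runs_with_valueless_underline is made up for illustration; it is not part of Mammoth.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's "{uri}tag" form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def runs_with_valueless_underline(docx_file):
    """Return the text of each run whose <w:u> element has no w:val attribute.

    docx_file may be a path or a binary file-like object.
    Assumption: the body is stored at word/document.xml.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    found = []
    for run in root.iter(W + "r"):
        # Look for <w:u> directly inside the run's own <w:rPr>.
        underline = run.find(W + "rPr/" + W + "u")
        if underline is not None and underline.get(W + "val") is None:
            text = "".join(t.text or "" for t in run.iter(W + "t"))
            found.append(text)
    return found
```

Calling runs_with_valueless_underline("unwanted_underline.docx") on the attached file would presumably list the 5.1 run shown above, since its w:u element carries only w:color.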

mwilliamson commented 2 years ago

Ah, it's probably because there's no w:val attribute. I think Mammoth checks for specific values (e.g. none) and doesn't handle the case where the attribute is missing altogether.
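
To make the distinction concrete, here is a rough sketch of a check that treats a missing w:val as "not underlined". This is not Mammoth's actual code: the function name is_underlined and the exact set of "off" values ("none", "false", "0") are assumptions for illustration, and treating a missing attribute as no underline is inferred from the reporter's observation of how the document looks in Word.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumed set of w:val values that switch underlining off.
NOT_UNDERLINED = {"none", "false", "0"}


def is_underlined(run_properties):
    """Decide whether a <w:rPr> element asks for underlined text.

    A <w:u> element with no w:val attribute is treated as not underlined
    (an assumption based on the behaviour described in this thread).
    """
    underline = run_properties.find("{%s}u" % W_NS)
    if underline is None:
        return False
    value = underline.get("{%s}val" % W_NS)
    return value is not None and value not in NOT_UNDERLINED


def parse_run_properties(inner_xml):
    """Helper for experimenting: wrap a fragment in <w:rPr> and parse it."""
    return ET.fromstring('<w:rPr xmlns:w="%s">%s</w:rPr>' % (W_NS, inner_xml))
```

With this check, the run properties from the XML above (w:u with only w:color) come out as not underlined, while w:u with w:val="single" still counts as underlined.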

mwilliamson commented 2 years ago

This should now be addressed in the latest release. Thanks for the report!