mwilliamson / python-mammoth

Convert Word documents (.docx files) to HTML
BSD 2-Clause "Simplified" License

Question: Is it possible to support/convert Word shapes/drawings? #84

Open TheKalpit opened 4 years ago

TheKalpit commented 4 years ago

I've been looking for a solution that can convert shapes/drawings in a docx file to standard image formats (JPEG/PNG). Ideally it would work from Python on Ubuntu, but at this point any solution would do.

An excerpt of the w:r element from the docx XML for a sample shape: https://gist.github.com/TheKalpit/323f220f55d509ede1fda8b032229b17

What are my options?

mwilliamson commented 4 years ago

Mammoth doesn't support conversion of shapes. LibreOffice might be able to do some automated conversion.

TheKalpit commented 4 years ago

I've looked into LibreOffice, and it does seem to do something. I can convert the whole page to a PNG, but not just the drawing/shape on its own. I've also tried some other OSS tools, but no luck so far.
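
For reference, the whole-page conversion mentioned above can be scripted roughly as follows. This is a minimal sketch, assuming LibreOffice is installed and its soffice binary is on the PATH; the helper names build_convert_command and convert_to_png are made up for illustration. It renders the page as a whole and does not isolate an individual shape.

```python
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir, soffice="soffice"):
    """Build the headless LibreOffice command line for a docx -> PNG conversion."""
    return [
        soffice,
        "--headless",            # run without a GUI
        "--convert-to", "png",   # target format
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_to_png(docx_path, out_dir, soffice="soffice"):
    """Run the conversion; assumes `soffice` is installed and on the PATH."""
    subprocess.run(build_convert_command(docx_path, out_dir, soffice), check=True)
    # LibreOffice names the output after the input file's stem.
    return Path(out_dir) / (Path(docx_path).stem + ".png")
```

Cropping the resulting image down to the shape would still need the shape's position on the page, which this approach does not provide.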

I'm willing to add this enhancement to Mammoth, but I can't find any existing solution to build on. Could you please point me to a resource or direction that would help with this?

mwilliamson commented 4 years ago

I usually try to work out the XML for documents from a combination of examples and the docx spec (alas, real-world documents and the spec don't always match up). I'd be happy to be proven wrong, but my feeling is that the amount of markup we'd need to support to do a decent job with shapes is quite large, and probably implies a maintenance overhead that I wouldn't want to add to Mammoth.
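
As a starting point for the "work it out from examples" approach, here is a minimal sketch that lists the runs in a document that carry a drawing. It uses only the Python standard library, not Mammoth's API. It assumes shapes appear as w:drawing (DrawingML) or w:pict (VML) somewhere inside a w:r, which may not cover every real-world document; find_shape_runs is a hypothetical helper name.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_shape_runs(docx_file):
    """Return (run_index, kinds) for each w:r containing w:drawing or w:pict.

    run_index counts every w:r in document order, including runs nested
    inside text boxes. Assumption: shapes are stored under one of these
    two elements.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    results = []
    for index, run in enumerate(root.iter(f"{{{W_NS}}}r")):
        kinds = [
            tag
            for tag in ("drawing", "pict")
            if run.find(f".//{{{W_NS}}}{tag}") is not None
        ]
        if kinds:
            results.append((index, kinds))
    return results
```

Calling find_shape_runs("sample.docx") on a few sample files (the file name here is a placeholder) gives a quick view of which runs to dump and compare against the spec.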