mwilliamson / python-mammoth

Convert Word documents (.docx files) to HTML
BSD 2-Clause "Simplified" License

Ignore images on conversion #92

Closed by rushasdev 4 years ago

rushasdev commented 4 years ago

Hi, is it possible to ignore images while converting from .docx to .html?

To elaborate: suppose we have a .docx file that contains text with basic formatting (bold, italic, etc.) and images. I would like to drop all images during the conversion while keeping the styles and formatting, so that the generated HTML file is image-free.

I read the documentation but maybe I'm missing something. What would be the right way to achieve this?

Thanks in advance.

mwilliamson commented 4 years ago

The simplest way would probably be to post-process the HTML to remove any images. It's also possible to write an image converter that ignores images, although this isn't officially supported and therefore might not work in later versions:

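import mammoth

def ignore_image(image):
    return []  # no HTML nodes, so the image is dropped from the output

with open("document.docx", "rb") as docx_file:  # placeholder path
    result = mammoth.convert_to_html(docx_file, convert_image=ignore_image)
html = result.value  # the generated HTML string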
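
For the first option, post-processing the HTML, a minimal sketch using only the standard library might look like the following. `ImageStripper` and `strip_images` are hypothetical names chosen for illustration and are not part of mammoth. The sketch assumes the input is an HTML fragment of the kind a conversion produces (no comments or doctype, which this parser would silently drop).

```python
from html.parser import HTMLParser


class ImageStripper(HTMLParser):
    """Re-emit HTML as parsed, except that <img> tags are dropped."""

    def __init__(self):
        # Keep entities such as &amp; untouched instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img ... /> or <br /> arrive here.
        if tag != "img":
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "img":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_images(html):
    """Return the HTML string with every <img> tag removed."""
    parser = ImageStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

`strip_images` would be applied to the HTML string that the conversion produces. Unlike the image converter approach, it does not rely on any mammoth behaviour, so it should keep working across library versions.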

rushasdev commented 3 years ago

Sorry for the late reply. This solution works great as expected. Thank you.