mwilliamson / python-mammoth

Convert Word documents (.docx files) to HTML
BSD 2-Clause "Simplified" License

Pre converted html tags in doc file #49

Closed anvinj closed 7 years ago

anvinj commented 7 years ago

Is there an option to skip HTML conversion for HTML that is already present in the input docx file? That is, if the input file contains a few HTML tags, can we avoid converting those statements?

mwilliamson commented 7 years ago

Do you mean HTML that appears in w:altChunk elements (so <strong>hello</strong> would appear as the text "hello" in bold), or HTML that literally appears in the document (so the text would literally be <strong>hello</strong>)? The former isn't supported by Mammoth at the moment, so there's no need for an option. The only way you'd be able to strip out the latter would be to use a document transform, as described in the docs.
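A minimal sketch of such a document transform, assuming the tags are typed literally into the text and that each tag sits entirely within a single text node (Word may split text across runs, in which case this would miss it). The helper names `strip_literal_tags`, `build_transform` and `convert` are made up for illustration, and the regular expression is a heuristic, not an HTML parser:

```python
import re

# Heuristic for text that looks like a literal HTML tag, e.g. "<strong>" or "</sub>".
# A "<" followed by a space (as in "NH3 < PH3") is left alone.
_LITERAL_TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9]*(?:\s[^<>]*)?/?>")


def strip_literal_tags(text):
    """Remove tag-like substrings from a plain string (hypothetical helper)."""
    return _LITERAL_TAG.sub("", text)


def build_transform():
    """Build a Mammoth document transform that cleans the text nodes of every run."""
    # Imported lazily so the string helper above can be used without Mammoth installed.
    from mammoth import documents, transforms

    def transform_run(run):
        children = [
            child.copy(value=strip_literal_tags(child.value))
            if isinstance(child, documents.Text)
            else child
            for child in run.children
        ]
        return run.copy(children=children)

    return transforms.run(transform_run)


def convert(path):
    """Convert the .docx file at `path` to HTML with the transform applied."""
    import mammoth

    with open(path, "rb") as docx_file:
        result = mammoth.convert_to_html(
            docx_file, transform_document=build_transform()
        )
    return result.value
```

With this in place, `convert("example.docx")` (a placeholder path) would return the HTML with tag-like text removed. Note that the pattern can also match ordinary text such as `a<b and c>d`, so it may need tightening for real documents.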

anvinj commented 7 years ago

The following is part of the doc I am trying to convert to HTML.

Options: (a) Acidic : NH3 < PH3 < AsH3

Here I have to retain the HTML tags while converting. Any help will be appreciated. Thanks in advance.

mwilliamson commented 7 years ago

I'm afraid I don't follow. Looking at your example, I think everything there should be supported. For instance, subscript is already supported. If you could provide an example document and the output you expect, then that would help.