mwilliamson / python-mammoth

Convert Word documents (.docx files) to HTML
BSD 2-Clause "Simplified" License

implement xml properties and html attributes for "transform" usage #113

Closed caramdache closed 2 years ago

caramdache commented 2 years ago

From https://github.com/zt50tz/python-mammoth/commit/302f55a861ebca4171ab3ae786f9df987c8edbbb

This makes it possible to write transforms such as: https://github.com/zt50tz/python-mammoth/blob/master/recipes/html_attributes.py

See issue https://github.com/mwilliamson/python-mammoth/issues/63 (support for colour)
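
To make the colour use case concrete, here is a minimal standalone sketch of the mapping involved: reading a raw XML property from a run and emitting it as an HTML attribute. It uses only the standard library and does not call mammoth or reproduce the linked commit; the helper names `run_colour` and `run_to_html` and the sample XML are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# WordprocessingML namespace used by .docx document XML.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical sample: a single run with red text.
RUN_XML = f"""
<w:r xmlns:w="{W_NS}">
  <w:rPr><w:color w:val="FF0000"/></w:rPr>
  <w:t>Warning</w:t>
</w:r>
"""


def run_colour(run):
    """Return the hex colour from a run's w:rPr/w:color, or None if unset."""
    colour = run.find(f"{{{W_NS}}}rPr/{{{W_NS}}}color")
    if colour is None:
        return None
    value = colour.get(f"{{{W_NS}}}val")
    # "auto" means the reader picks the colour, so treat it as unset.
    return None if value in (None, "auto") else value


def run_to_html(run):
    """Render a run as HTML, carrying the colour over as a style attribute."""
    text = "".join(t.text or "" for t in run.findall(f"{{{W_NS}}}t"))
    colour = run_colour(run)
    if colour is None:
        return escape(text)
    return f'<span style="color: #{colour}">{escape(text)}</span>'


run = ET.fromstring(RUN_XML)
html = run_to_html(run)
print(html)  # <span style="color: #FF0000">Warning</span>
```

An extension point in the converter would need to surface something like the `w:rPr` element to user code and let that code attach attributes to the generated HTML element; how that is exposed is the open design question in this thread.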

mwilliamson commented 2 years ago

Exposing the raw XML has come up before (although possibly in the context of the JavaScript implementation). While I'm not necessarily opposed to it in principle, I'm not convinced exposing the current data structures is the right answer.

caramdache commented 2 years ago

> Exposing the raw XML has come up before (although possibly in the context of the JavaScript implementation). While I'm not necessarily opposed to it in principle, I'm not convinced exposing the current data structures is the right answer.

What would the correct API be, then? I think an extension API would let mammoth address further use cases without the complexity and risk of maintaining patches outside the repo. I am willing to invest time in implementing a solution.