mwilliamson / python-mammoth

Convert Word documents (.docx files) to HTML
BSD 2-Clause "Simplified" License

Option to preserve existing element style_id as class name #40

Open · nelliemckesson opened this issue 7 years ago

nelliemckesson commented 7 years ago

Is there an option to preserve an existing .docx style_id (the cleaned up & normalized .docx style name) as a class name on the resulting HTML element, regardless of whether this is defined in a style map?

For example, I'd like to convert various .docx files that use unknown and differing sets of styles into HTML that keeps the style names (whatever they may be) as class names on the HTML elements, for both block and inline elements.

mwilliamson commented 7 years ago

Not at the moment, no.

GitBruno commented 7 years ago

Not without creating a style map. Luckily, python-mammoth gives you excellent feedback, so you can generate the style map automatically by running the conversion once and collecting the warnings it reports for unrecognised styles, like so:

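import mammoth

with open("document.docx", "rb") as docx_file:  # placeholder path
    result = mammoth.convert_to_html(docx_file)
messages = result.messages

styleInfos = []  # [style type, style name] pairs collected for the style map

for m in messages:
    if m.type == "warning":
        # Test each substring separately: `'a' and 'b' and 'c' in s` only checks 'c'
        if all(s in m.message for s in ("Unrecognised ", " style", ": ")):
            styleInfo = m.message.split(": ", 1)  # [type part, name part]
            styleInfo[0] = styleInfo[0].replace("Unrecognised ", "").replace(" style", "")
            styleInfo[1] = styleInfo[1].split(" (")[0]  # drop the "(Style ID: ...)" suffix
            styleInfos.append(styleInfo)
        else:
            print("stylemap generator warning: " + m.message)
    else:
        print(m)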

Then run the conversion again with your newly generated style map.
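As a rough sketch of that second step, the collected `styleInfo` pairs (style type, style name) could be turned into mammoth style map rules that reuse each style name as a class, for example `p[style-name='Heading 1'] => p.heading-1:fresh`. The helper names `to_class_name` and `build_style_map`, the slug rule for class names, and the choice of `span` as the output element for runs are assumptions for illustration, not part of mammoth; only paragraph and run styles are handled.

```python
import re

# Hypothetical mapping from mammoth's element type to
# (style map selector, output HTML element, modifier).
# ":fresh" keeps consecutive paragraphs with the same style as separate <p> elements.
OUTPUT = {"paragraph": ("p", "p", ":fresh"), "run": ("r", "span", "")}


def to_class_name(style_name):
    """Turn a Word style name such as 'Heading 1' into a class name like 'heading-1'."""
    slug = re.sub(r"[^a-z0-9]+", "-", style_name.lower()).strip("-")
    if not slug or slug[0].isdigit():
        slug = "style-" + slug  # avoid class names that start with a digit
    return slug


def build_style_map(style_infos):
    """Build one style map rule per unique (style type, style name) pair."""
    rules = []
    for style_type, style_name in style_infos:
        name = style_name.strip("'")  # tolerate names reported with surrounding quotes
        if style_type not in OUTPUT or "'" in name:
            continue  # skip other element types and names that would need escaping
        selector, element, modifier = OUTPUT[style_type]
        rule = "{0}[style-name='{1}'] => {2}.{3}{4}".format(
            selector, name, element, to_class_name(name), modifier)
        if rule not in rules:  # the same style can be reported more than once
            rules.append(rule)
    return "\n".join(rules)
```

The resulting string is what the second run would pass as `mammoth.convert_to_html(docx_file, style_map=build_style_map(pairs))`, where `pairs` are the collected `styleInfo` values.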

GitBruno commented 6 years ago

@mwilliamson Would you be interested in an auto-stylemap flag? I can do a PR for this. Otherwise we can close this issue.

--style-map=auto
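Purely as an illustration of what an automatic mode like the proposed `--style-map=auto` might do (the flag is only a proposal in this thread), it could be a two-pass wrapper: convert once, turn every unrecognised-style warning into a rule that carries the style ID over as a class (which is what the original request asks for), then convert again. This sketch assumes warnings of the form `Unrecognised paragraph style: Heading 1 (Style ID: Heading1)` and the `p.StyleId` / `r.StyleId` matcher syntax; `auto_style_map` and `convert_with_auto_style_map` are hypothetical names, and the converter is passed in (e.g. `mammoth.convert_to_html`) so the logic can be exercised without a real document.

```python
import re

# Assumed warning format: "Unrecognised <type> style: <name> (Style ID: <id>)".
WARNING_RE = re.compile(r"^Unrecognised (paragraph|run) style: .* \(Style ID: ([^)]+)\)$")

# Hypothetical mapping from element type to (selector, output element, modifier).
OUTPUT = {"paragraph": ("p", "p", ":fresh"), "run": ("r", "span", "")}


def auto_style_map(messages):
    """Build style map rules that keep each unrecognised style ID as a class name."""
    rules = []
    for m in messages:
        match = WARNING_RE.match(m.message) if m.type == "warning" else None
        if match is None:
            continue
        style_type, style_id = match.groups()
        # Only use IDs that are plain identifiers (letters, digits, "-" and "_").
        if not re.fullmatch(r"[A-Za-z_-][A-Za-z0-9_-]*", style_id):
            continue
        selector, element, modifier = OUTPUT[style_type]
        rule = "{0}.{1} => {2}.{1}{3}".format(selector, style_id, element, modifier)
        if rule not in rules:
            rules.append(rule)
    return "\n".join(rules)


def convert_with_auto_style_map(convert, fileobj):
    """First pass collects warnings; second pass converts with the generated rules."""
    first = convert(fileobj)
    fileobj.seek(0)  # rewind so the second pass reads the document from the start
    return convert(fileobj, style_map=auto_style_map(first.messages))
```

Keying the rules on the style ID rather than the display name sidesteps quoting and spacing issues in style names, at the cost of class names that mirror Word's internal IDs.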