mwilliamson / python-mammoth

Convert Word documents (.docx files) to HTML
BSD 2-Clause "Simplified" License

Loss of Heading Text from Bulleted list #120

Closed: VenkatsQuest closed this issue 2 years ago

VenkatsQuest commented 2 years ago

If you're reporting a bug or requesting a feature, please include: [screenshot]

Mammoth is returning the following output:

```html
<ul><li><ul><li><ol><li><a id="OLE_LINK4"></a><a id="OLE_LINK3"></a><br /></li></ol></li></ul></li></ul>
<h2>Amazing Text</h2>
<p>Amazing text in the paragraph here</p>
<ul><li><ul><li><ol><li>Lets see how it goes</li><li>How this goes second</li><li>Sorry disappointed here</li></ol></li></ul></li></ul>
```


mwilliamson commented 2 years ago

Could you provide the original document? There's not much I can do without it.

VenkatsQuest commented 2 years ago

Here it is: Test.docx

VenkatsQuest commented 2 years ago

@mwilliamson, any help or suggestions on this issue?

mwilliamson commented 2 years ago

I'll take a look when I get a chance. I'd ask that you don't continue to ping me here or on Twitter: doing so won't make me look any faster.

VenkatsQuest commented 2 years ago

Any suggestions?

mwilliamson commented 2 years ago

It looks like the numbering text for each item is set in the w:lvl element:

```xml
<w:lvlText w:val="REQ %1.%2.%3"/>
```
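If you want to see which list levels in your own document carry this kind of explicit numbering text, you can read it straight out of the .docx. The sketch below is a hypothetical helper, not part of Mammoth: it opens the file as a zip archive and lists every `w:lvlText` value. It assumes the numbering definitions live at the conventional `word/numbering.xml` path (strictly, that path is given by the package relationships).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's "{uri}tag" form.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def list_level_texts(docx_path):
    """Return (abstractNumId, ilvl, lvlText) for every numbering level in a .docx.

    Hypothetical helper; assumes numbering lives at word/numbering.xml.
    """
    with zipfile.ZipFile(docx_path) as docx:
        try:
            xml_bytes = docx.read("word/numbering.xml")
        except KeyError:
            return []  # the document defines no lists
    root = ET.fromstring(xml_bytes)
    levels = []
    for abstract in root.iter(f"{W}abstractNum"):
        abstract_id = abstract.get(f"{W}abstractNumId")
        for lvl in abstract.iter(f"{W}lvl"):
            text = lvl.find(f"{W}lvlText")
            if text is not None:
                levels.append((abstract_id, int(lvl.get(f"{W}ilvl")), text.get(f"{W}val")))
    return levels
```

For the attached document, a level whose text is `REQ %1.%2.%3` would appear in the result as a tuple ending in that string. Note that `ilvl` is zero-based, so `ilvl="2"` is the third list level.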

Mammoth maps lists in the original document to lists in HTML. Since it seems like you want to have the actual text for the numbering explicitly in the output (rather than set by the HTML lists), your best bet is probably to map those items to a specific CSS class, and then run a post-processor to insert any explicit text/numbering you want.
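The map-then-post-process approach might look roughly like the sketch below. It assumes Mammoth's style-map matchers `p:ordered-list(n)` and the `p.class:fresh` output syntax; the class names `req1` to `req3`, the `add_req_numbering` function, and the `REQ` prefix handling are all hypothetical choices for illustration. The counter logic only approximates Word's multi-level numbering (one counter per level, deeper levels reset when a shallower item appears, and ancestor levels that never appeared are shown as 1); it is not a reimplementation of Word's rules.

```python
import re

# Hypothetical style map: send each ordered-list level to a paragraph with a
# level-specific CSS class instead of a nested HTML list.
STYLE_MAP = """
p:ordered-list(1) => p.req1:fresh
p:ordered-list(2) => p.req2:fresh
p:ordered-list(3) => p.req3:fresh
"""

# Matches the opening tag of a paragraph tagged with one of the classes above.
_REQ_PARAGRAPH = re.compile(r'<p class="req([1-9])">')


def add_req_numbering(html, prefix="REQ"):
    """Insert explicit "REQ a.b.c" labels into paragraphs tagged reqN."""
    counters = [0] * 10  # indices 1..9, one per list level

    def label(match):
        level = int(match.group(1))
        counters[level] += 1
        # Starting a new item at this level restarts every deeper level.
        for deeper in range(level + 1, len(counters)):
            counters[deeper] = 0
        # Ancestor levels that have not appeared yet are displayed as 1.
        parts = [str(max(counters[i], 1)) for i in range(1, level + 1)]
        return f'{match.group(0)}{prefix} {".".join(parts)} '

    return _REQ_PARAGRAPH.sub(label, html)
```

To use it, pass `STYLE_MAP` as the `style_map` argument to `mammoth.convert_to_html` and run `add_req_numbering` on the `value` of the result. A regex is adequate here only because Mammoth's output tags are predictable; for anything more involved, an HTML parser is the safer choice.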