mwilliamson / python-mammoth

Convert Word documents (.docx files) to HTML
BSD 2-Clause "Simplified" License

Unordered list in paragraph with custom style including numbering cannot be retrieved properly ? #141

Closed yldoctrine closed 7 months ago

yldoctrine commented 8 months ago

Hello, thanks for your work!

I have an issue with a docx that uses a custom style that includes some numbering rules. In the docx, some paragraphs with this style contain unordered lists. When I try to convert to HTML, the unordered list gets merged into the ordered list of the paragraph.

I use the following style_map:

p:unordered-list(1) => ul > li:fresh
p:ordered-list(1) => ol > li:fresh
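
For anyone reproducing this, here is a minimal sketch of how a style map like this is typically passed to mammoth. The helper name `convert` is hypothetical. The sketch assumes the `mammoth` package is installed and relies on `mammoth.convert_to_html` accepting a `style_map` string with one mapping per line.

```python
# Style map from the report: one mapping per line.
STYLE_MAP = "\n".join([
    "p:unordered-list(1) => ul > li:fresh",
    "p:ordered-list(1) => ol > li:fresh",
])


def convert(path):
    """Convert a .docx file to HTML using the style map above (hypothetical helper)."""
    import mammoth  # third-party package: pip install mammoth

    with open(path, "rb") as docx_file:
        result = mammoth.convert_to_html(docx_file, style_map=STYLE_MAP)
    # result.value is the generated HTML; result.messages holds any warnings.
    return result.value, result.messages
```

Calling `convert("ordered_list.docx")` on the attached file should return the HTML together with any conversion warnings.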

Here are some example files to reproduce the issue: ordered_list.docx, ordered_list.txt, expected_ordered_list.txt

Inside the docx, in numbering.xml, I found the following, which I think explains the problem:

<--- This seems to be the problem <--- this as well

Then in the library's code we end up here, applying the "level" of the style and losing the list in the document: https://github.com/mwilliamson/python-mammoth/blob/master/mammoth/docx/body_xml.py#L242

I'm not sure how to fix this. Thanks for any insight on this issue.

Running on Python 3.10 on macOS Sonoma 14.3.1.

mwilliamson commented 8 months ago

Hmm, it's probably the case that the precedence is the wrong way round: numbering properties applied directly to a paragraph should supersede those applied to the paragraph style.
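
The precedence rule described here can be sketched roughly as follows. This is an illustration only, not mammoth's actual code: `Numbering` and `resolve_numbering` are hypothetical names, and the real reader in `body_xml.py` may be structured differently.

```python
from typing import NamedTuple, Optional


class Numbering(NamedTuple):
    # Hypothetical stand-in for a resolved numbering definition.
    level_index: int
    is_ordered: bool


def resolve_numbering(
    direct: Optional[Numbering],
    from_style: Optional[Numbering],
) -> Optional[Numbering]:
    """Prefer numbering set directly on the paragraph over the style's numbering."""
    if direct is not None:
        return direct
    return from_style


style_numbering = Numbering(level_index=0, is_ordered=True)
bullet_numbering = Numbering(level_index=0, is_ordered=False)

# Paragraph with its own bullet numbering: the direct value wins.
assert resolve_numbering(bullet_numbering, style_numbering) == bullet_numbering
# Paragraph without direct numbering: fall back to the style's numbering.
assert resolve_numbering(None, style_numbering) == style_numbering
```

With the order reversed (style first), the bulleted paragraph in the report would pick up the style's ordered numbering, which matches the merged-list symptom.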

mwilliamson commented 7 months ago

This should now be fixed in 1.7.1.