mwilliamson / mammoth.js

Convert Word documents (.docx files) to HTML
BSD 2-Clause "Simplified" License

Wrong Numbering for Resuming Lists #413

Open jan-schweiger opened 3 weeks ago

jan-schweiger commented 3 weeks ago

Description / Minimal Example:

We have encountered a severe limitation that still seems to be unresolved.

As you can see in the image below, we have a Word document with a numbered list that is interrupted by some other content.

image

Mammoth correctly splits it into two HTML lists, but gives us no indication that the two lists belong together. We therefore inevitably end up with the wrong numbering:

image

Resulting HTML code:

<ol>
    <li>First Item</li>
    <li>Second Item</li>
</ol>
<p>Paragraph between</p>
<ol>
    <li>Third Item</li>
</ol>

The original Word file: Resuming List.docx

Desired solution:

Solution A: List Numbering-ID

<ol data-numbering-id="0">
    <li>First Item</li>
    <li>Second Item</li>
</ol>
<p>Paragraph between</p>
<ol data-numbering-id="0">
    <li>Third Item</li>
</ol>

This solution would be best for automated processing of the HTML. From what I have seen in other issues, it is also the solution the community prefers most.

This doesn't need to be the default behaviour. It would be fine if the data-numbering-id were only added when an option is specified, or even if we could add it ourselves with a transform function.
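
If the output did carry such an ID, the post-processing step could be fairly small. The following is only a minimal sketch, not part of Mammoth: it assumes the hypothetical data-numbering-id attribute from Solution A (which Mammoth does not emit today), flat non-nested lists, and a made-up helper name, resumeNumberedLists.

```javascript
// Sketch: add `start` attributes to <ol> elements that share a numbering ID.
// Assumes flat (non-nested) lists and the hypothetical data-numbering-id
// attribute from Solution A; Mammoth does not emit this attribute today.
function resumeNumberedLists(html) {
    // Number of list items already seen for each numbering ID.
    const itemsSeen = new Map();

    return html.replace(
        /<ol data-numbering-id="([^"]*)">([\s\S]*?)<\/ol>/g,
        (match, numberingId, body) => {
            const previousCount = itemsSeen.get(numberingId) || 0;
            const itemCount = (body.match(/<li[\s>]/g) || []).length;
            itemsSeen.set(numberingId, previousCount + itemCount);

            // The first list with a given ID keeps the default start of 1.
            if (previousCount === 0) {
                return match;
            }
            return `<ol data-numbering-id="${numberingId}" start="${previousCount + 1}">${body}</ol>`;
        }
    );
}
```

Applied to the example above, the first list would be left unchanged and the second would gain start="3". A regular expression is used only to keep the sketch dependency-free; a real HTML parser would be the safer choice for nested lists.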

Solution B: Start-attribute

<ol>
    <li>First Item</li>
    <li>Second Item</li>
</ol>
<p>Paragraph between</p>
<ol start="3">
    <li>Third Item</li>
</ol>

As many others have suggested, the alternative is to use the start attribute, which is a built-in HTML feature. This way, the HTML renders correctly without any post-processing.

Solution C: Accessing the Word numbering

Another alternative would be to let us access the numbering of a list item in a transform function, so that we could add it as an HTML attribute ourselves. Currently, we cannot access any numbering information in a transform function; I have already tried.
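
To illustrate what that could enable: if each list item carried its number in some attribute, it could be mapped onto the standard value attribute of li, which sets the item's number within an ol. This is only a sketch under assumptions. The data-number attribute and the applyItemNumbers helper are hypothetical, and nothing like them exists in Mammoth today.

```javascript
// Sketch: turn a hypothetical data-number attribute into the standard
// `value` attribute of <li>, which sets the item's ordinal inside an <ol>.
// data-number is an assumption for illustration; Mammoth does not emit it.
function applyItemNumbers(html) {
    return html.replace(/<li data-number="(\d+)">/g, '<li value="$1">');
}
```

Items that follow an li with a value attribute continue counting from that value, so only the first item of a resumed list would strictly need it.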

Preference:

I believe solutions A and B would work best for most people, but any of the three would be fine. At the moment there is unfortunately no solution at all, which is a big problem for us.

Version / Environment:

mammoth 1.8.0 in Node.js

Related Issues:

Thank you very much in advance for considering this important improvement!!

jan-schweiger commented 3 weeks ago

Hi @mwilliamson, we will test your package in production in mid-September. Since you are working on this package free of charge, I would like to support you: for every issue you are able to resolve for us, I will sponsor you $100. If the production test is successful, I hope to convince my company to sponsor you as well. Thank you very much for all the effort you put into this project.

nero-nazok commented 2 weeks ago

I totally agree with you. I think this is one of the most requested functionalities I have seen for Mammoth. It would be really great to have that finally resolved and added to Mammoth.

Please have a look at this one @mwilliamson. Thanks!!

mwilliamson commented 2 weeks ago

Unfortunately, due to the way that HTML is generated, this isn't entirely straightforward to accomplish, so I'm not sure when I'd get the time to think through what the right approach would be.

> Since you are working on this package free of charge, I would like to support you. For every issue you are able to resolve for us, I will sponsor you $100.

Thanks for the offer. It depends on the size of the feature, but I'm afraid $100 probably isn't enough to make any difference to whether I work on a feature or not.

tanja-kovakic commented 2 weeks ago

Is there maybe a solution that is easier for you to implement @mwilliamson?

Maybe you could provide a way to access the numbering of each list item? That way we could detect resuming lists ourselves.

However, there is currently no way of detecting a resuming list, so we end up with incorrect HTML. I think a lot of Mammoth users need this feature. I have been following it since it was originally opened in https://github.com/mwilliamson/mammoth.js/issues/267 three years ago. Any way of detecting a resuming list would be fine at this point.

tanja-kovakic commented 2 weeks ago

I would also like to sponsor an additional $100 to highlight how important this issue is, even though it might not be much :)