onizet / html2openxml

Html2OpenXml is a small .NET library that converts simple or advanced HTML to plain OpenXml components. The project started in 2009, initially to convert users' comments from SharePoint to Word.
MIT License

Number List continues to increment when parsing multiple chunks of separate HTML. #57

Closed devrony closed 2 months ago

devrony commented 5 years ago

Hi, I have a web application that uses the TinyMCE editor, where users can add rich text content that is saved as HTML. The web form has multiple rich text fields. I'm generating a Word .docx from all the content users enter in the form. The document is laid out in sections, and each field's content is injected into its section. I'm using the Html2OpenXml library to inject the HTML into parts of the .docx file. The issue I'm running into is that the numbered list continues to increment (1, 2, 3, then 4, 5, 6, then 7, 8, 9) even when I have created new sections in the document that contain a new numbered-list HTML chunk. I'm expecting output like (1, 2, 3, then 1, 2, 3, then 1, 2, 3), with the numbering restarting for each HTML chunk.

My output in the Word document comes out like this. I'm expecting the numbers to start over in each HTML chunk. Any help would be greatly appreciated.

HTML Chunk 1
1.) Number Parent Item 1
2.) Number Parent Item 2

HTML Chunk 2
3.) Number Parent Item 1
4.) Number Parent Item 2

Here is the HTML sample that I'm pasting in each TinyMCE Rich Text editor.

<div>
    <ol>
        <li>Number Parent Item 1</li>
        <li>Number Parent Item 2</li>
    </ol>
</div>

Here is the code I'm using to create the HTML parts. This method gets called for each section that I'm rendering in the Word document.

        private List<Paragraph> ConvertHtmlToOpenXML(string htmlText)
        {
            // Must return at least 1 paragraph
            if (string.IsNullOrEmpty(htmlText))
            {
                List<Paragraph> paragraphs = new List<Paragraph>();
                paragraphs.Add(new Paragraph());
                return paragraphs;
            }

            // Temporarily create new document for HTML conversion and then retrieve the generated paragraphs and 
            // append to original document.
            using (var tmpGeneratedDocument = new System.IO.MemoryStream())
            {
                var tmpPackage = WordprocessingDocument.Create(tmpGeneratedDocument, WordprocessingDocumentType.Document);

                var tmpMainDocumentPart1 = tmpPackage.MainDocumentPart;
                if (tmpMainDocumentPart1 == null)
                {
                    tmpMainDocumentPart1 = tmpPackage.AddMainDocumentPart();
                    new Document(new Body()).Save(tmpMainDocumentPart1);
                }

                var htmlConverter = new HtmlConverter(tmpMainDocumentPart1);

                // ParseHtml will automatically append to temp document
                htmlConverter.ParseHtml(htmlText);
                tmpMainDocumentPart1.Document.Save();

                tmpPackage.Close();
                tmpGeneratedDocument.Close();

                // Return parsed HTML paragraphs
                return tmpMainDocumentPart1.Document.Body.Descendants<Paragraph>().ToList();
            }
        }

caiotoptal commented 3 years ago

@devrony did you have any luck with this issue at all?

devrony commented 3 years ago

It's been a while. Let me check the code and see what I did. I likely found a workaround or tried something else. Give me about an hour.

devrony commented 3 years ago

Unfortunately, my workaround was to simply turn off ordered lists in my TinyMCE editor for users. Luckily, the client was okay with this for now, but I believe I'll eventually need to work out how to fix it. I'm currently on HtmlToOpenXml v2.0.3. I see there is a 2.1.0 version available on NuGet, but I have not tried updating yet.

caiotoptal commented 3 years ago

@devrony I solved it late at night by just rewriting this Collections/Numbering thing from scratch. It sucks now but worked lol. Hopefully there's an update soon. Thanks for the feedback!

onizet commented 4 months ago

In v3, you will be able to use converter.ContinueNumbering = false
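
For anyone landing here later, this is a rough sketch of how that setting might be used when several HTML chunks are written into the same document. It is untested and rests on a few assumptions: the property is spelled ContinueNumbering, ParseHtml from the code earlier in this thread is still available in v3 and appends to the document body, and the mainPart passed in already contains a Document with a Body. SectionRenderer and AppendHtmlChunks are hypothetical names made up for the example.

```csharp
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using HtmlToOpenXml;

public static class SectionRenderer
{
    // Hypothetical helper: converts each HTML chunk and appends it to the
    // body of the given main document part.
    public static void AppendHtmlChunks(MainDocumentPart mainPart, IEnumerable<string> htmlChunks)
    {
        var converter = new HtmlConverter(mainPart);

        // Assumption: v3 property named in the maintainer's comment.
        // When false, each new <ol> should restart at 1 instead of
        // resuming from the previous list.
        converter.ContinueNumbering = false;

        foreach (var html in htmlChunks)
        {
            // Assumption: ParseHtml (the call used earlier in this thread)
            // still appends the converted elements to the body in v3.
            converter.ParseHtml(html);
        }

        mainPart.Document.Save();
    }
}
```

If this works as described, passing the two sample chunks from the original report should give 1, 2 in the first section and 1, 2 again in the second. Check the v3 release notes for the exact member names before relying on it.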