onizet / html2openxml

Html2OpenXml is a small .NET library that converts simple or advanced HTML to plain OpenXml components. The project started in 2009, initially to convert users' comments into templated Word documents.
MIT License

Unordered list transformed into ordered list #82

Closed ericadcg closed 3 months ago

ericadcg commented 4 years ago

Hello. I have an unordered list (\

nascimento3 commented 4 years ago

Hi,

I'm having exactly the same problem...

@onizet any ideas?

Thanks!

onizet commented 4 years ago

Do you use a fresh new document or insert into an existing one? I ran the Demo project and do not see any issues regarding <ul>. Could you post more details for troubleshooting?

ericadcg commented 4 years ago

Thank you for your reply. The problem happens when there is a list inside a list, for example:

<table>
    <tbody>
        <tr>
            <th>
            Header
            </th>
        </tr>
        <tr>
            <td> 
                <ul>
                    <li>This is a test item - First List
                        <ul>
                            <li>Second List</li>
                        </ul>
                    </li>
                </ul>
            </td>
        </tr>   
    </tbody>
</table>

Without the HTML shown above, everything is fine. With it, that list itself renders correctly, but every unordered list that follows is rendered as an ordered list. Example:

<table>
    <tbody>
        <tr>
            <th>
            Header
            </th>
        </tr>
        <tr>
            <td> 
                <ul>
                    <li>This is a test item - First List
                        <ul>
                            <li>Second List</li>
                        </ul>
                    </li>
                </ul>
            </td>
        </tr>   
    </tbody>
</table>

<table>
  <tbody>
    <tr>
        <td>Header 2</td>
    </tr>
    <tr>
        <td>
            <ul>
                <li>Test</li>
            </ul>
            <table>
                <tr>
                    <td style='padding-left: 30px;'>Test 2 </td>
                </tr>
            </table>
        </td>
    </tr>
  </tbody>
</table>

Here is an image of the Word output for the HTML above:

WordOutputExample

ericadcg commented 4 years ago

Here is a simpler example (with less tables involved):

<html>
    <body>
<p>Paragraph<p>
<ul>
    <li> This is a test item </li>
    <ul>
        <li> Second List</li>
    </ul>   
</ul>

<p>Paragraph 2</p>
<ul>
    <li>New ul list - item 1</li>
    <li>item 2</li>
</ul>

</body>
</html>

And this is the Word output for the HTML above: WordOutputExample2

Also, I'm using a fresh new document.

Thank you again!
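For anyone who wants to reproduce this against a fresh document, here is a minimal sketch. It assumes the 2.x API of the library (`HtmlConverter.ParseHtml`) and the Open XML SDK; the class name `ListRepro` and the output file name `repro.docx` are made up for illustration. The HTML is copied as posted in the comment above.

```csharp
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using HtmlToOpenXml;

public static class ListRepro
{
    // Converts the given HTML into a brand-new .docx and returns its bytes.
    public static byte[] ConvertToDocx(string html)
    {
        using (var stream = new MemoryStream())
        {
            using (WordprocessingDocument package =
                WordprocessingDocument.Create(stream, WordprocessingDocumentType.Document))
            {
                // Fresh document: create the main part with an empty body.
                MainDocumentPart mainPart = package.AddMainDocumentPart();
                new Document(new Body()).Save(mainPart);

                // Assumption: 2.x API, where ParseHtml appends the result to the body.
                var converter = new HtmlConverter(mainPart);
                converter.ParseHtml(html);

                mainPart.Document.Save();
            }

            // The package must be disposed before the stream content is complete.
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        const string html = @"<html>
    <body>
<p>Paragraph<p>
<ul>
    <li> This is a test item </li>
    <ul>
        <li> Second List</li>
    </ul>
</ul>

<p>Paragraph 2</p>
<ul>
    <li>New ul list - item 1</li>
    <li>item 2</li>
</ul>

</body>
</html>";

        File.WriteAllBytes("repro.docx", ConvertToDocx(html));
    }
}
```

Opening `repro.docx` in Word should then show whether the second `<ul>` comes out bulleted or numbered with the library version you have installed.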

dhavalgajera commented 3 years ago

A possible fix in NumberingListStyleCollection is the following:

    public void EndList(bool popInstances = true)
    {
        levelDepth--;

        if (levelDepth > 0 && popInstances)
            numInstances.Pop();  // decrement for nested list

        firstItem = true;
    }
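
To make the idea behind that snippet easier to follow, here is a toy model of a stack of list instance IDs. It is not the library's `NumberingListStyleCollection`: the class `ListNumberingTracker` and all of its members are hypothetical. It rests on the assumption, suggested by the `numInstances.Pop()` call above, that each list pushes an instance ID when it begins and must pop it when it ends, so that a nested list cannot leave its ID behind for the lists that follow.

```csharp
using System;
using System.Collections.Generic;

// Toy model only (hypothetical names), not the library's implementation.
public sealed class ListNumberingTracker
{
    private readonly Stack<int> numInstances = new Stack<int>();
    private int nextInstanceId = 1;

    // Number of lists currently open (0 when outside any list).
    public int LevelDepth => numInstances.Count;

    // ID of the innermost open list, or null when no list is open.
    public int? CurrentInstanceId =>
        numInstances.Count > 0 ? numInstances.Peek() : (int?)null;

    // Entering a <ul>/<ol>: allocate a new instance ID and make it current.
    public int BeginList()
    {
        numInstances.Push(nextInstanceId++);
        return numInstances.Peek();
    }

    // Leaving a list: drop its ID so the parent's ID (if any) is current again.
    public void EndList()
    {
        if (numInstances.Count == 0)
            throw new InvalidOperationException("EndList called without a matching BeginList.");
        numInstances.Pop();
    }
}

public static class Program
{
    public static void Main()
    {
        var tracker = new ListNumberingTracker();

        int outer = tracker.BeginList();   // <ul>
        int inner = tracker.BeginList();   //   nested <ul>
        tracker.EndList();                 //   </ul>
        Console.WriteLine(tracker.CurrentInstanceId == outer); // True
        tracker.EndList();                 // </ul>
        Console.WriteLine(tracker.LevelDepth);                 // 0

        int next = tracker.BeginList();    // a later, unrelated <ul>
        Console.WriteLine(next != outer && next != inner);     // True
    }
}
```

If a begin/end pair were unbalanced in a model like this, a later list would inherit state from an earlier nested one, which is one plausible way for unrelated lists to pick up the wrong numbering.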

sbowler commented 3 years ago

I ran into this issue as well while throwing together a few different lists and nested lists to test the HtmlConverter. All lists end up styled as ordered lists for some reason. This may be a different bug, but it sounds like the same issue. Here is the example HTML and one of the resulting paragraphs. Thanks for the work on this library.

<ul class="alternate" type="square">
  <li>dfdf
    <ul class="alternate" type="square">
      <li>dfdfa</li>
    </ul>
  </li>
  <li>dfd</li>
</ul>
<br/>
<ul>
  <li>foof
    <ul>
      <li>fkfjf</li>
    </ul>
  </li>
  <li>adsf</li>
</ul>
<br/>
<ol>
  <li>num 1</li>
  <li>num 2</li>
</ol>

<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:pPr>
    <w:pStyle w:val="ListParagraph"/>
    <w:numPr>
        <w:ilvl w:val="0"/>
        <w:numId w:val="2"/>
    </w:numPr>
  </w:pPr>
  <w:r>
    <w:t xml:space="preserve">dfdf</w:t>
  </w:r>
</w:p>

sbowler commented 3 years ago

I'm now realizing that this probably comes from how I use the converter: I convert the HTML and then inject the result into another document. I think that document is then missing the numbering definition (the numId) that the converter adds to the document it was constructed with, so Word falls back to a numbered list or something similar.

Update: I refactored the code and now pass the final document's main part to the converter, so it should be able to add the matching numbering definitions and styles to that document. However, the lists still come out with the wrong numbering format, so...
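
One way to check the hypothesis above is to look up what a given `w:numId` resolves to in the target document. The following is a sketch using the Open XML SDK; the helper name `NumberingInspector.DescribeNumId` is made up for illustration, and it only inspects level 0 of the numbering definition.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class NumberingInspector
{
    // Describes the level-0 number format behind a w:numId,
    // or reports which piece of the numbering chain is missing.
    public static string DescribeNumId(MainDocumentPart mainPart, int numId)
    {
        Numbering numbering = mainPart.NumberingDefinitionsPart?.Numbering;
        if (numbering == null)
            return "The document has no numbering definitions part.";

        // w:num elements map a numId to an abstract numbering definition.
        NumberingInstance instance = numbering.Elements<NumberingInstance>()
            .FirstOrDefault(n => n.NumberID?.Value == numId);
        if (instance?.AbstractNumId?.Val == null)
            return $"numId {numId} is not defined in this document.";

        int abstractId = instance.AbstractNumId.Val.Value;
        AbstractNum abstractNum = numbering.Elements<AbstractNum>()
            .FirstOrDefault(a => a.AbstractNumberId?.Value == abstractId);
        if (abstractNum == null)
            return $"numId {numId} points to missing abstractNum {abstractId}.";

        // w:lvl with w:ilvl="0" holds the format used by top-level items.
        Level level0 = abstractNum.Elements<Level>()
            .FirstOrDefault(l => l.LevelIndex?.Value == 0);
        if (level0?.NumberingFormat?.Val == null)
            return $"abstractNum {abstractId} has no level 0 number format.";

        // InnerText is the raw attribute value, such as "bullet" or "decimal".
        return $"numId {numId} -> abstractNum {abstractId}, level 0 format: "
            + level0.NumberingFormat.Val.InnerText;
    }
}
```

For the paragraph shown earlier, calling `NumberingInspector.DescribeNumId(mainPart, 2)` on the final document would indicate whether `numId` 2 exists there at all and, if it does, whether it maps to a bullet or a numbered format.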

sbowler commented 3 years ago

I finally got this into a state that seems to work properly. Here is the gist of the code in case it helps anyone with similar issues. In my case I was looking for certain placeholder paragraphs and replacing each one with the converted HTML. I've simplified it a bit to make the code more generic.

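using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using HtmlToOpenXml;

using (WordprocessingDocument package = WordprocessingDocument.Open(outputFilePath, true))
{
  MainDocumentPart mainPart = package.MainDocumentPart;
  if (mainPart == null)
  {
    // You may just want to create an empty document here for your case
    throw new Exception("Document appears to be empty");
  }

  // Build the converter on the final document's main part so that whatever
  // it adds (numbering, styles) ends up in the document receiving the content.
  HtmlConverter converter = new HtmlConverter(mainPart);

  // Snapshot the paragraphs first: the loop inserts and removes elements,
  // which is not safe while lazily enumerating Descendants().
  var paragraphs = mainPart.Document.Body
    .Descendants<DocumentFormat.OpenXml.Wordprocessing.Paragraph>()
    .ToList();

  foreach (var paragraph in paragraphs)
  {
    if (paragraph.InnerText.Contains("FOO"))
    {
      // foo.HtmlStep is the HTML string to convert. Inserting in reverse
      // order after the placeholder keeps the converted elements in order.
      foreach (var node in converter.Parse(foo.HtmlStep).Reverse())
      {
        // Clone rather than wrapping OuterXml in a new Paragraph, since
        // Parse may also return non-paragraph elements such as tables.
        paragraph.InsertAfterSelf(node.CloneNode(true));
      }

      paragraph.Remove();
    }
  }

  mainPart.Document.Save();
}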