privateOmega / html-to-docx

HTML to DOCX converter
MIT License

Heading lineRule not set properly: h2 and h1 on multiple lines overlap #104

Open svandegar opened 2 years ago

svandegar commented 2 years ago

When an H1 or H2 wraps onto multiple lines, the spacing between the lines is too small: the second line touches or overlaps the first.

[Screenshot: CleanShot 2021-11-22 at 18 25 14@2x]

Steps to reproduce:

  1. Generate a DOCX with a long h1 or h2, on multiple lines.
  2. Convert it to PDF with the LibreOffice CLI: soffice --headless --convert-to pdf test.docx

If I open the document in Word (16.55), make any change, and save it, I can then convert it to PDF without any issue.

I suspect there is some incompatibility in the way the document is generated by this lib.

Here's the h2 part of document.xml generated by this lib:

<w:p>
  <w:pPr>
    <w:pStyle w:val="Heading2"/>
    <w:spacing w:lineRule="exact"/>
  </w:pPr>
  <w:r>
    <w:rPr/>
    <w:t xml:space="preserve">I AM A VERY VERY VERY VERYVERY VERYVERY VERYVERY VERY LONG H2</w:t>
  </w:r>
</w:p>

And here is the same block from the same document after re-saving it in Word:

<w:p>
  <w:pPr>
    <w:pStyle w:val="Heading2"/>
  </w:pPr>
  <w:r>
    <w:rPr/>
    <w:t xml:space="preserve">I AM A VERY VERY VERY VERYVERY VERYVERY VERYVERY VERY LONG H2</w:t>
  </w:r>
</w:p>

The only difference is the <w:spacing w:lineRule="exact"/> element, which Word drops on save.
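To check whether other generated documents contain the same pattern, a quick diagnostic along these lines could scan the text of document.xml for w:spacing elements that set lineRule="exact" without a w:line value. This is only a sketch: the function name findExactSpacingWithoutLine is hypothetical and not part of html-to-docx, and it uses a regular expression instead of a real XML parser.

```javascript
// Hypothetical diagnostic helper, not part of html-to-docx.
// Returns every <w:spacing> element that sets w:lineRule="exact"
// but carries no w:line attribute (the pattern shown in this issue).
function findExactSpacingWithoutLine(documentXml) {
  const spacingElements = documentXml.match(/<w:spacing\b[^>]*>/g) || [];
  return spacingElements.filter(
    (el) => /\bw:lineRule="exact"/.test(el) && !/\bw:line="/.test(el)
  );
}

// The <w:pPr> of the generated h2 from this issue.
const generatedPPr =
  '<w:pPr><w:pStyle w:val="Heading2"/><w:spacing w:lineRule="exact"/></w:pPr>';

// Logs an array containing the one matching <w:spacing> element.
console.log(findExactSpacingWithoutLine(generatedPPr));
```

Extracting document.xml from the .docx archive is left out of the sketch; any unzip tool can do that.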

From what I understand from the Office Open XML documentation, it should be set to "auto" to avoid this problem.
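If that theory holds, a possible stopgap until the library itself is changed would be to post-process document.xml before conversion. The sketch below rewrites lineRule="exact" to "auto", but only on w:spacing elements that have no w:line attribute, since an "exact" rule with an explicit w:line is a deliberate fixed height. The function name relaxExactLineRule is hypothetical, the approach is an untested assumption about what LibreOffice needs, and it is not how html-to-docx works internally.

```javascript
// Hypothetical post-processing step, not part of html-to-docx.
// Assumption: switching lineRule from "exact" to "auto" is safe when the
// element has no w:line value, because there is no explicit height to keep.
function relaxExactLineRule(documentXml) {
  return documentXml.replace(/<w:spacing\b[^>]*>/g, (el) => {
    // An explicit w:line means the fixed height was intentional: keep it.
    if (/\bw:line="/.test(el)) return el;
    return el.replace('w:lineRule="exact"', 'w:lineRule="auto"');
  });
}

const originalPPr =
  '<w:pPr><w:pStyle w:val="Heading2"/><w:spacing w:lineRule="exact"/></w:pPr>';

// The <w:spacing> element now carries w:lineRule="auto".
console.log(relaxExactLineRule(originalPPr));
```

A fix inside the library would be cleaner than rewriting the output, so this is mainly useful for confirming the theory on an existing file.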

I'll be happy to submit a PR but since I'm not an expert in OpenXML, I'd prefer if someone could validate my theory before I work on this PR.

For reference, here's where lineRule is set: https://github.com/privateOmega/html-to-docx/blob/5e15d93fa5ae8fce0e63753244441ce552878642/src/helpers/xml-builder.js#L626

svandegar commented 2 years ago

Has anyone seen this issue? I'm happy to work on a PR; I just need to know whether it would be merged.

privateOmega commented 2 years ago

@svandegar Yes, I can merge it, thanks. Could you please attach sample inputs here as well?

shk-webpr commented 2 years ago

Hi!

> Could you please attach sample inputs here as well?

@privateOmega Do you mean HTML strings? Any input with an h1 should work, since the upper part of the letters gets cut off.

<h1>lorem ipsum</h1>
