html-to-text / node-html-to-text

Advanced html to text converter

Cannot remove newlines between tags even with options #310

Open TrofinSorin opened 8 months ago

TrofinSorin commented 8 months ago

Minimal HTML example

<h1>1</h1>
<h1>2</h1>
<h1>3</h1>

Options

{
  wordwrap: false,
  preserveNewlines: false,
  selectors: [
    { selector: 'p', options: { leadingLineBreaks: 0, trailingLineBreaks: 0 } },
    { selector: 'h1', options: { leadingLineBreaks: 0, trailingLineBreaks: 0 } },
    { selector: 'h2', options: { leadingLineBreaks: 0, trailingLineBreaks: 0 } },
    { selector: 'h3', options: { leadingLineBreaks: 0, trailingLineBreaks: 0 } },
  ]
}

Observed output

Before I added the selector options there were 2 newlines. With those options it is reduced to one newline.

1 (newline here) 2 (newline here) 3

Expected output

I want this output, with no empty lines between the values:

1
2
3

The same as it looks inside a WYSIWYG editor: still H1 headings, but with no empty lines between them.

Version information


KillyMXI commented 8 months ago

These options will work:

{
  selectors: [
    { selector: 'h1', options: { leadingLineBreaks: 1, trailingLineBreaks: 1 } },
  ]
}

What happens with zero: 0 is falsy in JavaScript, so the expression 0 || 2 evaluates to the default value of 2.
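
To illustrate the fallback behaviour described above, here is a minimal standalone sketch. The helper names withOrFallback and withNullishFallback are hypothetical and are not taken from the library's source; they only show how || treats an explicit 0 compared with ??.

```javascript
// Hypothetical helpers for illustration only; not html-to-text's actual code.

// Falls back whenever the value is falsy (0, '', null, undefined, NaN, false).
function withOrFallback(value, fallback) {
  return value || fallback;
}

// Falls back only when the value is null or undefined.
function withNullishFallback(value, fallback) {
  return value ?? fallback;
}

console.log(withOrFallback(0, 2));         // 2 (the explicit 0 is lost)
console.log(withOrFallback(1, 2));         // 1
console.log(withOrFallback(undefined, 2)); // 2
console.log(withNullishFallback(0, 2));    // 0 (the explicit 0 survives)
```

This is why passing 0 gives the same result as not passing the option at all, while 1 is respected.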

Note that this counts line breaks, not empty lines. Zero line breaks would mean "continue on the same line", which doesn't make sense for block-level tags.
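
The difference between line breaks and empty lines can be sketched as follows. This assumes, purely for illustration, that a single count of line breaks separates adjacent blocks; joinBlocks is a hypothetical function and does not reflect how the library actually combines leading and trailing values.

```javascript
// Hypothetical illustration: N line breaks between blocks give N - 1 empty lines.
function joinBlocks(blocks, lineBreaks) {
  return blocks.join('\n'.repeat(lineBreaks));
}

const blocks = ['1', '2', '3'];

// JSON.stringify makes the newline characters visible as \n.
console.log(JSON.stringify(joinBlocks(blocks, 2))); // "1\n\n2\n\n3" (one empty line between blocks)
console.log(JSON.stringify(joinBlocks(blocks, 1))); // "1\n2\n3"     (no empty lines)
console.log(JSON.stringify(joinBlocks(blocks, 0))); // "123"         (everything on one line)
```

Under that assumption, a value of 1 gives the output requested in the issue, and 0 would merge the headings onto a single line.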

I agree that this logic might be somewhat unintuitive though. The value should probably be clamped to the closest valid value, with a warning emitted when it is misused...
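
One way the clamping idea above could look is sketched below. This is only an assumption about a possible design, not something the library does: clampLineBreaks, its min and fallback parameters, and the warning text are all hypothetical.

```javascript
// Hypothetical sketch of "clamp and warn"; not part of html-to-text.
function clampLineBreaks(value, { min = 1, fallback = 2, warn = console.warn } = {}) {
  // Option not provided: use the default silently.
  if (value === undefined || value === null) { return fallback; }

  // Not a usable number: warn and use the default.
  if (typeof value !== 'number' || Number.isNaN(value)) {
    warn(`Invalid line break count ${String(value)}; using ${fallback}.`);
    return fallback;
  }

  // Below the smallest valid value: warn and clamp to it.
  if (value < min) {
    warn(`Line break count ${value} is below ${min}; using ${min}.`);
    return min;
  }

  return value;
}

const warnings = [];
const collect = (message) => warnings.push(message);

console.log(clampLineBreaks(0, { warn: collect }));         // 1
console.log(clampLineBreaks(3, { warn: collect }));         // 3
console.log(clampLineBreaks(undefined, { warn: collect })); // 2
console.log(warnings.length);                               // 1
```

With a helper like this, an explicit 0 would be treated as 1 instead of silently becoming the default of 2, and the caller would be told why.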