html-to-text / node-html-to-text

Advanced html to text converter

Entity after script tag results in HTML being copied. #285

Closed: galenhuntington closed this issue 1 year ago

galenhuntington commented 1 year ago

Minimal HTML example

<style>a{}</style>&apos;<br/>

Observed output

'<br/>

Expected output

'

Version information


The problem seems to arise when a <script> or <style> block is followed by text containing an entity (&blah;). Any HTML after that point is copied to the output literally, without any further text conversion.

KillyMXI commented 1 year ago

Thank you for the report.

It's an htmlparser2 issue. I opened an issue upstream: https://github.com/fb55/htmlparser2/issues/1426

For now, it should be possible to set decodeEntities to false and deal with entities afterwards.

html-to-text before version 9.0.0 should also be unaffected.
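To illustrate the "deal with entities afterwards" half of the decodeEntities workaround, here is a minimal sketch of a post-processing step. The helper name decodeBasicEntities and its small entity table are hypothetical and not part of html-to-text or htmlparser2; it handles only a handful of named entities plus numeric references, so treat it as a starting point rather than a complete decoder.

```typescript
// Hypothetical helper (not part of html-to-text): decodes a small set of
// HTML entities in text that was converted with entity decoding disabled.
const NAMED_ENTITIES: Record<string, string> = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  apos: "'",
  nbsp: "\u00a0",
};

function decodeBasicEntities(text: string): string {
  // A single pass over the input, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
  return text.replace(
    /&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi,
    (match: string, body: string): string => {
      if (body[0] === "#") {
        // Numeric reference: "#x27" (hexadecimal) or "#39" (decimal).
        const isHex = body[1] === "x" || body[1] === "X";
        const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        // Leave out-of-range references untouched instead of throwing.
        return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
      }
      // Named entity: unknown names are left exactly as they were.
      return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
        ? NAMED_ENTITIES[body]
        : match;
    }
  );
}
```

Assuming the html-to-text 9.x API, the conversion itself would presumably be called as convert(html, { decodeEntities: false }) and its result passed through decodeBasicEntities. For full coverage of named entities, a dedicated decoder such as the entities package is likely a better fit than this table.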

KillyMXI commented 1 year ago

The upstream issue is fixed in htmlparser2 8.0.2, and I have published html-to-text 9.0.5 with updated dependencies.