taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License

Escaped HTML parsed Unescaped by parse() #33

Closed GrmpPnda closed 4 years ago

GrmpPnda commented 4 years ago

When the input contains escaped HTML sequences, the parse() function interprets them as unescaped HTML and emits them unescaped in its output.

This causes problems when the page includes text that should remain escaped: once it is emitted unescaped, the browser interprets it as an HTML tag.

For example:

SOURCE:

<html>
<body>
<textarea id="source">
&lt;p&gt;
This content should be enclosed within an escaped p tag&lt;br /&gt;
&lt;/p&gt;
</textarea>
</body>
</html>

PARSED OUTPUT:

<html>
<body>
<textarea id="source">
<p>
This content should be enclosed within an escaped p tag<br />
</p>
</textarea>
</body>
</html>
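
The difference between the two listings can be reproduced without the library. The following is a minimal sketch, assuming only the five basic entities matter; `decodeEntities` and `encodeEntities` are hypothetical helpers written for illustration and are not node-html-parser's API or implementation. It shows that decoded text has to be re-escaped before it is written back into markup, otherwise the escaped tags turn into real ones.

```typescript
// Hypothetical helpers for illustration only; not node-html-parser's code.
const ENTITY_TO_CHAR: Record<string, string> = {
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
  "&amp;": "&",
};

// Decode the basic entities in a single pass, so "&amp;lt;" becomes "&lt;" and not "<".
function decodeEntities(escaped: string): string {
  return escaped.replace(/&(?:lt|gt|quot|#39|amp);/g, (m) => ENTITY_TO_CHAR[m] ?? m);
}

// Re-escape text before serializing it back into HTML. "&" must be handled first.
function encodeEntities(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// The textarea content from the SOURCE listing.
const raw =
  "&lt;p&gt;\nThis content should be enclosed within an escaped p tag&lt;br /&gt;\n&lt;/p&gt;";

// Behaviour described in this issue: decoded text is emitted verbatim.
const lossy = decodeEntities(raw);

// Expected behaviour: the serialized text matches the source again.
const roundTripped = encodeEntities(decodeEntities(raw));

console.log(lossy.includes("<p>")); // true: the escaped tag became a real tag
console.log(roundTripped === raw);  // true: escaping is preserved
```

A parser can get the same result by keeping the raw text untouched and decoding only when a caller asks for the decoded value; which approach the library takes is not stated in this thread.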

taoqf commented 4 years ago

https://github.com/taoqf/node-html-parser/commit/b05eb5c127bd1887e9f5c668e9450fdc0efc4780 Try v1.2.11, good luck.

GrmpPnda commented 4 years ago

Super. Thank you for resolving this, and for creating and maintaining this project.