tautologistics / node-htmlparser

Forgiving HTML/XML/RSS Parser in JS for *both* Node and Browsers
MIT License

Fixed parsing of an HTML tag as the first thing inside a <script>. #56

Closed: papandreou closed this 10 years ago

papandreou commented 12 years ago

Hi!

When the first token inside a <script> looks like markup, the leading < is dropped from element.raw and element.data. This causes problems when parsing templates that use the type='text/html' hack, for example:

<html>
<body>
    <script type='text/html'><div></div></script>
</body>
</html>

... which makes the Text element come out as div></div> instead of <div></div>.

The above is also included as a test case.
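
To make the expected result concrete, here is a minimal standalone sketch of the behavior this fix aims for: everything between the opening <script> tag and </script> is kept as plain text, including a leading <. The extractScriptText helper is hypothetical and written only for illustration. It is not part of htmlparser, it does not use htmlparser's API, and it is not the code changed in this pull request.

```javascript
// Hypothetical helper (not part of htmlparser): returns the raw text of the
// first <script> element, treating everything up to </script> as plain text.
function extractScriptText(html) {
    var open = /<script\b[^>]*>/i.exec(html);
    if (!open) {
        return null; // no <script> element in the input
    }
    var rest = html.slice(open.index + open[0].length);
    var close = /<\/script\s*>/i.exec(rest);
    // Without a closing tag, the remainder of the input is script text.
    return close ? rest.slice(0, close.index) : rest;
}

var html = [
    "<html>",
    "<body>",
    "    <script type='text/html'><div></div></script>",
    "</body>",
    "</html>"
].join("\n");

// The leading "<" must survive: the text is "<div></div>", not "div></div>".
var text = extractScriptText(html);
console.log(text);
```

A parser that handles this case correctly should report the same string in the Text element's raw and data fields.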

I ran into this issue with jsdom, which still uses htmlparser 1.x.

papandreou commented 11 years ago

@tautologistics: Any chance of getting this merged?

papandreou commented 10 years ago

Never mind, all the software I care about is using other parsers by now.