taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License

Regression: Inputs with long values not parsed #158

Closed by david-golightly-leapyear 3 years ago

david-golightly-leapyear commented 3 years ago

This worked in 4.1.4, and is broken in 4.1.5.

Given the following code:

const { parse } = require('node-html-parser');

const htmlText = `<html>
  <body>
    <form>
      <input name="SAMLResponse" value="&#x3D;">
    </form>
  </body>
</html>`

const input = parse(htmlText).querySelector('input[name="SAMLResponse"]')

Expected: input should contain a reference to the parsed input element

Actual: input is null. The form element has a single TextNode child containing the raw, unparsed HTML text where the input element should be.

nonara commented 3 years ago

Unfortunately, I can't reproduce this. I pasted the exact code, but it seems to be working here.

Here is my exact test-case:

const { parse } = require('node-html-parser');

const htmlText = `<html>
  <body>
    <form>
      <input name="SAMLResponse" value="&#x3D;">
    </form>
  </body>
</html>`

const input = parse(htmlText).querySelector('input[name="SAMLResponse"]')
console.log(input);

Try removing node_modules, clearing yarn cache, and reinstalling. I would also recommend upgrading to v5, as the matching algorithms are a bit better.

If the error persists, please set up a small reproduction repository, and I'll have a look.

david-golightly-leapyear commented 3 years ago

Hm, no, it's not node_modules etc. I narrowed it down to a minimal diff:

const { parse } = require('node-html-parser');

const htmlText = `<html>
  <body>
    <form>
      <input name="SAMLResponse"

 value="&#x3D;">
    </form>
  </body>
</html>`

const input = parse(htmlText).querySelector('input[name="SAMLResponse"]')

^ this fails. The only difference seems to be the newline after the name="SAMLResponse" attribute. When that newline is converted into a space, it parses successfully.
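
For anyone who has to stay on 4.1.5, the observation above suggests a possible stopgap: turn line breaks inside tags into spaces before handing the markup to the parser. The following is only a rough sketch of that idea. `collapseTagNewlines` is a hypothetical helper, not part of node-html-parser, and it assumes that no attribute value contains a literal `>` or a line break that needs to be preserved.

```javascript
// Hypothetical helper (not part of node-html-parser).
// Collapses every whitespace run that contains a line break inside a tag
// into a single space. Text outside of tags is left untouched.
// Assumption: attribute values contain no literal ">" and no meaningful
// line breaks, so a simple regex is enough to find tag boundaries.
function collapseTagNewlines(html) {
  return html.replace(/<[a-zA-Z][^>]*>/g, (tag) =>
    tag.replace(/\s*[\r\n]+\s*/g, ' ')
  );
}

// The failing input from this report, reduced to the <input> element.
const failing = `<input name="SAMLResponse"

 value="&#x3D;">`;

const normalized = collapseTagNewlines(failing);
console.log(normalized);
```

The normalized string could then be passed to `parse(...)` in place of the original markup. Upgrading to a release where the regression no longer occurs is the cleaner option when it is available.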

david-golightly-leapyear commented 3 years ago

However, this appears to be fixed in 5.0.0, so I'm fine closing it. Thanks!