taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License

Wrong nesting when parsing `a` inside another `a` #172

Closed rndm2 closed 3 years ago

rndm2 commented 3 years ago

Input:

<!DOCTYPE html>
<html>
<body>
  <div id="1">

  <a id="s1">
    <a id="s2">2</a>
    <div id="ss1">2</div>
  </a>

  </div>
  <div id="2">
  </div>
</body>
</html>

Output:

<!DOCTYPE html>
<html>
<body>
  <div id="1">

  <a id="s1">
    </a>
    <a id="s2">2</a>
    <div id="ss1">2</div>

  </div>
  <div id="2">

  </div>
</body>
</html>

I understand that an `a` element inside another `a` is not allowed by the HTML5 standard. But I believe a parser should not rewrite the markup; it should parse it as written.

nonara commented 3 years ago

Thanks for the report!

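We are actually handling nested `a` tags according to the HTML parser spec: when a new `a` start tag appears while another `a` is still open, the open one is closed first. To confirm, you can check this on www.astexplorer.net.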

Another way is to open the Chrome inspector and create a nested `a` tag in the source. When you click out of the edit box and inspect the DOM, you'll see that it's handled the same way.
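
To make the rule concrete, here is a rough TypeScript sketch of the behavior described above. It is only an illustration: `closeNestedAnchors` is a hypothetical helper, not part of node-html-parser, and it is far simpler than the spec's real tree-construction algorithm. It assumes well-formed tags and only looks at `a` elements.

```typescript
// Hypothetical helper (not node-html-parser's API): a simplified model of
// how an HTML5 parser treats an <a> start tag while another <a> is open.
function closeNestedAnchors(html: string): string {
  let anchorOpen = false; // true while an <a> element is currently open

  // Match only <a ...> and </a>; the lookahead keeps <abbr>, <article>, etc. untouched.
  return html.replace(/<(\/?)a(?=[\s/>])[^>]*>/gi, (tag: string, slash: string) => {
    if (slash === "/") {
      // End tag: keep it only if it closes an open <a>; otherwise it is stray and dropped.
      if (!anchorOpen) return "";
      anchorOpen = false;
      return tag;
    }
    // Start tag: implicitly close the <a> that is still open, then open the new one.
    const prefix = anchorOpen ? "</a>" : "";
    anchorOpen = true;
    return prefix + tag;
  });
}

const nested = '<a id="s1"><a id="s2">2</a><div id="ss1">2</div></a>';
console.log(closeNestedAnchors(nested));
// prints: <a id="s1"></a><a id="s2">2</a><div id="ss1">2</div>
```

The result has the same shape as the output in the report: `s1` is closed before `s2` opens, and the final `</a>` no longer has anything to close, so it is dropped.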

Hope that helps!