taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License

Preserve invalid nested A tags in AST #215

Closed nonara closed 2 years ago

nonara commented 2 years ago

Background

Nested A tags are invalid HTML.

Example

<a href="#"><b>link <a href="#">nested link</a> end</b></a>

Browsers and some parsers (like parse5) handle this by correcting the HTML as follows:

<a href="#"><b>link </b></a><a href="#">nested link</a> end
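The correction shown above can be sketched as a stack-based pass over the markup: when an opening A tag is met while another A is still open, everything down to that open A is closed first, and closing tags left without a matching open tag are dropped. This is only an illustration. `fixNestedAnchors` is a hypothetical helper, not code from this library or from parse5, and it ignores void elements, comments, stray `<` characters, and attribute values containing `>`.

```typescript
// Hypothetical helper for illustration only; not part of node-html-parser.
function fixNestedAnchors(html: string): string {
  // Matches an opening/closing tag, or a run of text between tags.
  const token = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>|([^<]+)/g;
  const open: string[] = []; // names of currently open elements
  let out = "";

  // Emit closing tags until only `index` elements remain open.
  const closeDownTo = (index: number): void => {
    while (open.length > index) {
      out += `</${open.pop()}>`;
    }
  };

  let m: RegExpExecArray | null;
  while ((m = token.exec(html)) !== null) {
    const [raw, slash, name, , text] = m;
    if (text !== undefined) {
      out += text;
      continue;
    }
    const tag = name.toLowerCase();
    if (slash) {
      // Closing tag: close down to its match, or drop it if nothing matches.
      const i = open.lastIndexOf(tag);
      if (i !== -1) closeDownTo(i);
      continue;
    }
    if (tag === "a") {
      // An A inside an open A: terminate the outer A before starting the new one.
      const i = open.lastIndexOf("a");
      if (i !== -1) closeDownTo(i);
    }
    open.push(tag);
    out += raw;
  }
  closeDownTo(0); // close anything still open at end of input
  return out;
}

const example = '<a href="#"><b>link <a href="#">nested link</a> end</b></a>';
console.log(fixNestedAnchors(example));
// prints: <a href="#"><b>link </b></a><a href="#">nested link</a> end
```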

This library did not fix these tags until PR #148.

New Behaviour

We have determined that fixing nested A tags is not appropriate for this sort of parser. Parsers which apply such fixes aim to produce fully compliant HTML, ready for browser consumption. That is not the purpose of this library.

⚠️ BREAKING CHANGE ⚠️

Because this can be considered a breaking change, we are incrementing the major version number.

To support legacy applications which rely on this fix being applied quickly, during parse, we've introduced a new option, which is disabled by default:

fixNestedATags
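A minimal sketch of opting back in to the old behaviour, assuming the option is passed in the options object given as the second argument to `parse`. The variable names are illustrative, and the exact serialized output may vary between versions, so none is shown here.

```typescript
import { parse } from "node-html-parser";

const html = '<a href="#"><b>link <a href="#">nested link</a> end</b></a>';

// Default: the invalid nesting is preserved in the AST as written.
const preserved = parse(html);

// Opt in to the legacy correction, applied during parse.
const corrected = parse(html, { fixNestedATags: true });

console.log(preserved.toString());
console.log(corrected.toString());
```

Applications that do not depend on the corrected structure can leave the option unset.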