taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License

How can I parse a document with errors in the closing of tags? #225

Open jpolstre opened 1 year ago

jpolstre commented 1 year ago

Example:

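// Assumption: nodeParse is the `parse` export of node-html-parser.
const { parse: nodeParse } = require('node-html-parser');

// A template literal is used because a quoted string cannot span lines.
// The <tbody> near the end is the erroneous closing tag
// (the <td> and <tr> before it are also written as opening tags).
var doc = nodeParse(`
<html>
   <body>
     <table>
        <tbody>
            <tr>
              <td>
                 <a href="#" class="anchor" >link</a>
             <td>
          <tr>
          <tbody>
       </table>
    </body>
 </html>`)

var anchor = doc.querySelector('.anchor')
console.log(anchor.parentNode.parentNode.parentNode) // Returns <html>..., but <tbody> is expected.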

With other languages and other packages I have no problem. I also don't want to put voidTag: { tags: ['area', 'base', ...] } in the configuration, since I don't know in which tags the error will appear. Is there a way to do what I'm looking for? Thank you for your package.

taoqf commented 1 year ago

See https://github.com/taoqf/node-html-parser/issues/152 and https://github.com/taoqf/node-html-parser/issues/231. There are too many issues about broken HTML, and I really have no time for this. PRs are welcome.
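
One way to investigate the behaviour described in the question is to print the ancestor chain of the anchor instead of counting parentNode hops, and to look up an ancestor by tag name. The sketch below is not from the library or its maintainer. It assumes only that each node exposes parentNode and tagName; the helper names ancestorTags and closestByTag are hypothetical, and plain objects stand in for parsed nodes so the snippet runs on its own.

```javascript
// Hypothetical helpers; they assume only that each node exposes
// `parentNode` and `tagName`, as DOM-like trees usually do.

// Collect the tag names from `node` up to the root, nearest first.
function ancestorTags(node) {
  const tags = [];
  for (let cur = node.parentNode; cur; cur = cur.parentNode) {
    // A root/document node may have no tag name; skip it.
    if (cur.tagName) tags.push(String(cur.tagName).toLowerCase());
  }
  return tags;
}

// Return the nearest ancestor with the given tag name, or null.
function closestByTag(node, tag) {
  const wanted = tag.toLowerCase();
  for (let cur = node.parentNode; cur; cur = cur.parentNode) {
    if (cur.tagName && String(cur.tagName).toLowerCase() === wanted) return cur;
  }
  return null;
}

// Stand-in tree built from plain objects (not parser output):
// html > body > table > tbody > tr > td > a
const names = ['html', 'body', 'table', 'tbody', 'tr', 'td', 'a'];
let parent = null;
let anchor = null;
for (const name of names) {
  anchor = { tagName: name.toUpperCase(), parentNode: parent };
  parent = anchor;
}

console.log(ancestorTags(anchor).join(' > ')); // td > tr > tbody > table > body > html
console.log(closestByTag(anchor, 'tbody').tagName); // TBODY
console.log(closestByTag(anchor, 'thead')); // null
```

Calling ancestorTags on the element returned by doc.querySelector('.anchor') would show which tree the parser actually built from the broken markup. Whether a tbody ancestor survives depends on how the parser repairs unclosed tags, so closestByTag may still return null for such input.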