taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License

It doesn't parse all html documents correctly!!! #223

Open jpolstre opened 2 years ago

jpolstre commented 2 years ago

I spent hours reviewing my code before I discovered that the error was in your package. Here is the code that fails:

strHtml: https://drive.google.com/file/d/1DkO-PzbkTcmTlDgUrYxXsBMNYwxtNUUB/view?usp=sharing

when I do:

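 const { parse } = require('node-html-parser');
 // strHtml holds the contents of the HTML file linked above
 let root = parse(strHtml);
 let elem = root.querySelector('.contenido');
 console.log(elem); // undefined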

I've been using your package for a long time and it seemed to work fine. I only ran into this error with this particular document. Other parsers had no problems with it.

node-html-parser version: 6.1.1

taoqf commented 2 years ago

I am so sorry, and thanks very much. Sadly, I cannot access Google. Is https://github.com/taoqf/node-html-parser/issues/224 related to this issue?

milahu commented 1 year ago

https://drive.google.com/file/d/1DkO-PzbkTcmTlDgUrYxXsBMNYwxtNUUB/view

ex.html.zip

taoqf commented 1 year ago

I'm afraid this HTML file contains some broken HTML tags, such as <form> and <li>, which would cause this issue.
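
For anyone who wants to check whether a document has unbalanced tags like the ones mentioned above, here is a rough sketch of a tag-balance check. It is not part of node-html-parser, and `findUnbalancedTags` is a hypothetical helper name. It is a regex-based heuristic, not a real parser: it can be confused by `>` inside attribute values or by markup inside `<script>` blocks, and it flags every omitted end tag, including ones such as `</li>` that the HTML specification allows to be left out and that browsers accept.

```javascript
// Heuristic check for unbalanced tags; an illustration, not a real HTML parser.
// Void elements never take a closing tag, so they are skipped.
const VOID_TAGS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
  'link', 'meta', 'source', 'track', 'wbr',
]);

// Hypothetical helper: returns a list of human-readable problems.
function findUnbalancedTags(html) {
  const problems = [];
  const stack = []; // names of tags that are currently open
  // Matches opening and closing tags. Comments and the doctype are ignored
  // because they start with "<!" rather than a letter or "/".
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9-]*)\b[^>]*?(\/?)>/g;
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    const isClosing = match[1] === '/';
    const name = match[2].toLowerCase();
    const selfClosing = match[3] === '/';
    if (VOID_TAGS.has(name) || selfClosing) continue;
    if (!isClosing) {
      stack.push(name);
      continue;
    }
    const openIndex = stack.lastIndexOf(name);
    if (openIndex === -1) {
      problems.push(`</${name}> has no matching opening tag`);
      continue;
    }
    // Anything opened after the matching tag was never closed.
    for (const unclosed of stack.splice(openIndex).slice(1)) {
      problems.push(`<${unclosed}> is not closed before </${name}>`);
    }
  }
  // Whatever is still on the stack was never closed at all.
  for (const unclosed of stack) {
    problems.push(`<${unclosed}> is never closed`);
  }
  return problems;
}

// Usage: a made-up snippet with an unclosed <form> and two unclosed <li> tags.
const sample = '<div class="contenido"><form><ul><li>one<li>two</ul></div>';
console.log(findUnbalancedTags(sample));
```

If the list comes back non-empty for the failing document, trimming the reported tags out of a copy of the file is a quick way to narrow down which one makes `querySelector` miss the element.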