taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License
1.12k stars 112 forks

Losing content when using lowerCaseTagName option as true #30

fellypsantos closed this issue 4 years ago

fellypsantos commented 4 years ago

I'm using axios to fetch some information from a website, and the HTML came back with all tags in uppercase, such as <BODY> and so on. I had a lot of trouble selecting elements with querySelector, so I tried configuring the parsing options, setting lowerCaseTagName to true.

But for some reason that removes a lot of the markup: the script tags, the body tags, and also the head tag, although the head's content is preserved.

I solved the problem by converting the axios response to lowercase and then passing it to the parse() function; that way, querySelector worked great.

The problematic HTML is linked below, in case it helps: https://pastebin.com/raw/H6Vzwpe9

taoqf commented 4 years ago

Thanks for your report.
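
For anyone hitting the same problem: the workaround in the report lowercases the whole response, which also lowercases text content and attribute values. Below is a minimal sketch of a narrower variant that lowercases tag names only. The helper `lowerCaseTagNames` is hypothetical (it is not part of node-html-parser), and the regex is naive: it assumes no `<` followed by a letter appears inside script bodies or text, which may not hold for every page.

```javascript
// Hypothetical helper, not part of node-html-parser:
// lowercase tag names only, leaving attributes and text untouched.
function lowerCaseTagNames(html) {
  // Match "<" or "</" followed by a tag name and rewrite just the name.
  // Comments ("<!--") and doctypes ("<!DOCTYPE") are not matched.
  return html.replace(
    /<(\/?)([A-Za-z][A-Za-z0-9-]*)/g,
    (match, slash, name) => '<' + slash + name.toLowerCase()
  );
}

// Small sample input, assumed for illustration only.
const sample = '<HTML><BODY><P Class="Intro">Hello World</P></BODY></HTML>';
const normalized = lowerCaseTagNames(sample);
console.log(normalized);
// prints: <html><body><p Class="Intro">Hello World</p></body></html>
```

The normalized string could then be passed to parse() in place of the raw axios response body (`response.data`), without enabling lowerCaseTagName.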