taoqf / node-html-parser

A very fast HTML parser, generating a simplified DOM, with basic element query support.
MIT License

Using `innerText` does not escape all HTML codes properly? #183

Closed AgainPsychoX closed 2 years ago

AgainPsychoX commented 2 years ago

While scraping OP.GG, some text includes apostrophes (').

In my example, I get Vel'Koz with the apostrophe still encoded as an HTML entity instead of the plain Vel'Koz, no matter which property I use: innerHTML (not expected to decode, so that's okay), rawText (same), or innerText (this definitely should work!).

A quick look at what is going on: (image)

Meanwhile, in the browser it works as expected, or even better (both innerHTML and innerText work): (image)

In my case, the workaround is really easy: I just replace the encoded apostrophe entity with a literal '.
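For anyone who needs a similar stopgap, the replace-based workaround could be generalized roughly as follows. This is only a sketch: `decodeBasicEntities` is a hypothetical helper, not part of node-html-parser, and the exact entity form the page used is not shown here, so the decimal, hexadecimal, and named forms handled below are assumptions. It covers numeric references plus five common named entities, nothing more.

```javascript
// Hypothetical helper (not part of node-html-parser): decodes numeric
// character references and a handful of named entities in one pass.
const NAMED_ENTITIES = new Map([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
]);

function decodeBasicEntities(input) {
  return input.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match, body) => {
    if (body[0] === "#") {
      // Numeric reference: "#39" / "#039" (decimal) or "#x27" (hex).
      const isHex = body[1] === "x" || body[1] === "X";
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave references outside the Unicode range untouched.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    // Named reference: decode only the ones listed above, keep the rest as-is.
    const decoded = NAMED_ENTITIES.get(body);
    return decoded !== undefined ? decoded : match;
  });
}

console.log(decodeBasicEntities("Vel&#39;Koz"));
```

Because the replacement runs in a single pass, text such as `&amp;#39;` is decoded once to `&#39;` rather than all the way to an apostrophe. A full entity table (as browsers use) is much larger than this, so a dedicated decoder is the safer choice for arbitrary input.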

nonara commented 2 years ago

Thanks for the report! Please try with the `text` or `textContent` properties. Those decode HTML entities.