Closed mweinelt closed 6 years ago
The content scraper doesn't evaluate JavaScript, and the <script/> and <noscript/> tags are stripped by the content sanitizer for security reasons.
I guess this website has been designed to mess with crawlers. That also means it doesn't work without JavaScript.
The Kinja family of blogs (Lifehacker, Gizmodo, Jalopnik, etc.) also causes problems for the scraper because these sites use JavaScript to render their content.
The scraper could be replaced with one that simulates the browser more closely, but that would obviously be quite a bit of added complexity, so it may not be worth it.
Wowhead.com hides its content behind a JavaScript print function. Is there any way to write a scraper rule for that? Does goquery evaluate JavaScript? After execution, the content lands in the empty div.news-post-5b81a133ac83a.
https://www.wowhead.com/news=286563/battle-for-azeroth-darkmoon-deck-community-opinions