AD scraper not always returning full HTML

uvacw / inca

AD scraper not always returning full HTML #500

Closed: FeLoe closed this issue 5 years ago

FeLoe commented 5 years ago

The AD scraper sometimes (roughly once every 4-5 articles) returns only the first paragraph of the page instead of the whole text. This is most likely related to some form of paywall or cookie wall.
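
To spot affected articles automatically, something like the following could be run on the scraped HTML. This is only a minimal sketch using the standard library; `ParagraphCollector`, `looks_truncated`, and both thresholds are made up for illustration and are not part of inca, so they would need tuning against real AD pages.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text content of every <p> element in a document."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self._buffer = []
        self.paragraphs = []

    def flush(self):
        # Store the paragraph collected so far, if it has any text.
        text = "".join(self._buffer).strip()
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> implicitly closes an unclosed previous one.
            self.flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def looks_truncated(html, min_paragraphs=2, min_chars=400):
    """Heuristic: flag pages with too few paragraphs or too little text.

    The thresholds are assumptions, not values measured on AD articles.
    """
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    collector.flush()  # pick up a trailing, unclosed <p>
    total_chars = sum(len(p) for p in collector.paragraphs)
    return len(collector.paragraphs) < min_paragraphs or total_chars < min_chars
```

Articles flagged this way could be logged or retried instead of being stored with only the teaser paragraph.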

FeLoe commented 5 years ago

Important update: All of the AD scrapers now run into the privacy wall, so we do not have any content from the articles at all (apart from the RSS content). Do we have a way to get around the privacy wall when scraping?
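
One common approach for cookie walls (it does not help with paywalls) is to send a consent cookie with every request, so the site serves the article instead of the consent page. The sketch below only builds such a request with the standard library. The cookie string `consent=accepted`, the user agent, and the helper name `build_request` are placeholders, not the values AD actually expects and not how inca does it; the real cookie would have to be copied from a browser session that has accepted the wall.

```python
import urllib.request


def build_request(url, consent_cookie, user_agent="Mozilla/5.0"):
    """Build a GET request that carries a consent cookie.

    consent_cookie is a raw "name=value" string; the name and value
    used in the example below are hypothetical placeholders.
    """
    headers = {
        "Cookie": consent_cookie,
        "User-Agent": user_agent,
    }
    return urllib.request.Request(url, headers=headers)


# Example (placeholder URL and cookie, nothing is sent here):
request = build_request("https://example.com/article", "consent=accepted")
# html = urllib.request.urlopen(request).read().decode("utf-8")
```

The actual fetch is left commented out so the snippet has no network side effects.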

FeLoe commented 5 years ago

The cookie walls have at least been disabled since the last pull request. We still run into the paywalls, but those can only be solved with login credentials.