Open malparty opened 3 weeks ago
Thank you for your feedback.
Aside from the user-agent rotation and random delay, I also tried scraping data from the Google cache by prepending the URL with "http://webcache.googleusercontent.com/search?q=cache:", but it doesn't seem to work.
Other techniques I would try might come from this article:
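For readers following along, here is a rough sketch of the three ideas mentioned above (user-agent rotation, a random delay, and the cache prefix). It is not the code from the commit under review. The user-agent pool, the function names, and the delay bounds are all assumptions made for illustration.

```python
import random
import time
import urllib.request

# Hypothetical pool of user agents; a real one would be larger and kept current.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

# Prefix quoted in the comment above.
CACHE_PREFIX = "http://webcache.googleusercontent.com/search?q=cache:"


def build_cache_url(url):
    """Prepend the cache prefix to the target URL, as described in the comment."""
    return CACHE_PREFIX + url


def build_request(url):
    """Create a request that carries a randomly chosen User-Agent header."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return urllib.request.Request(url, headers=headers)


def polite_fetch(url, min_delay=1.0, max_delay=3.0):
    """Sleep for a random interval, then fetch the URL with a rotated user agent."""
    time.sleep(random.uniform(min_delay, max_delay))
    with urllib.request.urlopen(build_request(url), timeout=10) as response:
        return response.read()
```

Calling `polite_fetch(build_cache_url("https://example.com"))` would combine all three, though as noted above the cache route did not seem to work in practice.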
Thanks for your reply.
In your view, what are the pros and cons of 1 and 2? They each have different advantages and drawbacks, so knowing them would help in choosing which approach to use.
Thank you for the question.
If I had to choose one of them, I would try implementing headless browser scraping first since it doesn't require setting up services outside of the application and doesn't incur any costs.
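To make the headless browser option a bit more concrete, here is a minimal sketch. It assumes Playwright for Python as the headless driver, which is my choice for illustration and is not named anywhere in this thread. The function name and parameters are hypothetical as well.

```python
from typing import Optional
from urllib.parse import urlparse


def fetch_rendered_html(url: str, user_agent: Optional[str] = None,
                        timeout_ms: int = 30_000) -> str:
    """Load a page in headless Chromium and return the rendered HTML."""
    if urlparse(url).scheme not in ("http", "https"):
        raise ValueError(f"unsupported URL scheme: {url!r}")

    # Imported lazily so the module loads even where Playwright is not installed.
    # Assumes `pip install playwright` and `playwright install chromium` were run.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            # user_agent=None keeps the browser's default user agent.
            page = browser.new_page(user_agent=user_agent)
            # Wait until network activity settles so scripted content has rendered.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

This runs inside the application process, which matches the point above about not needing an external service, at the cost of shipping a browser binary with the app.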
References:
Issue
I went through the commit that implemented the User Agent rotation. The idea is interesting and is often the first one tried (with more or less success).
If you had more time for this challenge, what other techniques would you explore and try?