NikolaiT opened this issue 3 years ago
Hi @NikolaiT
The list of fingerprinting/detection surfaces I have covered so far is barely the tip of the iceberg. In the upcoming weeks I will push some more updates. Stay tuned 😎
Generally, all bot detection technologies work in three "dimensions" and aim to find irregularities:
At first sight this may sound overwhelming, but keep in mind that no anti-bot system can afford to block access for regular users.
To put it differently: if the anti-bot system is not 100% sure you are a bot, you are very likely treated as a human, and you will pass the test. The system may compute a score for you and, based on it, apply countermeasures, e.g. slow down your requests, display "shadowed" data, or serve a CAPTCHA gate. At that point your job is to polish your scraper and proxies until the setup perfectly resembles a real browser.
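The score-then-mitigate flow described above can be sketched as a toy model. The signal names, weights, and thresholds below are illustrative assumptions on my part; real systems keep their scoring logic secret and use far more signals:

```python
# Toy sketch of score-based bot mitigation -- signal names, weights,
# and thresholds are invented for illustration only.

WEIGHTS = {
    "headless_user_agent": 0.5,   # UA string reveals headless browser
    "webdriver_flag": 0.4,        # navigator.webdriver is true
    "no_mouse_movement": 0.2,     # no human-like input events observed
    "datacenter_ip": 0.3,         # IP belongs to a hosting provider
}

def detection_score(signals):
    """Sum the weights of all signals that fired, capped at 1.0."""
    score = sum(WEIGHTS[name] for name, fired in signals.items() if fired)
    return min(score, 1.0)

def mitigation(score):
    """Map a score to a countermeasure, mirroring the text above."""
    if score < 0.3:
        return "allow"    # probably a regular user
    if score < 0.7:
        return "captcha"  # unsure: serve a challenge instead of blocking
    return "block"        # almost certainly a bot

signals = {"headless_user_agent": False, "webdriver_flag": True,
           "no_mouse_movement": True, "datacenter_ip": False}
print(mitigation(detection_score(signals)))  # -> captcha
```

The key design point, matching the comment above: an uncertain score leads to friction (a CAPTCHA, throttling), not a hard block, because false positives on real users are too costly.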
Now, to your question:
What kind of scraping setup do you suggest?
I suggest addressing all three points mentioned above:
Use existing stealth plugins (publicly available evasion packages). With this approach, however, because it's public code and anti-bot vendors quickly follow up, it may only work 60% of the time... 🤣
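As a toy illustration of what such stealth plugins do: they rewrite the fingerprint surfaces a detector inspects. A real plugin patches these properties with JavaScript injected into the browser; the Python dict below only models the before/after effect, and the specific surfaces and values are simplified assumptions:

```python
# Toy model of stealth patching. Real plugins inject JavaScript into
# the page context; this dict merely illustrates the effect on a few
# well-known fingerprint surfaces (values simplified for illustration).

HEADLESS_FINGERPRINT = {
    "navigator.webdriver": True,    # telltale automation flag
    "navigator.plugins.length": 0,  # headless browsers ship no plugins
    "headless_user_agent": True,    # UA string reveals headless mode
}

STEALTH_PATCHES = {
    "navigator.webdriver": False,   # hide the automation flag
    "navigator.plugins.length": 3,  # fake a plausible plugin list
    "headless_user_agent": False,   # report a regular desktop UA
}

def apply_stealth(fingerprint, patches):
    """Return a patched copy of the fingerprint; the original is untouched."""
    patched = dict(fingerprint)
    patched.update(patches)
    return patched

patched = apply_stealth(HEADLESS_FINGERPRINT, STEALTH_PATCHES)
print(patched["navigator.webdriver"])  # -> False
```

This also shows why the public-code caveat above matters: once a detector knows which surfaces a plugin patches (and how), it can probe for the patch itself, e.g. inconsistencies the spoofed values introduce.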
I am currently going with something like this, what do you think?
Good idea with using original Chrome. I can't say more than that, because I am not sure if https://google.com is what you intend to scrape. If that's the case, you'll need some more stealthiness 😊
Just wanted to drop by and say thanks. It's good to be aware of those techniques. It's insanely complex to not get detected.