niespodd / browser-fingerprinting

Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?
https://niespodd.github.io/browser-fingerprinting/

thanks for more detailed info #6

Closed icesmartjuan closed 1 year ago

icesmartjuan commented 2 years ago

Hi @niespodd,

Great summary on browser fingerprinting! Would you please share more details on the anti-anti-bot solution "Specialized bot software that targets the unique detection surface of the target website"?


Looking forward to your comments, thank you!

niespodd commented 2 years ago

Hi @icesmartjuan,

Let me use two examples.

Let's assume that the site you want to scrape uses Distil Networks protection. With a finite amount of reverse-engineering work on the fingerprinting script, you can determine the detection surface quite precisely, because the script is written (mostly) in JavaScript. An example of such a solution can be found here.

Many websites use reCAPTCHA/hCaptcha to limit bot traffic. For example, Alibaba asks for a captcha solution when logging into an account. Until recently, once past the captcha gate, a logged-in user session was not limited in the number of requests it could make on the website. It was therefore sufficient to "manually" generate a pool of session tokens (create an account and sign in with it) and then, without changing the IP, use them to scrape at a reasonable request rate.
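The session-pool idea can be sketched roughly as follows. This is only an illustration of the scheduling part, not code from any real scraper: `SessionPool`, `acquire`, and the 2-second `min_interval` are hypothetical names and values, the tokens are assumed to have been collected by hand beforehand, and no network requests are made.

```python
import time
from dataclasses import dataclass, field


@dataclass
class SessionPool:
    """Hypothetical pool of pre-generated session tokens.

    Each token is reused no more often than once per `min_interval` seconds.
    """
    tokens: list
    min_interval: float = 2.0  # assumed value; tune per target site
    _last_used: dict = field(default_factory=dict, init=False, repr=False)

    def acquire(self, now=None):
        """Return (token, wait_seconds) for the least recently used token."""
        now = time.monotonic() if now is None else now
        # Tokens that were never used sort first; ties keep list order.
        token = min(self.tokens,
                    key=lambda t: self._last_used.get(t, float("-inf")))
        last = self._last_used.get(token)
        wait = 0.0 if last is None else max(0.0, last + self.min_interval - now)
        # Record the moment the token will actually be used.
        self._last_used[token] = now + wait
        return token, wait
```

The caller would sleep for `wait` seconds and then send the next request with that token's session cookie, keeping the same IP the session was created from.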

Generally, such a strategy makes sense when you are targeting a limited number of sites and have the resources to reverse engineer their detection scripts. However, you have to reckon with the fact that every time a site changes its detection script, your solution may become detectable again.

icesmartjuan commented 1 year ago

I see, much appreciated @niespodd