rubycdp / ferrum

Headless Chrome Ruby API
https://ferrum.rubycdp.com
MIT License

Implement stealth mode #142

Open route opened 3 years ago

route commented 3 years ago

https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth

balt5r commented 3 years ago

Would be great if it could pass those tests

alexanderadam commented 3 years ago

undetected_chromedriver might also be a good reference.

It would probably also make sense to add intoli's checks to the specs. They are also on GitHub (here and here).

brettallred commented 3 years ago

@route Any thoughts on adding this in? We've been using ferrum for a while now and started getting blocked on one of the sites.

I'm happy to take a cut at implementing this if you want to outline some of your thoughts on how you envision doing it. I studied the source code for about an hour tonight just thinking through some options here.

alexanderadam commented 3 years ago

Hi @brettallred,

I'm happy to take a cut at implementing

This would be so wonderful! :pray:

I'm not a maintainer here but I would like to see Stealth mode as an integrated extension.

My idea would be:

Specs

Implementation of the extension itself

there are good references out there:

Outside of the specs, you could also check the reCAPTCHA score to see how well the scripts work.

Summary of a possible solution — TL;DR;

  1. Create an HTML file in spec/support/views containing the checks mentioned above, so that a reliable check is available within the specs. It could also include a simple HTML table with a summary (e.g. "you are [not] a bot").
  2. Write the spec so that it intentionally fails at first, since the extension is not used or ready yet; that makes it obvious that the specs work, e.g. expect(browser.body).to include("you are not a bot").
  3. Write a rake task (e.g. rake update:stealth_extension) to fetch or build the minimized/compiled puppeteer-extra-plugin-stealth extension and put it in an extensions directory within the ferrum repository.
  4. Hopefully the spec will be green now if the extension was properly loaded (remember to add Ferrum::Browser.new(extensions: %w(path/to/stealth/ext.js)), or even a shortcut like stealth_mode: true, to that) :wink:
  5. optional: document how to integrate Privacy Pass
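Steps 1 and 2 could be sketched roughly as follows. This is only an illustration: the file name bot_check.html, the helper write_bot_check_page, and the single navigator.webdriver check are assumptions standing in for the fuller set of checks referenced above, and nothing here is part of Ferrum itself.

```ruby
require "fileutils"

# Hypothetical fixture page: one check (navigator.webdriver) plus a summary
# line. A real fixture would carry the fuller set of checks discussed above.
BOT_CHECK_PAGE = <<~HTML
  <!DOCTYPE html>
  <html>
    <body>
      <table>
        <tr><th>Check</th><th>Result</th></tr>
        <tr><td>navigator.webdriver</td><td id="webdriver"></td></tr>
      </table>
      <p id="summary"></p>
      <script>
        var flagged = navigator.webdriver === true;
        document.getElementById("webdriver").textContent =
          flagged ? "failed" : "passed";
        document.getElementById("summary").textContent =
          flagged ? "you are a bot" : "you are not a bot";
      </script>
    </body>
  </html>
HTML

# Writes the fixture into +dir+ (e.g. spec/support/views) and returns its path.
def write_bot_check_page(dir)
  FileUtils.mkdir_p(dir)
  path = File.join(dir, "bot_check.html")
  File.write(path, BOT_CHECK_PAGE)
  path
end

# In a spec (needs the ferrum gem, RSpec, and a Chrome/Chromium binary):
#   path = write_bot_check_page("spec/support/views")
#   browser.go_to("file://#{File.expand_path(path)}")
#   expect(browser.at_css("#summary").text).to eq("you are not a bot")
```

The commented spec reads the #summary element rather than matching on browser.body, because in this particular fixture the script source itself contains both verdict strings, so a substring match on the whole body would pass regardless of the outcome.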

Again, this is just an idea and I'm not the maintainer here. So please take it with a grain of salt. But I think this could work in a very maintainable manner.

PS: Updating the stealth extension could even be a GitHub action later on.

ttilberg commented 3 years ago

I just wanted to pass a small note that the move @alexanderadam proposed is absolutely feasible. Absurdly so. I've always been a bit intimidated wrangling the js/extension side of things so I kind of brushed that last comment off a bit, assuming additional wiring would need to happen. Tonight I stumbled back into it and noted in particular extract-stealth-evasions, and thought I'd just see where I could get with it. Woah.


First off, thank you @alexanderadam for your detailed note. I saw it this spring, but like I said... I didn't understand its proposed simplicity. Second, I wanted to report these findings just in case it inspires someone else.
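For anyone wanting to try the same thing, one plausible wiring (not necessarily what was done here) is to generate the script with extract-stealth-evasions and hand it to Ferrum's extensions option. The helper name stealth_options and the default path are hypothetical, and the sketch assumes stealth.min.js was already produced, for example by running npx extract-stealth-evasions.

```ruby
# Sketch only: builds Ferrum::Browser options that preload a stealth script.
# DEFAULT_STEALTH_SCRIPT is a hypothetical location, not a Ferrum convention.
DEFAULT_STEALTH_SCRIPT = "extensions/stealth.min.js"

def stealth_options(script_path = DEFAULT_STEALTH_SCRIPT, **options)
  unless File.file?(script_path)
    raise ArgumentError, "stealth script not found: #{script_path}"
  end

  # Keep any extensions the caller already asked for and append ours once.
  extensions = (options.fetch(:extensions, []) + [script_path]).uniq
  options.merge(extensions: extensions)
end

# Usage (needs the ferrum gem and a Chrome/Chromium binary):
#   require "ferrum"
#   browser = Ferrum::Browser.new(**stealth_options(headless: true))
```

Failing early when the file is missing keeps a stale or absent script from silently producing a browser without the evasions loaded.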

sebthemonster commented 2 years ago

According to these webpages:

The tests at bot.sannysoft.com and www.nowsecure.nl pass with this browser configuration:

browser = Ferrum::Browser.new(browser_path: BROWSER_PATH, headless: false, browser_options: { "disable-blink-features": "AutomationControlled" })

I haven't yet found how to pass them in headless mode.
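To repeat this check locally, a small probe along these lines might help. It is a sketch under assumptions: probe_detection and AUTOMATION_FLAG_OFF are made-up names, browser_path is left to Ferrum's default, and the only signal read back is navigator.webdriver, which is far less than what the test pages inspect.

```ruby
# Options taken from the configuration above, minus browser_path.
AUTOMATION_FLAG_OFF = {
  headless: false,
  browser_options: { "disable-blink-features": "AutomationControlled" }
}.freeze

# Visits +url+, saves a full-page screenshot for manual review, and returns
# whatever navigator.webdriver evaluates to. Always quits the browser.
# +browser+ is expected to behave like a Ferrum::Browser.
def probe_detection(browser, url, screenshot_path)
  browser.go_to(url)
  browser.screenshot(path: screenshot_path, full: true)
  browser.evaluate("navigator.webdriver")
ensure
  browser.quit
end

# Usage (needs the ferrum gem and a Chrome/Chromium binary):
#   require "ferrum"
#   browser = Ferrum::Browser.new(**AUTOMATION_FLAG_OFF)
#   p probe_detection(browser, "https://bot.sannysoft.com", "sannysoft.png")
```

Passing the browser in, rather than creating it inside the method, makes it easy to run the same probe with headless: true and compare the two screenshots.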

sandstrom commented 1 year ago

Isn't this a problem better solved at the Chromium level?

I read this article recently, seems like there are improvements in an upcoming version of Chrome:

https://antoinevastel.com/bot%20detection/2023/02/19/new-headless-chrome.html

I'd close this issue, out of scope for Ferrum.

route commented 1 year ago

It is, but Ferrum itself can still provide some guidance and scripts that make automation harder to detect from the start.

wflanagan commented 1 year ago

Is there documentation on how to get the new headless mode in Ferrum?

akavitaliy commented 1 year ago

Have you found a way to pass them in headless mode?

maeve commented 1 year ago

You can enable the new headless mode in chromium by modifying the browser options:

Ferrum::Browser.new(browser_options: { "headless": "new" })

route commented 1 year ago

You can enable the new headless mode in chromium by modifying the browser options:

Ferrum::Browser.new(browser_options: { "headless": "new" })

It doesn't work, because there's a lot more work to be done: https://github.com/rubycdp/ferrum/pull/379