Open route opened 3 years ago
Would be great if it could pass those tests
undetected_chromedriver might also be a good reference.
Also, it would probably make sense to add intoli's checks to the specs. They are also on GitHub (here and here).
@route Any thoughts on adding this in? We've been using ferrum for a while now and started getting blocked on one of the sites.
I'm happy to take a cut at implementing this if you want to outline some of your thoughts on how you envision doing it. I studied the source code for about an hour tonight just thinking through some options here.
Hi @brettallred,
I'm happy to take a cut at implementing
This would be so wonderful! :pray:
I'm not a maintainer here but I would like to see Stealth mode as an integrated extension.
My idea would be:
specs for it (e.g. in spec/extensions/stealth)
a test page (see spec/support/views for some examples) that shows various states (it could be visually simpler than this, since we would only need to check the text output in the specs). There are nice reference pages out there with checks that could be integrated into this page.
there are good references out there:
these modules from puppeteer-extra-plugin-stealth (IMHO the most complete implementation, with a lot of details); there's also a minified version of it available. So maybe we could have a rake task that simply fetches that JS from a CDN or uses extract-stealth-evasions itself to make our own build. This way it would be very easy to update the script (and we could also profit from patches to the other project). It seems that calling npx extract-stealth-evasions should be enough on a machine that has Node installed?
intoli did not only show how to detect but also how to circumvent these checks; check the sources mentioned above
Python's undetected-chromedriver is simple (but by far not enough yet for many cases!)
especially for Cloudflare: they show fewer CAPTCHAs if the Privacy Pass extension is used (see this Cloudflare post for more information on that). Maybe it should be documented how to integrate it and set it up easily? This could also be another blog post. Or maybe it could even be integrated as another extension?
Outside of the specs, you could also use the reCAPTCHA score to check how well the scripts work.
a page in spec/support/views containing the checks mentioned above, to have a reliable check available within the specs; maybe also a simple HTML table with a summary (i.e. you are [not] a bot)
a spec asserting on that page (expect(browser.body).to include("you are not a bot"))
a rake task (rake update:stealth_extension) to fetch/build the minimized/compiled puppeteer-extra-plugin-stealth extension and put it in a nice extensions directory within the ferrum repository
a way to load it (Ferrum::Browser.new(extensions: %w(path/to/stealth/ext.js)) or even a shortcut like stealth_mode: true for that) :wink:
Again, this is just an idea and I'm not the maintainer here. So please take it with a grain of salt. But I think this could work in a very maintainable manner.
PS: Updating the stealth extension could even be a GitHub action later on.
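To make the rake task idea more concrete, here is a minimal sketch of what it might look like. It is only an illustration under some assumptions: that npx extract-stealth-evasions writes a file named stealth.min.js into the current working directory, and that the script should end up in an extensions directory. The helper name update_stealth_extension and its runner parameter are hypothetical and not part of ferrum.

```ruby
require "fileutils"
require "rake"
require "tmpdir"

# Assumption: extract-stealth-evasions writes this file into the current
# working directory when it finishes.
STEALTH_FILE = "stealth.min.js"

# Runs the given command and raises if it fails.
DEFAULT_RUNNER = ->(*cmd) { system(*cmd, exception: true) }

# Hypothetical helper: builds the script in a temporary directory and copies
# it into target_dir. The runner is injectable so the build step can be
# replaced on machines without Node.
def update_stealth_extension(target_dir, runner: DEFAULT_RUNNER)
  target = File.join(target_dir, STEALTH_FILE)
  Dir.mktmpdir do |tmp|
    Dir.chdir(tmp) { runner.call("npx", "extract-stealth-evasions") }
    FileUtils.mkdir_p(target_dir)
    FileUtils.cp(File.join(tmp, STEALTH_FILE), target)
  end
  target
end

namespace :update do
  desc "Rebuild the stealth evasions script via extract-stealth-evasions"
  task :stealth_extension do
    # "extensions" is an assumed location inside the repository.
    puts "Wrote #{update_stealth_extension("extensions")}"
  end
end
```

With the file in place, the browser could then be started with the extensions option mentioned above, for example Ferrum::Browser.new(extensions: ["extensions/stealth.min.js"]).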
I just wanted to pass a small note that the move @alexanderadam proposed is absolutely feasible. Absurdly so. I've always been a bit intimidated wrangling the js/extension side of things so I kind of brushed that last comment off a bit, assuming additional wiring would need to happen. Tonight I stumbled back into it and noted in particular extract-stealth-evasions, and thought I'd just see where I could get with it. Woah.
First off, thank you @alexanderadam for your detailed note. I saw it this spring, but like I said... I didn't understand its proposed simplicity. Second, I wanted to report these findings just in case it inspires someone else.
According to these webpages, the tests on bot.sannysoft.com and www.nowsecure.nl pass with this browser configuration:
browser = Ferrum::Browser.new(browser_path: BROWSER_PATH, headless: false, browser_options: { "disable-blink-features": "AutomationControlled" })
I haven't yet found a way to pass them in headless mode.
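If the same flag needs to be combined with other options, the configuration above could be wrapped in a small helper like the one below. This is just a sketch: the helper name non_headless_stealth_options is hypothetical and not part of Ferrum, and the extra no-sandbox flag is only there to show the merge.

```ruby
# The flag from the configuration above, using the same symbol-key style.
AUTOMATION_FLAG = { "disable-blink-features": "AutomationControlled" }.freeze

# Hypothetical helper (not part of Ferrum): returns the options from the
# comment above, merged with any caller-supplied options and browser flags.
def non_headless_stealth_options(overrides = {})
  extra_flags = overrides.fetch(:browser_options, {})
  { headless: false }
    .merge(overrides)
    .merge(browser_options: AUTOMATION_FLAG.merge(extra_flags))
end

options = non_headless_stealth_options(browser_options: { "no-sandbox": nil })
# The resulting hash can be passed along, e.g. Ferrum::Browser.new(options)
```

Caller-supplied values win over the defaults, so passing headless: true would override the non-headless setting that the tests reportedly needed.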
Isn't this a problem better solved at the Chromium level?
I read this article recently, seems like there are improvements in an upcoming version of Chrome:
https://antoinevastel.com/bot%20detection/2023/02/19/new-headless-chrome.html
I'd close this issue, out of scope for Ferrum.
It is, but still ferrum itself can provide some guidance and scripts to make it even harder from the beginning to detect automation.
Is there documentation on how to get the new headless mode in Ferrum?
Have you found a way to pass them in headless mode?
You can enable the new headless mode in chromium by modifying the browser options:
Ferrum::Browser.new(browser_options: { "headless": "new" })
You can enable the new headless mode in chromium by modifying the browser options:
Ferrum::Browser.new(browser_options: { "headless": "new" })
It doesn't work, because there's a lot more work to be done: https://github.com/rubycdp/ferrum/pull/379
https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth