seleniumbase / SeleniumBase

📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.
https://seleniumbase.io
MIT License

UC Mode detected by fnac.com #1760

Closed (donggoing closed this issue 1 year ago)

donggoing commented 1 year ago

It seems that GeeTest detects UC Mode (undetected_chromedriver). Example:

from seleniumbase import SB
with SB(undetectable=True, uc_cdp_events=True) as sb:
    sb.open("https://www.fnac.com/mp46660382")
    input('Press Enter to continue...')

The page opens fine the first time. But when you refresh it, you land on the CAPTCHA page, and even if you solve it manually, you are still detected and shown a message saying you have been blocked.
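The open-then-refresh steps described above could be scripted roughly as follows. This is only a sketch: the helper names `title_before_and_after_refresh` and `run_with_uc_mode` are made up for illustration, and it assumes that comparing page titles is enough to notice the jump to the CAPTCHA page. A scripted refresh may also not behave exactly like a manual one.

```python
from typing import Any, Tuple


def title_before_and_after_refresh(sb: Any, url: str) -> Tuple[str, str]:
    """Open `url`, refresh once, and return the titles seen before and after.

    `sb` is expected to be a SeleniumBase session object; this helper is
    hypothetical and not part of SeleniumBase itself.
    """
    sb.open(url)
    before = sb.get_title()
    sb.refresh_page()  # the step reported to trigger the CAPTCHA page
    after = sb.get_title()
    return before, after


def run_with_uc_mode() -> None:
    """Run the check with the same SB() options as the report above."""
    from seleniumbase import SB  # imported lazily; needs SeleniumBase + Chrome

    with SB(undetectable=True, uc_cdp_events=True) as sb:
        print(title_before_and_after_refresh(sb, "https://www.fnac.com/mp46660382"))
```

Calling `run_with_uc_mode()` prints both titles, so a differing second title would hint that the refresh led somewhere other than the product page.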

Env: SeleniumBase 4.13.0, Python 3.8.16

mdmintz commented 1 year ago

For starters, that website threw a CAPTCHA at me even when I was not using Selenium, and it also blocked me when I was not using Selenium:

Why this blocking? Something about the behaviour of the browser has caught our attention.

There are various possible explanations for this:
 * you are browsing and clicking at a speed much faster than expected of a human being
 * something is preventing Javascript from working on your computer
 * there is a robot on the same network as you

Based on this, the block has nothing to do with UC Mode: if it also happens in a regular browser, other factors are responsible.

Already discussed here: https://github.com/ultrafunkamsterdam/undetected-chromedriver/issues/912

donggoing commented 1 year ago

What I encountered is that visiting and refreshing the page in a regular browser displays it normally. But when visiting the page with Selenium, it behaves as I described: after a refresh, it jumps to the CAPTCHA page, and then, whether the verification is manual or automated, it is detected as a bot and shows that you have been blocked.

mdmintz commented 1 year ago

That website saves IP addresses, so if it already detected you once, it'll detect you again even if you are not using Selenium.

donggoing commented 1 year ago

But what I described happened with the same IP. When not using Selenium, I can visit the site; otherwise, whether the verification is manual or automated, it is detected as a bot and shows that I have been blocked (after refreshing or navigating to another page of fnac.com, which is important!).

mdmintz commented 1 year ago

Even with regular undetected-chromedriver, one cannot simply refresh the page. Detection could get triggered unless performing other actions, such as seen here: https://github.com/ultrafunkamsterdam/undetected-chromedriver/blob/bf7dcf8b5713020de7454844fb80036b8c456503/undetected_chromedriver/cdp.py#L90 (That shows the webdriver get() method being overridden with new functionality.)

donggoing commented 1 year ago

Here is normal Chrome:

https://user-images.githubusercontent.com/29890210/219995122-eb227ea7-60fb-43b5-a929-123c4fc08cfb.mp4

And here is sbase with UC Mode (in this case, I do not refresh):

https://user-images.githubusercontent.com/29890210/219995130-e4a27008-3bfe-415c-90ff-17ae80642936.mp4

And here is a run with a new IP; I don't refresh, but try to open another page:

https://user-images.githubusercontent.com/29890210/219995663-8fdc4c95-e652-4b47-afa8-ae82aa580336.mp4

mdmintz commented 1 year ago

If you need to refresh the page, open a new driver in UC Mode. That website looks like it knows how to detect Selenium from a page refresh. Maybe https://github.com/ultrafunkamsterdam/undetected-chromedriver has other ideas, but there is little more I can do to improve the anti-detection abilities at this point.
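The "open a new driver instead of refreshing" suggestion could look roughly like the sketch below. The helper names (`default_session`, `visit_each_in_new_driver`, `on_page`) are hypothetical and not part of SeleniumBase; only `SB(undetectable=True)`, `sb.open()`, and `sb.get_current_url()` are taken from the library. Whether a fresh driver actually avoids detection on this site is not guaranteed.

```python
from contextlib import AbstractContextManager
from typing import Any, Callable, Iterable, List


def default_session() -> AbstractContextManager:
    """Return a fresh UC Mode session (SeleniumBase is imported lazily)."""
    from seleniumbase import SB  # same import as in the original report

    return SB(undetectable=True)


def visit_each_in_new_driver(
    urls: Iterable[str],
    session_factory: Callable[[], AbstractContextManager] = default_session,
    on_page: Callable[[Any, str], Any] = lambda sb, url: sb.get_current_url(),
) -> List[Any]:
    """Open every URL in its own brand-new driver instead of reusing one.

    A "refresh" then becomes: close the old driver, start a new one, and
    open the same URL again. `on_page` extracts whatever is needed from
    the page before that driver is closed.
    """
    results = []
    for url in urls:
        # Each iteration starts and tears down a separate browser session.
        with session_factory() as sb:
            sb.open(url)
            results.append(on_page(sb, url))
    return results
```

Passing the same URL twice, for example `visit_each_in_new_driver([url, url])`, stands in for a refresh. Starting a browser per page is slow, so this trades speed for never navigating twice in one session.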

donggoing commented 1 year ago

All right, thanks a lot for your quick replies.

z724133545 commented 1 year ago

Hello, could we add each other as contacts and look into this together?