seleniumbase / SeleniumBase

📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.
https://seleniumbase.io
MIT License

Struggling with Remote Debugging port #2225

Closed Dylgod closed 10 months ago

Dylgod commented 10 months ago

I am trying to run a script against a previously opened browser. I cannot connect to that browser; instead, my script opens a new browser on every run. The browser that opened -> https://imgur.com/a/afBEAJX

I use this to spawn my browser:

cd C:\Program Files\Google\Chrome\Application
chrome.exe --remote-debugging-port=0249 --user-data-dir="C:\browser"

My current code:

#simple.py
from seleniumbase import Driver
from seleniumbase.undetected import Chrome
from selenium.webdriver.chrome.options import Options
from seleniumbase.undetected import ChromeOptions

chrome_options = Options()
chrome_options.add_experimental_option("debuggerAddress" , "localhost:0249")

web = Driver(browser='chrome',chromium_arg=chrome_options,remote_debug=True)

body = web.find_element("/html/body")
print(body.text)

Things I have tried:

- not changing the port (9222)
- pytest simple.py --remote-debug
- uc=True with uc.chromeoptions
- chrome_options = Options()
- chrome_options = ChromeOptions()
- browser="remote"
- removing chromium_arg and adding port="9222"
- chromium_arg="remote-debugging-port=9222"
- example solution with SB in #2049

I am converting my large Selenium project to SeleniumBase, and being able to test this way would be invaluable.

mdmintz commented 10 months ago

Hello, I answered that one in https://www.youtube.com/watch?v=5dMFI3e85ig at 20:44 and again at 21:46. You're trying to mix a user-data-dir from non-UC-Mode Chrome with UC Mode Chrome. That's what the video referred to as "Crossing the Streams".

There are 2 different options for things you can do:

  1. Reuse the browser session between tests (uses an already-opened browser / same user-data-dir)
  2. Set --user-data-dir=DIR / user_data_dir=DIR (For a dir that was created by UC Mode)

1: Use the SeleniumBase pytest command-line option: --rs / --reuse-session to reuse the browser session between tests, which works by keeping the already-opened web browser open for more tests (which also reuses the user-data-dir). For that, you need to use one of the SeleniumBase syntax formats that run with pytest. Then, add the command-line options you need, such as --rs and --uc:

pytest --uc --rs -s -v -x (To run the following script:)

from seleniumbase import BaseCase
BaseCase.main(__name__, __file__, "--uc", "--rs", "-s", "-v", "-x")

class MyTests(BaseCase):
    def test_1_(self):
        if not (self.undetectable):
            self.get_new_driver(undetectable=True)
        self.driver.get("https://nowsecure.nl/#relax")
        try:
            self.assert_text("OH YEAH, you passed!", "h1", timeout=5)
        except Exception:
            self.clear_all_cookies()
            self.get_new_driver(undetectable=True)
            self.driver.get("https://nowsecure.nl/#relax")
            self.assert_text("OH YEAH, you passed!", "h1", timeout=5)
        self.post_message("Selenium wasn't detected!", duration=3)

    def test_2_(self):
        self.open("seleniumbase.io/demo_page")
        self.sleep(2)

    def test_3_(self):
        self.open("https://nowsecure.nl/#relax")
        self.assert_text("OH YEAH, you passed!", "h1", timeout=3)
        self.post_message("Selenium STILL wasn't detected!", duration=3)

Then you'll get the following output: (File was named test_uc_with_rs.py)

test_uc_with_rs.py::MyTests::test_1_ PASSED
test_uc_with_rs.py::MyTests::test_2_ PASSED
test_uc_with_rs.py::MyTests::test_3_ PASSED

Note that on test_3_, you didn't see the captcha screen because it was already bypassed in test_1_.


2: Set --user-data-dir=DIR / user_data_dir=DIR (For a dir that was created by UC Mode)

As long as that dir was created by UC Mode Chrome (and never used with non-UC-Mode Chrome), then your scripts will function correctly in UC Mode (don't forget the --uc / uc=True for that):

from seleniumbase import Driver

driver = Driver(uc=True, user_data_dir=DIR)
driver.get("https://nowsecure.nl/#relax")
driver.sleep(4)
driver.quit()
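
One way to avoid "crossing the streams" is to reserve a profile directory that only UC Mode ever touches. The sketch below is a minimal illustration under assumptions: the helper names `uc_profile_dir` and `launch_uc_driver` and the folder name `uc_profile` are hypothetical conventions, not part of SeleniumBase. Only `Driver(uc=True, user_data_dir=...)` comes from the answer above.

```python
from pathlib import Path


def uc_profile_dir(base_dir, name="uc_profile"):
    """Return a path reserved for UC Mode profiles.

    The folder name is a hypothetical convention. The directory is not
    created here, so that UC Mode Chrome is the first thing to populate it.
    """
    return Path(base_dir).expanduser() / name


def launch_uc_driver(profile_dir):
    """Start a UC Mode driver that uses the reserved profile directory."""
    # Imported lazily so the path helper works without SeleniumBase installed.
    from seleniumbase import Driver

    return Driver(uc=True, user_data_dir=str(profile_dir))


# Example usage (requires SeleniumBase and Chrome):
# driver = launch_uc_driver(uc_profile_dir("~/selenium_profiles"))
# driver.get("https://nowsecure.nl/#relax")
# driver.quit()
```

Keeping the regular Chrome profile (such as C:\browser in the question) out of this path is the whole point of the separation.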

Dylgod commented 10 months ago

If I'm using UC Mode, does that mean I would be unable to import cookies from my normal Chrome browser when running tests this way?

mdmintz commented 10 months ago

@Dylgod The cookies part can be done separately, but no guarantees that you won't get detected if importing cookies from a non-UC-Mode web browser.

Here are the methods for it (for a regular SeleniumBase syntax format):

self.save_cookies(name="cookies.txt")

self.load_cookies(name="cookies.txt")

If using the raw driver format (which you appear to be), you'll have to use regular selenium methods for it instead. Eg.

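import json

# Save: serialize the current session's cookies to a JSON string
cookies = driver.get_cookies()
json_cookies = json.dumps(cookies)

# Then, to restore (the browser must already be on the cookies' domain):
cookies = json.loads(json_cookies)
for cookie in cookies:
    if "expiry" in cookie:
        del cookie["expiry"]
    driver.add_cookie(cookie)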

(May require some customization. And your cookies may still need the "expiry" part.)
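
To carry cookies between separate script runs, the JSON string has to be written somewhere. The following is a minimal sketch, assuming a plain JSON file on disk; the function names `save_cookies_to_file` and `load_cookies_from_file` and the `drop_expiry` switch are illustrative, not SeleniumBase or Selenium APIs. The switch exists because, as noted above, some cookies may still need their "expiry" value.

```python
import json
from pathlib import Path


def save_cookies_to_file(cookies, path):
    """Write a list of cookie dicts (as returned by driver.get_cookies()) to disk."""
    Path(path).write_text(json.dumps(cookies, indent=2), encoding="utf-8")


def load_cookies_from_file(path, drop_expiry=True):
    """Read cookie dicts back, optionally removing the "expiry" key from each."""
    cookies = json.loads(Path(path).read_text(encoding="utf-8"))
    if drop_expiry:
        cookies = [
            {key: value for key, value in cookie.items() if key != "expiry"}
            for cookie in cookies
        ]
    return cookies


# Example usage with a raw driver (requires a running browser):
# save_cookies_to_file(driver.get_cookies(), "cookies.json")
# for cookie in load_cookies_from_file("cookies.json"):
#     driver.add_cookie(cookie)
```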

Dylgod commented 10 months ago

I made a short demo and recorded it (<2m) to show what I'm running into and hopefully it will make it obvious where I'm going wrong. Everything else has worked flawlessly so I assume it's human error on my end, but this is the last issue holding back the great migration from UC to SB and I just can't seem to figure it out.

https://www.youtube.com/watch?v=sZr3DO2g-QE

I apologize if I'm missing something obvious

mdmintz commented 10 months ago

@Dylgod I watched the video. There are a few things I noticed that were off.

  1. If you ever see "Chrome is being controlled by automated test software.", it means that either SeleniumBase wasn't used at all, or an error occurred while launching the browser with SeleniumBase settings (browsers are launched from SeleniumBase/seleniumbase/core/browser_launcher.py), so it fell back to the failsafe: launching a Chrome browser with no additional options at all.

  2. It looked like you ran something with pytest that was in the Driver() format. Only some syntax formats run with pytest. Others are run with python, and there's a totally separate behave format that's a whole other topic.

  3. You had from seleniumbase.undetected ... in your script. Only SeleniumBase/seleniumbase/core/browser_launcher.py should ever call it directly (not your scripts). The seleniumbase.undetected files are no longer standalone-callable, as they were in undetected-chromedriver. There was overlap of required settings/options when undetected-chromedriver was forked into SeleniumBase, so the parts that were already included with browser_launcher were removed from seleniumbase.undetected. Tests should be structured via one of the 23 syntax formats. Each one has varying ways of spinning up a web browser, and how options are handled. (Eg. If running tests via pytest, then pytest --uc activates UC Mode. If using the Driver() format, then you would use Driver(uc=True), for example.)

  4. It looked like you were trying to use a regular Chrome user-data-dir as a UC Mode user-data-dir, which would be of a slightly different, yet incompatible format.

  5. The --remote-debug option tells selenium to use port 9222 by default. But it looks like you already had a non-selenium Chrome using that port, which might cause issues. Selenium will handle the multithreaded Chromes correctly if you're using pytest multithreading. There are lots of examples of that in SeleniumBase.

So first, to get to the root of the issues, clear up any abandoned Chrome/chromedriver processes so that port 9222 is definitely free. Then try a simple SeleniumBase script (of any kind). Make sure you don't see the "Chrome is being controlled by automated test software." message. If you still see it, something's wrong. Then expand that to a simple UC Mode script (driver = Driver(uc=True), for example). Make sure you still don't see the "Chrome is being controlled by automated test software." message. And then try again with remote_debug=True. Step by step, we'll see what's working and where your root issue may be.
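
Whether port 9222 is really free can be checked from Python before launching anything. This is a minimal sketch using only the standard library; the helper name `port_in_use` is hypothetical, and it only detects a listener on the given host, so it says nothing about which process owns the port.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


# Example usage:
# if port_in_use(9222):
#     print("Port 9222 is taken; close leftover Chrome/chromedriver processes first.")
```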

If what you're trying to do is connect to an already-running Chrome instance, note that this is what UC Mode already does: It spins up a regular Chrome browser, and then attaches Selenium to it. That part is already done for you. If you try doing that yourself, that could end up being an issue because then you would have UC Mode try to connect to both the browser you already spun up, and the special one it already spins up itself to connect to.

mdmintz commented 10 months ago

If you run into any issues with headless UC Mode, it was just fixed in 4.20.9 - https://github.com/seleniumbase/SeleniumBase/releases/tag/v4.20.9

Dylgod commented 10 months ago

Thank you for responding so quickly and in so much detail. I am a SeleniumBase noob, and your responses have been very helpful and have taught me a lot. I spun up a browser with this code after making sure that port 9222 was free. Current version: 4.20.9

from seleniumbase import Driver

web = Driver(uc=True)
web.get("https://seleniumbase.io/demo_page")

UC Mode works perfectly fine here: the browser remained open and did not show any signs of using a failsafe. Next, I ran a couple of different tests to try to connect to this UC browser, and then repeated the same process with a normal browser. Every test was run after clearing all chromedriver and uc_driver processes.

#Test 1
from seleniumbase import Driver

web = Driver(uc=True,remote_debug=True)
web.get("https://google.com")

#Test 2
from seleniumbase import Driver

web = Driver(remote_debug=True)
web.get("https://google.com")

I ran these tests, and then added port="9222" to both, for a total of 4 separate tests. All of them spawned a new browser and ignored the already open one. I then found Syntax #9 and added

options.add_experimental_option("debuggerAddress" , "localhost:0249")

and removed

options.add_experimental_option(
    "excludeSwitches", ["enable-automation", "enable-logging"],
)
prefs = {
    "credentials_enable_service": False,
    "profile.password_manager_enabled": False,
}
options.add_experimental_option("prefs", prefs)

The debuggerAddress clashes with the other options and throws an InvalidArgumentException, but once the other lines are removed it works, and I can freely run different scripts against the same opened browser using this format. Operating this way makes the browser detectable, but that has no effect on my workflow. I'm not sure what makes Syntax #9 work when the other things I've tried (above, among many other ideas today) don't. It's probably user error, but I'm at a total loss.

mdmintz commented 10 months ago

Hello, so once you run the code below, it'll open the web page and then activate a breakpoint. (c + Return to continue)

from seleniumbase import Driver

driver = Driver(uc=True, remote_debug=True)
try:
    driver.get("https://google.com")
    import pdb; pdb.set_trace()
finally:
    driver.quit()

While in the breakpoint, you should be able to navigate to chrome://inspect/#devices and see the Google page show up there. That shows it works. (Port 9222)

[Screenshot 2023-11-02 at 4 31 09 PM]

And then after inspecting the page:

[Screenshot 2023-11-02 at 4 27 43 PM]
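
The same check can also be scripted instead of using chrome://inspect. Chrome's remote debugging port serves a small HTTP API, and /json/list returns the open targets as JSON. This is a rough sketch using only the standard library; the function names `list_debug_targets` and `page_urls` are hypothetical, and the default of port 9222 on 127.0.0.1 is an assumption taken from the discussion above.

```python
import json
from urllib.request import urlopen


def list_debug_targets(port=9222, host="127.0.0.1", timeout=2.0):
    """Fetch the list of debuggable targets from Chrome's DevTools HTTP endpoint."""
    url = f"http://{host}:{port}/json/list"
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def page_urls(targets):
    """Return the URLs of targets whose type is "page" (ordinary tabs)."""
    return [t.get("url", "") for t in targets if t.get("type") == "page"]


# Example usage while the breakpoint above is active:
# print(page_urls(list_debug_targets()))
```

If the Google page opened by the script appears in the result, the remote debugging port is reachable.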

If you're not using UC Mode, you should be able to connect to an already-opened web browser. You can use Syntax Format 9 for complete customization, as you have already done.
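
For the non-UC case, attaching plain Selenium to a browser that is already listening on a debugging port might look like the sketch below. It is a hedged illustration, not the Syntax Format 9 example itself: the helper names `debugger_address` and `attach_to_running_chrome` are hypothetical, and only the "debuggerAddress" experimental option comes from the thread. Following the report above, no "excludeSwitches" or "prefs" options are set alongside it.

```python
def debugger_address(port, host="localhost"):
    """Build a "host:port" string, normalizing values such as "0249" to 249."""
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError(f"Port out of range: {port}")
    return f"{host}:{port}"


def attach_to_running_chrome(port=9222):
    """Attach Selenium to a Chrome started with --remote-debugging-port=<port>."""
    # Imported lazily so the address helper works without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_experimental_option("debuggerAddress", debugger_address(port))
    return webdriver.Chrome(options=options)


# Example usage (requires selenium and an already-running Chrome):
# driver = attach_to_running_chrome(9222)
# print(driver.title)
```

The sketch only normalizes the port number on the Python side; it does not check which port Chrome actually bound to.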

But with UC Mode, it needs to spin up the browser that it attaches to in order for it to work. UC Mode modifies chromedriver, which in turn modifies how Chrome works. The new settings aren't compatible with the old, hence the reason why passing a user-data-dir from regular Chrome will get you detected with UC Chrome, etc. Remote debugging is also affected by this.

But the good news is that UC Mode already has what you need to avoid detection with default settings. It handles the tricky things for you such as remote-debug handling, user-data-dir handling, etc. But perhaps you'll get to where you want to go with the Syntax Format 9 customizations.