seleniumbase / SeleniumBase

📊 Python's all-in-one framework for web crawling, scraping, testing, and reporting. Supports pytest. UC Mode provides stealth. Includes many tools.
https://seleniumbase.io
MIT License

Documenting the `selenium-wire` integration with the `Driver()` manager #2145

Open mdmintz opened 1 year ago

mdmintz commented 1 year ago

Here's an example of the selenium-wire integration with the Driver manager:

from seleniumbase import Driver

driver = Driver(wire=True, headless=True)
try:
    driver.get("https://wikipedia.org")
    for request in driver.requests:
        print(request.url)
finally:
    driver.quit()

Here's the output of that:

https://accounts.google.com/ListAccounts?gpsia=1&source=ChromiumBrowser&json=standard
https://wikipedia.org/
https://www.wikipedia.org/
https://www.wikipedia.org/portal/wikipedia.org/assets/js/index-24c3e2ca18.js
https://www.wikipedia.org/portal/wikipedia.org/assets/img/Wikipedia-logo-v2@2x.png
https://www.wikipedia.org/portal/wikipedia.org/assets/js/gt-ie9-ce3fe8e88d.js
https://www.wikipedia.org/portal/wikipedia.org/assets/img/sprite-de847d1a.svg
https://www.wikipedia.org/portal/wikipedia.org/assets/img/Wikinews-logo_sister@2x.png

The wire integration can also be activated via the `--wire` command-line option.
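
Because `driver.requests` also captures Chrome's own background traffic (such as the accounts.google.com call in the output above), a common next step is to keep only the URLs for the site you care about. Below is a minimal sketch that works on plain URL strings, so it runs without a browser. The helper name `urls_for_domain` is hypothetical, not part of SeleniumBase; in a real script you would pass it `[r.url for r in driver.requests]`.

```python
from urllib.parse import urlsplit


def urls_for_domain(urls, domain):
    """Return only the URLs whose hostname is `domain` or a subdomain of it.

    Hypothetical helper: feed it [r.url for r in driver.requests].
    """
    matches = []
    for url in urls:
        hostname = urlsplit(url).hostname or ""
        # Match the domain itself or any subdomain of it.
        if hostname == domain or hostname.endswith("." + domain):
            matches.append(url)
    return matches


captured = [
    "https://accounts.google.com/ListAccounts?gpsia=1",
    "https://wikipedia.org/",
    "https://www.wikipedia.org/portal/wikipedia.org/assets/js/index-24c3e2ca18.js",
]
for url in urls_for_domain(captured, "wikipedia.org"):
    print(url)
```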


Here's a more advanced example:

from seleniumbase import Driver

def intercept_response(request, response):
    print(request.headers)

driver = Driver(wire=True)
try:
    driver.response_interceptor = intercept_response
    driver.get("https://wikipedia.org")
finally:
    driver.quit()

Here's some output from running that:

sec-ch-ua: "Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "macOS"
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
sec-fetch-site: none
sec-fetch-mode: navigate
sec-fetch-user: ?1
sec-fetch-dest: document
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9

sec-ch-ua: "Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"
sec-ch-ua-mobile: ?0
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36
sec-ch-ua-platform: "macOS"
accept: */*
sec-fetch-site: same-origin
sec-fetch-mode: no-cors
sec-fetch-dest: script
referer: https://www.wikipedia.org/
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
cookie: WMF-Last-Access-Global=22-Feb-2024;
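
Rather than printing every header set, an interceptor can filter on one of the headers shown above. A minimal sketch that records only document and script loads, based on the `sec-fetch-dest` header. It assumes that selenium-wire's `request.headers` supports `.get()` and that responses expose `status_code`; plain stand-in objects are used here so the logic can be checked without a browser. In a real run you would assign it with `driver.response_interceptor = intercept_response`, as in the example above.

```python
from types import SimpleNamespace

captured = []


def intercept_response(request, response):
    """Record (url, status) only for document and script requests.

    Assumes request.headers supports .get() and response has status_code.
    """
    dest = request.headers.get("sec-fetch-dest")
    if dest in ("document", "script"):
        captured.append((request.url, response.status_code))


# Stand-in objects (a dict simulates the headers) for a browser-free check.
page = SimpleNamespace(
    url="https://www.wikipedia.org/",
    headers={"sec-fetch-dest": "document"},
)
sprite = SimpleNamespace(
    url="https://www.wikipedia.org/portal/wikipedia.org/assets/img/sprite-de847d1a.svg",
    headers={"sec-fetch-dest": "image"},
)
ok = SimpleNamespace(status_code=200)
intercept_response(page, ok)
intercept_response(sprite, ok)
print(captured)
```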

And note (since it gets asked a lot): wire mode is not compatible with UC Mode!

mdmintz commented 1 year ago

I added documentation on the selenium-wire integration here: https://github.com/seleniumbase/SeleniumBase/commit/eea244127cf26daac6ec28a3139382378071e0d5 (in SeleniumBase/help_docs/syntax_formats.md)

mdmintz commented 1 year ago

Also, for wire mode, here's a function that can be added to change the proxy settings in the middle of a script:

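def set_wire_proxy(string):
    """Change the proxy of a running wire-mode driver mid-script.

    Assumes `driver` is a Driver(wire=True) instance in the enclosing scope.

    Examples:
    set_wire_proxy("SERVER:PORT")
    set_wire_proxy("socks5://SERVER:PORT")
    set_wire_proxy("USERNAME:PASSWORD@SERVER:PORT")
    """
    the_http = "http"
    the_https = "https"
    # SOCKS proxies use the same scheme for both HTTP and HTTPS traffic.
    if string.startswith("socks4://"):
        the_http = "socks4"
        the_https = "socks4"
    elif string.startswith("socks5://"):
        the_http = "socks5"
        the_https = "socks5"
    # Drop any "scheme://" prefix, keeping [USERNAME:PASSWORD@]SERVER:PORT.
    string = string.split("//")[-1]
    # selenium-wire uses the new driver.proxy value for subsequent requests.
    driver.proxy = {
        "http": "%s://%s" % (the_http, string),
        "https": "%s://%s" % (the_https, string),
        "no_proxy": "localhost,127.0.0.1",
    }
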
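
Since set_wire_proxy() only builds a dict and assigns it to `driver.proxy`, the parsing can be pulled out into a pure function and checked without starting a browser. A minimal sketch following the same parsing rules; `build_wire_proxy` is a hypothetical name, not part of SeleniumBase. Its result is what you would assign to `driver.proxy`.

```python
def build_wire_proxy(string):
    """Build the dict that set_wire_proxy() assigns to driver.proxy.

    Hypothetical refactor: same rules as set_wire_proxy(), but it returns
    the dict instead of touching a driver, so it can be tested directly.
    """
    the_http = "http"
    the_https = "https"
    # SOCKS proxies use the same scheme for both keys.
    for prefix in ("socks4", "socks5"):
        if string.startswith(prefix + "://"):
            the_http = the_https = prefix
    # Strip any "scheme://" prefix, keeping [USERNAME:PASSWORD@]SERVER:PORT.
    address = string.split("//")[-1]
    return {
        "http": "%s://%s" % (the_http, address),
        "https": "%s://%s" % (the_https, address),
        "no_proxy": "localhost,127.0.0.1",
    }


print(build_wire_proxy("USERNAME:PASSWORD@SERVER:PORT"))
print(build_wire_proxy("socks5://SERVER:PORT"))
```
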
mdmintz commented 1 year ago

The set_wire_proxy(string) method was then added directly to the driver (--wire mode only):

driver.set_wire_proxy(string) (in seleniumbase 4.19.2)

steinerx commented 11 months ago

Do you have plans to make wire mode and uc mode compatible in the future?

mdmintz commented 11 months ago

There's --uc mode, and there's --wire mode, but the two can't be used together due to unresolved issues with selenium-wire: https://github.com/wkeeling/selenium-wire/search?q=undetected&type=issues

But the good news is that you don't need the wire integration anymore, because SeleniumBase has --uc-cdp / uc_cdp=True mode, which collects the same data that --wire mode collected. (If you have any problems with that, use --log-cdp / log_cdp=True: https://github.com/seleniumbase/SeleniumBase/issues/2220)

Here's an example that uses it: https://github.com/seleniumbase/SeleniumBase/blob/master/examples/uc_cdp_events.py
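
As a sketch of what a CDP listener might do with those events: assuming (as in the undetected-chromedriver listener that UC Mode builds on) that each callback receives a dict with "method" and "params" keys, the collector below keeps the URL of every Network.requestWillBeSent event. The class name is hypothetical, and the sample events are hand-written, not captured output.

```python
class RequestUrlCollector:
    """Hypothetical callback for driver.add_cdp_listener(...).

    Assumes each event is a dict shaped like {"method": ..., "params": ...}.
    """

    def __init__(self):
        self.urls = []

    def __call__(self, event):
        # Only requestWillBeSent events describe an outgoing request.
        if event.get("method") == "Network.requestWillBeSent":
            url = event.get("params", {}).get("request", {}).get("url")
            if url:
                self.urls.append(url)


collector = RequestUrlCollector()
# In a UC Mode script it might be registered with something like:
#   driver.add_cdp_listener("Network.requestWillBeSent", collector)
collector({
    "method": "Network.requestWillBeSent",
    "params": {"request": {"url": "https://www.wikipedia.org/"}},
})
collector({"method": "Network.responseReceived", "params": {}})
print(collector.urls)
```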

guocity commented 5 months ago

Can I use uc_cdp=True with remote_debug=True? It seems like when uc_cdp is True, remote_debug doesn't work.

mdmintz commented 5 months ago

@guocity uc_cdp with remote_debug is working for me.

[Screenshot: 2024-05-02 at 11:49:32 PM]

guocity commented 5 months ago

> @guocity uc_cdp with remote_debug is working for me.

I mean working with an existing remote debugging session:

with SB(remote_debug=True, headless=True) as browser:  # it will work with the existing remote debugging session

with SB(headless=False, uc_cdp_events=True, remote_debug=True) as browser:  # it won't work with the existing remote debugging session

mdmintz commented 5 months ago

@guocity What exactly do you mean by "it won't work"? The screenshot I took in https://github.com/seleniumbase/SeleniumBase/issues/2145#issuecomment-2092074713 was with uc_cdp_events=True and remote_debug=True. I was able to inspect the "devices" (browser windows).

guocity commented 5 months ago

I have Chrome open with --remote-debugging-port=9222. With remote_debug=True in SB, it just uses the existing Chrome remote debugging session that I opened earlier. However, with uc_cdp_events=True, it doesn't.

mdmintz commented 5 months ago

This is the main thing that uc_cdp_events=True does:

options.set_capability("goog:loggingPrefs", {"performance": "ALL", "browser": "ALL"})

If that's causing some kind of port conflict, then that's from Selenium internals / Chrome internals.

When you say it won't "use the existing chrome remote debug session open earlier", what does it do instead? To help debug, I need to understand what you're seeing. There might not be much that I can do if the issue is on Selenium's end or Chrome's end. SeleniumBase only sets "goog:loggingPrefs"; it does not control how "goog:loggingPrefs" works.

Also, it doesn't make sense to connect to an existing Chrome in UC Mode: UC Mode spins up a new Chrome browser and attaches Selenium to it. If you just use uc_cdp_events=True with a new UC Mode browser and remote_debug=True, then things should work.

guocity commented 5 months ago

I see. I don't need undetected mode; I just want to see network requests. I was using add_cdp_listener, and it's part of undetected mode.

mdmintz commented 5 months ago

Then something like this might be all you need:

from rich.pretty import pprint
from seleniumbase import SB

with SB(log_cdp=True, remote_debug=True) as sb:
    url = "seleniumbase.io/demo_page"
    sb.open(url)
    sb.sleep(2)
    pprint(sb.driver.get_log("performance"))
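
The performance log is a list of raw entries, so to see just the network requests you still have to pick out the Network.requestWillBeSent events. A minimal sketch, assuming Chrome's usual entry layout (a JSON-encoded "message" field wrapping "method" and "params"). The function name is hypothetical; in the script above you would call it as request_urls_from_performance_log(sb.driver.get_log("performance")) instead of passing the log to pprint.

```python
import json


def request_urls_from_performance_log(entries):
    """Extract request URLs from driver.get_log("performance") entries.

    Assumes each entry's "message" is a JSON string whose inner "message"
    object has "method" and "params" keys.
    """
    urls = []
    for entry in entries:
        message = json.loads(entry["message"]).get("message", {})
        if message.get("method") == "Network.requestWillBeSent":
            urls.append(message["params"]["request"]["url"])
    return urls


# Hand-written entry in the assumed shape (not real captured output).
sample = [
    {
        "level": "INFO",
        "message": json.dumps({
            "message": {
                "method": "Network.requestWillBeSent",
                "params": {"request": {"url": "https://seleniumbase.io/demo_page"}},
            },
            "webview": "ABC",
        }),
        "timestamp": 0,
    },
]
print(request_urls_from_performance_log(sample))
```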