wkeeling / selenium-wire

Extends Selenium's Python bindings to give you the ability to inspect requests made by the browser.
MIT License

How to fix error "Invalid server scheme: None" #309

Closed alex4200 closed 3 years ago

alex4200 commented 3 years ago

I am trying to use selenium-wire within a GitLab CI job, where I get the following error:

Traceback (most recent call last):
  File "/gpfs/bbp.cscs.ch/ssd/gitlab_user_jobs/adietz/J25174/nse/check-pages/venv/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/addons/core.py", line 59, in configure
    server_spec.parse_with_mode(mode)
  File "/gpfs/bbp.cscs.ch/ssd/gitlab_user_jobs/adietz/J25174/nse/check-pages/venv/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/net/server_spec.py", line 80, in parse_with_mode
    return mode, parse(server_spec)
  File "/gpfs/bbp.cscs.ch/ssd/gitlab_user_jobs/adietz/J25174/nse/check-pages/venv/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/net/server_spec.py", line 47, in parse
    raise ValueError("Invalid server scheme: {}".format(scheme))
ValueError: Invalid server scheme: None
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "pagechecker.py", line 156, in <module>
    linkchecker()
  File "/gpfs/bbp.cscs.ch/ssd/gitlab_user_jobs/adietz/J25174/nse/check-pages/venv/lib/python3.8/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/gpfs/bbp.cscs.ch/ssd/gitlab_user_jobs/adietz/J25174/nse/check-pages/venv/lib/python3.8/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/gpfs/bbp.cscs.ch/ssd/gitlab_user_jobs/adietz/J25174/nse/check-pages/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/gpfs/bbp.cscs.ch/ssd/gitlab_user_jobs/adietz/J25174/nse/check-pages/venv/lib/python3.8/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "pagechecker.py", line 146, in linkchecker
    req = get_requests(url, interceptor)
  File "pagechecker.py", line 24, in get_requests
    driver = webdriver.Chrome(options=chrome_options)
  File "/gpfs/bbp.cscs.ch/ssd/gitlab_user_jobs/adietz/J25174/nse/check-pages/venv/lib/python3.8/site-packages/seleniumwire/webdriver.py", line 91, in __init__
    self.proxy = backend.create(
  File "/gpfs/bbp.cscs.ch/ssd/gitlab_user_jobs/adietz/J25174/nse/check-pages/venv/lib/python3.8/site-packages/seleniumwire/backend.py", line 35, in create
    proxy = MitmProxy(addr, port, options)
  File "/gpfs/bbp.cscs.ch/ssd/gitlab_user_jobs/adietz/J25174/nse/check-pages/venv/lib/python3.8/site-packages/seleniumwire/server.py", line 57, in __init__
    mitmproxy_opts.update(
  File "/gpfs/bbp.cscs.ch/ssd/gitlab_user_jobs/adietz/J25174/nse/check-pages/venv/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/optmanager.py", line 223, in update
    u = self.update_known(**kwargs)
  File "/gpfs/bbp.cscs.ch/ssd/gitlab_user_jobs/adietz/J25174/nse/check-pages/venv/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/optmanager.py", line 215, in update_known
    self.changed.send(self, updated=updated)
  File "/gpfs/bbp.cscs.ch/ssd/gitlab_user_jobs/adietz/J25174/nse/check-pages/venv/lib/python3.8/site-packages/blinker/base.py", line 266, in send
    return [(receiver, receiver(sender, **kwargs))
  File "/gpfs/bbp.cscs.ch/ssd/gitlab_user_jobs/adietz/J25174/nse/check-pages/venv/lib/python3.8/site-packages/blinker/base.py", line 266, in <listcomp>
    return [(receiver, receiver(sender, **kwargs))
  File "/gpfs/bbp.cscs.ch/ssd/gitlab_user_jobs/adietz/J25174/nse/check-pages/venv/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/addonmanager.py", line 119, in _configure_all
    self.trigger("configure", updated)
  File "/gpfs/bbp.cscs.ch/ssd/gitlab_user_jobs/adietz/J25174/nse/check-pages/venv/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/addonmanager.py", line 256, in trigger
    self.invoke_addon(i, name, *args, **kwargs)
  File "/gpfs/bbp.cscs.ch/ssd/gitlab_user_jobs/adietz/J25174/nse/check-pages/venv/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/addonmanager.py", line 237, in invoke_addon
    func(*args, **kwargs)
  File "/gpfs/bbp.cscs.ch/ssd/gitlab_user_jobs/adietz/J25174/nse/check-pages/venv/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/addons/core.py", line 61, in configure
    raise exceptions.OptionsError(str(e)) from e
seleniumwire.thirdparty.mitmproxy.exceptions.OptionsError: Invalid server scheme: None

What does it mean, and how can I avoid or fix this problem?

wkeeling commented 3 years ago

Can you share the code/config that produces the error? It looks as though the scheme (http/https) is missing somewhere - perhaps in the proxy config if you're using that.

alex4200 commented 3 years ago

I cannot post the entire code, but it is when I try to create the webdriver:

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)

I'm not sure whether chromedriver is in the PATH. Maybe it is missing, but that generates a different error.

alex4200 commented 3 years ago

I was able to reproduce the error with this complete code:

from seleniumwire import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)

# Load the URL
driver.get("https://www.google.com")

Python 3.8.3, running inside GitLab CI.

alex4200 commented 3 years ago

When I run the exact same code inside a Python Docker container in GitLab CI, I get a different error:

Traceback (most recent call last):
  File "test_code.py", line 7, in <module>
    driver = webdriver.Chrome(options=chrome_options)
  File "/usr/local/lib/python3.8/site-packages/seleniumwire/webdriver.py", line 115, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
    self.service.start()
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 98, in start
    self.assert_process_still_running()
  File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/common/service.py", line 109, in assert_process_still_running
    raise WebDriverException(
selenium.common.exceptions.WebDriverException: Message: Service chromedriver unexpectedly exited. Status code was: 127

wkeeling commented 3 years ago

Do you have environment variables configured in GitLab CI for HTTP_PROXY or HTTPS_PROXY? If these are present then Selenium Wire will automatically pick them up - but will error if the values are incorrectly specified.
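A quick way to check this before creating the driver is sketched below. It is only an illustration, not Selenium Wire's own logic: the helper name `find_schemeless_proxy_vars` is hypothetical, and it inspects only the two variable names mentioned above.

```python
import os
from urllib.parse import urlsplit

# The two variable names mentioned in this thread.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY")


def find_schemeless_proxy_vars(environ=None):
    """Return {name: value} for proxy variables that lack an http/https scheme.

    Hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    bad = {}
    for name in PROXY_VARS:
        value = environ.get(name)
        if not value:
            continue  # unset or empty: nothing to pick up
        if urlsplit(value).scheme not in ("http", "https"):
            bad[name] = value
    return bad


if __name__ == "__main__":
    for name, value in find_schemeless_proxy_vars().items():
        print(f"{name}={value!r} has no http:// or https:// prefix")
```

Running this at the start of the CI job shows whether a value such as `proxy.example.com:8080` (no scheme) is present in the environment.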

alex4200 commented 3 years ago

So it seems the problem was that google-chrome was not installed. There may still be a bug, but it would be in the error message that is shown rather than in the behaviour.

wkeeling commented 3 years ago

Ok thanks, so are things working now or are you still seeing an error?

alex4200 commented 3 years ago

Thanks, things are working. Closing ticket.

adamarla commented 3 years ago

Hi @wkeeling - we are experiencing this exact same issue, "Invalid server scheme: None", when we try to use selenium-wire with the proxy option in the manner suggested here. I wonder whether this ticket can be re-opened, or whether we ought to create a new one.

We are attempting to run Selenium and headless-chromium in an AWS Lambda function running Python 3.7. The options look something like this:

      chrome_options = webdriver.ChromeOptions()
      chrome_options.add_argument("--no-sandbox")
      chrome_options.add_argument("--headless")
      chrome_options.add_argument("--disable-extensions")
      chrome_options.add_argument("--single-process")
      chrome_options.add_argument("--disable-dev-shm-usage")
      chrome_options.add_argument("--disable-gpu")
      chrome_options.add_argument("--disable-software-rasterizer")
      chrome_options.add_argument("--window-size=1280x1696")
      chrome_options.add_argument("--user-data-dir=/tmp/user-data")
      chrome_options.add_argument("--hide-scrollbars")
      chrome_options.add_argument("--enable-logging")
      chrome_options.add_argument("--log-level=0")
      chrome_options.add_argument("--v=99")
      chrome_options.add_argument("--data-path=/tmp/data-path")
      chrome_options.add_argument("--ignore-certificate-errors")
      chrome_options.add_argument("--homedir=/tmp")
      chrome_options.add_argument("--disk-cache-dir=/tmp/cache-dir")
      chrome_options.add_argument(
          "user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
      )

and then

params = {
    "options": chrome_options,
    "seleniumwire_options": {
        "request_storage_base_dir": "/tmp",  # Use /tmp to store captured data
        "backend": "default",
        "proxy": {
            "http": f"http://scraperapi:{api_key}@proxy-server.scraperapi.com:8001",
            "no_proxy": "localhost,127.0.0.1",
        }
    },
}

and finally

driver = webdriver.Chrome(**params)

wkeeling commented 3 years ago

This issue may happen if the regex that parses the host from the proxy URL fails to match. I'm wondering whether your {api_key} contains a character which is triggering this to happen. Does the api key contain a colon : or some non-ascii character?

adamarla commented 3 years ago

Hi @wkeeling thanks for the quick response. api_key is pure alphanumeric [A-Za-z0-9]. But obviously, as is plain from the configuration above, there are other parts of seleniumwire_options.proxy.http that are not. Is the rest of it okay?

wkeeling commented 3 years ago

Thanks. The rest of the URL looks fine (the @ symbol is permitted).

Can you also confirm that you don't have environment variables HTTP_PROXY or HTTPS_PROXY set? If set, the value of these environment variables will override what you have in the code.

adamarla commented 3 years ago

No environment variables of that nature. Btw I should have shared the stack trace and not only the error message.

  File "/var/task/crawler.py", line 107, in __init__
    self.driver = SeleniumProxyDriver()
  File "/var/task/crawler.py", line 26, in __init__
    self.driver = webdriver.Chrome(**params)
  File "/opt/python/seleniumwire/webdriver.py", line 93, in __init__
    options=seleniumwire_options
  File "/opt/python/seleniumwire/backend.py", line 35, in create
    proxy = MitmProxy(addr, port, options)
  File "/opt/python/seleniumwire/server.py", line 61, in __init__
    **self._get_upstream_proxy_args(),
  File "/opt/python/seleniumwire/thirdparty/mitmproxy/optmanager.py", line 223, in update
    u = self.update_known(**kwargs)
  File "/opt/python/seleniumwire/thirdparty/mitmproxy/optmanager.py", line 215, in update_known
    self.changed.send(self, updated=updated)
  File "/opt/python/blinker/base.py", line 267, in send
    for receiver in self.receivers_for(sender)]
  File "/opt/python/blinker/base.py", line 267, in <listcomp>
    for receiver in self.receivers_for(sender)]
  File "/opt/python/seleniumwire/thirdparty/mitmproxy/server/config.py", line 90, in configure
    _, spec = server_spec.parse_with_mode(options.mode)
  File "/opt/python/seleniumwire/thirdparty/mitmproxy/net/server_spec.py", line 80, in parse_with_mode
    return mode, parse(server_spec)
  File "/opt/python/seleniumwire/thirdparty/mitmproxy/net/server_spec.py", line 47, in parse
    raise ValueError("Invalid server scheme: {}".format(scheme))
ValueError: Invalid server scheme: None

In the seleniumwire options I specify "backend": "default", but for some reason it still defaults to mitmproxy.

        params = {
            "options": chrome_options,
            "seleniumwire_options": {
                "request_storage_base_dir": "/tmp",  # Use /tmp to store captured data
                "backend": "default",
            },
        }

adamarla commented 3 years ago

@wkeeling I apologize - a confounded trailing comma caused the auto formatter to convert a dict into a single element tuple and consequently the way **kwargs were being passed in was incorrect. I only just noticed it now. Please go ahead and close this issue (user error, for a second time), the software is performing as expected!
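For anyone who hits the same thing: the thread does not show the offending line, so the snippet below is only a minimal sketch of how a stray trailing comma turns a dict into a one-element tuple. The `check_options` guard is a hypothetical helper, not part of selenium-wire.

```python
# What was intended: a plain dict of options.
intended = {"request_storage_base_dir": "/tmp"}

# What a stray trailing comma produces: a 1-tuple that contains the dict.
accidental = {"request_storage_base_dir": "/tmp"},

print(type(intended).__name__)    # dict
print(type(accidental).__name__)  # tuple


def check_options(options):
    """Hypothetical guard: fail early if options is not a dict."""
    if not isinstance(options, dict):
        raise TypeError(
            f"expected a dict of options, got {type(options).__name__}; "
            "check for a stray trailing comma"
        )
    return options
```

Calling a guard like this on the options just before passing them to `webdriver.Chrome(...)` would surface the mistake as a clear `TypeError` rather than a confusing error further down.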

wkeeling commented 3 years ago

Easily done, but glad it's working now.

tmtong commented 2 years ago

For me, it turned out that if you use a proxy, you need to include http:// before the ip:port.
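A small sketch of that fix, under stated assumptions: `with_scheme` is a hypothetical helper, and `192.0.2.10:8080` is a placeholder address. The `proxy` dictionary follows the shape used earlier in this thread.

```python
def with_scheme(address, default_scheme="http"):
    """Prefix a bare ip:port with a scheme; leave full URLs untouched.

    Hypothetical helper for illustration only.
    """
    if "://" in address:
        return address
    return f"{default_scheme}://{address}"


proxy_address = "192.0.2.10:8080"  # placeholder, no scheme

seleniumwire_options = {
    "proxy": {
        "http": with_scheme(proxy_address),
        "https": with_scheme(proxy_address),
        "no_proxy": "localhost,127.0.0.1",
    }
}

print(seleniumwire_options["proxy"]["http"])  # http://192.0.2.10:8080
```

The resulting dictionary can then be passed as `seleniumwire_options=...` when creating the driver, as in the examples above.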