wkeeling / selenium-wire

Extends Selenium's Python bindings to give you the ability to inspect requests made by the browser.
MIT License

Selenium Wire not working with zyte-smartproxy-headless-proxy #322

Closed AndreuJove closed 3 years ago

AndreuJove commented 3 years ago

Dear selenium-wire,

I have been using a proxy running on localhost port 3128 and it works with normal Selenium.

It seems that the argument chrome_options.add_argument("--proxy-server=localhost:3128") does not work with selenium-wire.

Any idea how to solve this?

Thanks a lot

wkeeling commented 3 years ago

When you're using a proxy, you need to use Selenium Wire's proxy option to specify it. This is because Selenium Wire hijacks the normal proxy mechanism in order to capture requests. So in your case, you'd need to do:

options = {
    'proxy': {
        'http': 'http://localhost:3128',
        'https': 'https://localhost:3128',
    }
}
driver = webdriver.Chrome(seleniumwire_options=options)

and then you should remove the --proxy-server argument from your chrome_options.
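
Putting it together, a minimal sketch might look like this (assuming the same localhost:3128 proxy as in your example; note there is no --proxy-server argument in the Chrome options):

from seleniumwire import webdriver

# Selenium Wire's own proxy option replaces the --proxy-server Chrome argument
seleniumwire_options = {
    'proxy': {
        'http': 'http://localhost:3128',
        'https': 'https://localhost:3128',
    }
}

chrome_options = webdriver.ChromeOptions()
# add any other Chrome arguments you need here, but not --proxy-server

driver = webdriver.Chrome(
    options=chrome_options,
    seleniumwire_options=seleniumwire_options,
)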

AndreuJove commented 3 years ago

Dear wkeeling,

Thanks a lot for your quick response.

I have done what you said, but it gives me the following error:

OSError: [Errno 0] Error
2021-06-12 19:47:49 [seleniumwire.proxy.handler] ERROR: Error making request

Much appreciated,

Andreu Jové

wkeeling commented 3 years ago

Are you able to share the code you're using and the config options you're passing to the webdriver?

AndreuJove commented 3 years ago

Dear wkeeling,

I can share some of the code. The proxy needs authentication, but I already authenticate when I run the proxy on port 3128.

I have checked that it is going through the proxy, but it fails at seleniumwire/proxy/proxy2.py, line 91, in proxy_request.

Here is the code that I'm using now.

from seleniumwire import webdriver

options = {
    'proxy': {
        'http': 'http://localhost:3128',
        'https': 'https://localhost:3128',
    }
}

SELENIUM_DRIVER_ARGUMENTS = [
    # "--headless",
    "log-level=3",
    "--no-sandbox",
    "start-maximized",
    "enable-automation",
    "--disable-infobars",
    "--disable-xss-auditor",
    "--disable-setuid-sandbox",
    "--disable-xss-auditor",
    "--disable-web-security",
    "--disable-dev-shm-usage",
    "--disable-webgl",
    "--disable-popup-blocking",
    "ignore-certificate-errors",
]

def add_driver_arguments(
    chrome_options: webdriver.ChromeOptions, driver_arguments: list
) -> None:
    for argument in driver_arguments:
        chrome_options.add_argument(argument)

def main(): 
    chrome_options = webdriver.ChromeOptions()
    add_driver_arguments(chrome_options, SELENIUM_DRIVER_ARGUMENTS)
    driver = webdriver.Chrome(
               options=chrome_options,
               seleniumwire_options=options
            )
    driver.get("https://www.dieteticacentral.com/marcas/aquilea/aquilea-melatonina-1-95mg-30comp.html")
    driver.close()

if __name__ == "__main__":
    main()

wkeeling commented 3 years ago

Thanks, that all looks ok as far as I can see. Are you able to post the full traceback you're getting?

AndreuJove commented 3 years ago

Dear wkeeling,

Thanks for your quick reply.

Here is the full traceback. It's quite weird because it works fine with normal Selenium.

Error making request
Traceback (most recent call last):
  File ".local/lib/python3.8/site-packages/seleniumwire/proxy/proxy2.py", line 91, in proxy_request
    conn.request(self.command, path, req_body, dict(req.headers))
  File "/usr/lib/python3.8/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 950, in send
    self.connect()
  File "/.local/lib/python3.8/site-packages/seleniumwire/proxy/proxy2.py", line 368, in connect
    super().connect()
  File "/usr/lib/python3.8/http/client.py", line 1424, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/usr/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/usr/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
OSError: [Errno 0] Error

AndreuJove commented 3 years ago

Dear wkeeling,

The following code is the equivalent using the normal selenium package, and it works fine. I thought it might be helpful.

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.proxy import Proxy, ProxyType

SELENIUM_DRIVER_ARGUMENTS = [
    # "--headless",
    "log-level=3",
    "--no-sandbox",
    "start-maximized",
    "enable-automation",
    "--disable-infobars",
    "--disable-xss-auditor",
    "--disable-setuid-sandbox",
    "--disable-xss-auditor",
    "--disable-web-security",
    "--disable-dev-shm-usage",
    "--disable-webgl",
    "--disable-popup-blocking",
    "ignore-certificate-errors",
]

def add_driver_arguments(
    chrome_options: webdriver.ChromeOptions, driver_arguments: list
) -> None:
    for argument in driver_arguments:
        chrome_options.add_argument(argument)

def main():
    headless_proxy = "127.0.0.1:3128"
    proxy = Proxy({
        'proxyType': ProxyType.MANUAL,
        'httpProxy': headless_proxy,
        'ftpProxy' : headless_proxy,
        'sslProxy' : headless_proxy,
        'noProxy'  : ''
    })

    capabilities = dict(DesiredCapabilities.CHROME)
    proxy.add_to_capabilities(capabilities)
    chrome_options = webdriver.ChromeOptions()
    add_driver_arguments(chrome_options, SELENIUM_DRIVER_ARGUMENTS)
    driver = webdriver.Chrome(
               options=chrome_options,
               desired_capabilities=capabilities
            )
    driver.get("https://www.dieteticacentral.com/marcas/aquilea/aquilea-melatonina-1-95mg-30comp.html")
    driver.close()

if __name__ == "__main__":
    main()

wkeeling commented 3 years ago

Thanks. It looks like you're using an older version of Selenium Wire. The old versions sometimes had issues with SSL handshaking and proxy servers - and the traceback indicates that seems to be happening here. Is it possible for you to upgrade?

pip install --upgrade selenium-wire

The latest version is 4.3.0

AndreuJove commented 3 years ago

Dear wkeeling,

Thanks for your quick reply. I have updated from version 2.1.2, but now I'm facing a new error:

127.0.0.1:35492: Traceback (most recent call last):
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/server.py", line 113, in handle
    root_layer()
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/modes/http_proxy.py", line 23, in __call__
    layer()
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/tls.py", line 285, in __call__
    layer()
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/http1.py", line 100, in __call__
    layer()
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/http.py", line 205, in __call__
    if not self._process_flow(flow):
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/http.py", line 306, in _process_flow
    return self.handle_upstream_connect(f)
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/http.py", line 253, in handle_upstream_connect
    return layer()
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/http.py", line 102, in __call__
    layer()
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/tls.py", line 278, in __call__
    self._establish_tls_with_client_and_server()
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/tls.py", line 358, in _establish_tls_with_client_and_server
    self._establish_tls_with_server()
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/server/protocol/tls.py", line 445, in _establish_tls_with_server
    self.server_conn.establish_tls(
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/connections.py", line 290, in establish_tls
    self.convert_to_tls(cert=client_cert, sni=sni, **kwargs)
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/net/tcp.py", line 382, in convert_to_tls
    context = tls.create_client_context(
  File "/.local/lib/python3.8/site-packages/seleniumwire/thirdparty/mitmproxy/net/tls.py", line 285, in create_client_context
    param = SSL._lib.SSL_CTX_get0_param(context._context)
AttributeError: module 'lib' has no attribute 'SSL_CTX_get0_param'

Do you have any idea about this?

Much appreciated,

Andreu

wkeeling commented 3 years ago

That looks as though you're using a different version of pyopenssl from what Selenium Wire needs. Are you able to check which version you have with:

pip show pyopenssl

Selenium Wire needs 19.1.0 or above.

wkeeling commented 3 years ago

Now that I look again, this may be because OpenSSL itself isn't up to date. OpenSSL normally comes preinstalled on most platforms, but it's possible that the version you're using is old. What OS are you using?

AndreuJove commented 3 years ago

Dear wkeeling,

I'm using Linux. I managed to create a new environment and it is working, but I'm facing a new problem:

time="2021-06-12T19:54:30Z" level=warning msg="[127.0.0.1:44493] (227633266689): cennot finish TLS handshake: EOF"

I'm using these arguments for the driver:

SELENIUM_DRIVER_ARGUMENTS = [
    "--headless",
    "log-level=3",
    "--no-sandbox",
    "start-maximized",
    "enable-automation",
    "--disable-infobars",
    "--disable-xss-auditor",
    "--disable-setuid-sandbox",
    "--disable-xss-auditor",
    "--disable-web-security",
    "--disable-dev-shm-usage",
    "--disable-webgl",
    "--disable-popup-blocking",
    "--ignore-certificate-errors-spki-list",
    "ignore-certificate-errors",
]

I'm only installing selenium-wire:

selenium-wire==4.3.0

wkeeling commented 3 years ago

Thanks @AndreuJove, I've not seen that one before. Could you try adding mitm_http2: False to your seleniumwire_options:

options = {
    'mitm_http2': False,  # Add this
    'proxy': {
        'http': 'http://localhost:3128',
        'https': 'https://localhost:3128',
    }
}

Also could you let me know what version of OpenSSL you've got installed, with:

openssl version

On my version of Linux, I have OpenSSL 1.1.1 11 Sep 2018 installed.

AndreuJove commented 3 years ago

Dear wkeeling,

Thanks for your update. I have added this option to my seleniumwire_options, but I still have the exact same error.

time="2021-06-12T19:54:30Z" level=warning msg="[127.0.0.1:44493] (227633266689): cennot finish TLS handshake: EOF"

The problem occurs when running the code inside a Docker container (OpenSSL 1.1.1d 10 Sep 2019); on my localhost it runs great (OpenSSL 1.1.1f 31 Mar 2020).

Do you have any idea what the problem is?

Thanks a lot,

Andreu

wkeeling commented 3 years ago

Does it work if you omit the proxy settings and go direct from Selenium Wire to the target site - using your Docker setup?

AndreuJove commented 3 years ago

Dear wkeeling,

Thank you for your response. I have tried without the proxy and it's not working either. I also have one more question about selenium-wire while I'm debugging the Docker container problem: I would like to change the logging level of selenium-wire. I have tried:

import logging

selenium_wire_logger = logging.getLogger("seleniumwire")
selenium_wire_logger.setLevel(logging.ERROR)

But it doesn't work. Is there any other way?
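
For reference, a fuller sketch of the kind of thing I mean, with the import included and the level set before the driver is created, in case the ordering matters:

import logging

from seleniumwire import webdriver

# Configure a root handler first, then raise the threshold for the whole
# 'seleniumwire' namespace; child loggers such as seleniumwire.server inherit it
logging.basicConfig(level=logging.INFO)
logging.getLogger("seleniumwire").setLevel(logging.ERROR)

driver = webdriver.Chrome()
driver.get("https://httpbin.org/anything")
driver.quit()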

Thanks a lot for your help,

Andreu

AndreuJove commented 3 years ago

Dear wkeeling,

Do you have any news on:

time="2021-06-12T19:54:30Z" level=warning msg="[127.0.0.1:44493] (227633266689): cennot finish TLS handshake: EOF"

Thanks a lot,

Andreu Jové

AndreuJove commented 3 years ago

Dear wkeeling,

I found the problem. I'm using the crawlera headless proxy, which has the following limitation:

Since crawlera-headless-proxy has to inject X-Headers into responses, it works with your browser only by HTTP 1.1. Unfortunately, there is no clear way how to hijack HTTP2 connections. Also, since it is effectively MITM proxy, you need to use its own TLS certificate. This is hardcoded into the binary so you have to download it and apply it to your system. Please consult with manuals of your operating system how to do that.

https://github.com/zytedata/zyte-smartproxy-headless-proxy

With the mitm_http2: False configuration that you suggested, HTTP2 connections should be deactivated and it should work, but it's not.

Are there any other seleniumwire options I can pass?

Thanks a lot,

Andreu Jové

wkeeling commented 3 years ago

I suspect that it has deactivated the http2 connections but the proxy is closing off the connection for some other reason. Just looking at the GitHub page for the proxy, have you tried setting --dont-verify-crawlera-cert for the proxy itself?

AndreuJove commented 3 years ago

Dear wkeeling,

Unfortunately I tried it and it didn't work either. It is quite weird. Is there any other way to deactivate HTTP2?

Thanks a lot for your help,

Andreu Jové

wkeeling commented 3 years ago

Ok thanks. I'm afraid I'm running out of ideas at this point. The mitm_http2: False option should definitely work when placed in the seleniumwire_options, as it's used as a workaround for other issues. Not all websites support HTTP2. If you specify a site that doesn't use HTTP2, does it work? E.g. http://httpbin.org/anything
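
For reference, a minimal isolation test might look something like this (a sketch reusing the proxy settings from earlier in the thread):

from seleniumwire import webdriver

options = {
    'mitm_http2': False,
    'proxy': {
        'http': 'http://localhost:3128',
        'https': 'https://localhost:3128',
    }
}

driver = webdriver.Chrome(seleniumwire_options=options)

# Plain-HTTP httpbin endpoint, so HTTP2 shouldn't come into play
driver.get('http://httpbin.org/anything')

# Print what Selenium Wire captured to confirm the request went through
for request in driver.requests:
    if request.response:
        print(request.url, request.response.status_code)

driver.quit()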

AndreuJove commented 3 years ago

Dear wkeeling,

Can you please reopen the issue? That doesn't work either; I guess it is a problem with the connection between the two proxies. Is there any other way to ignore certificate errors in selenium-wire?

SELENIUM_DRIVER_ARGUMENTS = [
    "--headless",
    "log-level=3",
    "--no-sandbox",
    "start-maximized",
    "enable-automation",
    "--disable-infobars",
    "--disable-xss-auditor",
    "--disable-setuid-sandbox",
    "--disable-xss-auditor",
    "--disable-web-security",
    "--disable-dev-shm-usage",
    "--disable-webgl",
    "--disable-popup-blocking",
    "--ignore-certificate-errors",
    "--ignore-certificate-errors-spki-list",
    "--ignore-ssl-errors",
    "--allow-insecure-localhost",
]

Thank you so much,

Andreu Jové

wkeeling commented 3 years ago

@AndreuJove will re-open. Selenium Wire ignores SSL certificate errors by default. It's going to require some further debugging. I feel we should perhaps update the title of this ticket to e.g. "Selenium Wire not working with zyte-smartproxy-headless-proxy" if you agree?

AndreuJove commented 3 years ago

@wkeeling

Yes, sure, I'll change it.

heisen273 commented 3 years ago

@AndreuJove, @wkeeling I had the same issue and managed to fix it with pip3 install -U cryptography. The cryptography module was outdated.

wkeeling commented 3 years ago

Thanks @heisen273

@AndreuJove are you able to confirm whether that fixes for you?

AndreuJove commented 3 years ago

@heisen273

Thank you very much for your help. Could you please tell me which version of cryptography you are using? I should put it in the requirements.txt of my project, which is installed in my Docker container.

Thanks!

heisen273 commented 3 years ago

@AndreuJove in my case I used the latest available, cryptography-3.4.7; it was updated from cryptography-2.8.

AndreuJove commented 3 years ago

Dear @heisen273, @wkeeling,

It's not working inside the Docker container either. Here is my requirements.txt:

scrapy==2.4.1
shub==2.10.0
scrapinghub==2.2.1
msgpack==0.6.2
loginform==1.2.0
scrapy-crawlera==1.6.0
requests==2.22.0
jsonlines==1.2.0
jsonpath_ng==1.4.3
unidecode==1.1.1
extruct==0.9.0
pre-commit==2.5.1
importlib-metadata==1.7.0
jsonschema==3.2.0
slugify==0.0.1
selenium-wire==4.3.0
cryptography==3.4.7

The running spider has these settings:

[scrapy.utils.log] Versions: 
lxml 4.6.2.0, 
libxml2 2.9.10, 
cssselect 1.1.0, 
parsel 1.6.0, 
w3lib 1.22.0, 
Twisted 20.3.0, 
Python 3.8.2 (default, Apr 23 2020, 14:32:57) - [GCC 8.3.0], 
pyOpenSSL 19.1.0 (OpenSSL 1.1.1k 25 Mar 2021), 
cryptography 3.4.7, 
Platform Linux-4.15.0-76-generic-x86_64-with-glibc2.2.5

AndreuJove commented 3 years ago

Dear @heisen273,

How are you connecting selenium-wire with crawlera? Do you have an example?

Thanks a lot!!!

Andreu

heisen273 commented 3 years ago

@AndreuJove, no, I simply used it locally on my laptop. I had the exception AttributeError: module 'lib' has no attribute 'SSL_CTX_get0_param', which I managed to fix by updating the cryptography module.

>>> from seleniumwire import webdriver
>>> wireOpts = { 'proxy': {'http': 'http://10.14.0.150:8082', 'https': 'https://10.14.0.150:8082'}}
>>> cap = webdriver.DesiredCapabilities.CHROME.copy()
>>> extensions = ["SingleFile_v1.18.86.crx", "Image-Downloader-Continued_v2.8.crx"]
>>> opts = webdriver.ChromeOptions()
>>> for path in extensions:
...     opts.add_extension(path)
...
>>> cap.update(opts.to_capabilities())
>>> driver = webdriver.Chrome(executable_path='/Users/anton/Downloads/chromedriver', seleniumwire_options=wireOpts, desired_capabilities=cap)

AndreuJove commented 3 years ago

Dear @wkeeling,

I have managed to access the proxy directly, without using the process on my localhost. The problem now is that I have to change some headers of the request, but the request_interceptor is not working.

I followed the documentation.


def interceptor(request):
    request.headers['X-Crawlera-Cookies'] = 'disable'
    request.headers['X-Crawlera-Profile'] = 'desktop'

self.driver.request_interceptor = interceptor
self.driver.get(request.url)

Thanks,

Andreu

wkeeling commented 3 years ago

@AndreuJove good news.

Just wondering how you verified that those headers have not been added? Did you retrieve the captured requests and print out the headers, and they weren't present? You could also maybe try http://httpbin.org/headers
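
For example, something along these lines should show the headers that Selenium Wire actually sent, rather than relying on the page body (a minimal sketch based on the headers you mentioned, reusing the proxy settings from earlier):

from seleniumwire import webdriver

options = {
    'proxy': {
        'http': 'http://localhost:3128',
        'https': 'https://localhost:3128',
    }
}

def interceptor(request):
    request.headers['X-Crawlera-Cookies'] = 'disable'
    request.headers['X-Crawlera-Profile'] = 'desktop'

driver = webdriver.Chrome(seleniumwire_options=options)
driver.request_interceptor = interceptor
driver.get('http://httpbin.org/headers')

# Inspect the captured requests to see the headers as they left the browser side
for request in driver.requests:
    if 'httpbin.org/headers' in request.url:
        for name, value in request.headers.items():
            print(name, '=', value)

driver.quit()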

AndreuJove commented 3 years ago

Dear @wkeeling,

Thanks for your quick response. Yes, I'm requesting http://httpbin.org/headers and checking what appears. It looks like these headers can't be added: I can change 'Referer', for example, but not these ones.

CODE:

def interceptor(request):
    del request.headers['Referer']
    request.headers['Referer'] = "new_referer"
    request.headers['X-Crawlera-Cookies'] = "disable"
    request.headers['X-Crawlera-Profile'] = "desktop"

RESULT from httpbin

{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9", 
    "Accept-Encoding": "gzip, deflate, br", 
    "Accept-Language": "en-US,en;q=0.9", 
    "Host": "httpbin.org", 
    "Referer": "new_referer",
    "Sec-Ch-Ua": "\" Not;A Brand\";v=\"99\", \"Google Chrome\";v=\"91\", \"Chromium\";v=\"91\"", 
    "Sec-Ch-Ua-Mobile": "?0", 
    "Sec-Fetch-Dest": "document", 
    "Sec-Fetch-Mode": "navigate", 
    "Sec-Fetch-Site": "same-origin", 
    "Sec-Fetch-User": "?1", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36", 
    "X-Amzn-Trace-Id": "Root=1-60cc8df5-596b0cc654bc0fe97c87959b"
  }
}

Do you have any idea what is happening?

Andreu

wkeeling commented 3 years ago

Ok that's strange. So you can change an existing header but not add a new one. Have you tried adding a generic foo=bar header, just in case something is stripping out the "X-" headers?

AndreuJove commented 3 years ago

I have now tried:

def interceptor(request):
    del request.headers['Referer']
    request.headers['Referer'] = "new_referer"
    request.headers['foo'] = 'bar'
    request.headers['X-Crawlera-Cookies'] = "disable"
    request.headers['X-Crawlera-Profile'] = "desktop"

RESULT from httpbin:

{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9", 
    "Accept-Encoding": "gzip, deflate, br", 
    "Accept-Language": "en-US,en;q=0.9", 
    "Foo": "bar", 
    "Host": "httpbin.org", 
    "Referer": "new_referer", 
    "Sec-Fetch-Dest": "document", 
    "Sec-Fetch-Mode": "navigate", 
    "Sec-Fetch-Site": "same-origin", 
    "Sec-Fetch-User": "?1", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36 Edg/88.0.705.50", 
    "X-Amzn-Trace-Id": "Root=1-60cc8f20-03336f7c5db4f6cc79281aa7"
  }
}

AndreuJove commented 3 years ago

Dear @wkeeling,

I have seen that the problem is the X- prefix on the header: without the X-, the header does appear in the request headers.

Why is that X- prefix a problem?

wkeeling commented 3 years ago

@AndreuJove there must be something in the stack (or network) that's stripping out the X- headers. I don't have access to my machine currently, but I'll see if I can reproduce locally a bit later.

AndreuJove commented 3 years ago

Dear @wkeeling,

Okay! Looking forward to your answer.

wkeeling commented 3 years ago

@AndreuJove so I've tried reproducing with:

def interceptor(request):
    del request.headers['Referer']
    request.headers['Referer'] = "new_referer"
    request.headers['foo'] = 'bar'
    request.headers['X-Crawlera-Cookies'] = "disable"
    request.headers['X-Crawlera-Profile'] = "desktop"

and I'm getting:

{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9", 
    "Accept-Encoding": "gzip, deflate", 
    "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8", 
    "Foo": "bar", 
    "Host": "httpbin.org", 
    "Proxy-Connection": "keep-alive", 
    "Referer": "new_referer", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36", 
    "X-Amzn-Trace-Id": "Root=1-60ccddf3-6c7a7c3f237a3b433c331272", 
    "X-Crawlera-Cookies": "disable", 
    "X-Crawlera-Profile": "desktop"
  }
}

I'm using an upstream proxy - but an instance of mitmproxy.

I'm thinking it must be something local to your environment. Is the proxy you're using stripping the headers out? Have you tried without the proxy - just a direct request to httpbin.org?

AndreuJove commented 3 years ago

Dear @wkeeling,

I have seen that, yes, it disappears; maybe it's because of the proxy that I'm using after selenium-wire. I need to debug more because the two proxies are not connecting well.

AndreuJove commented 3 years ago

Dear @wkeeling,

I'm receiving:

[seleniumwire.server] 127.0.0.1:54132: Certificate verification error for www.dieteticacentral.com: self signed certificate in certificate chain (errno: 19, depth: 1)

WARNING | [seleniumwire.server] 127.0.0.1:54132: Invalid certificate, closing connection. Pass --ssl-insecure to disable validation.

Do you know why this is happening if selenium-wire ignores SSL certificates?

Thanks a lot,

Andreu

wkeeling commented 3 years ago

That is strange as Selenium Wire does allow insecure SSL certificates by default. Have you tried adding the upstream proxy root certificate to the browser's trusted root certificate authorities?

AndreuJove commented 3 years ago

Dear wkeeling,

I apologise for not replying; I had to work on other issues.

What do you mean by adding the upstream proxy root certificate?

The problem that I face now is that I'm receiving this warning:

[seleniumwire.server] 127.0.0.1:42076: Invalid certificate, closing connection. Pass --ssl-insecure to disable validation.

But I already have these arguments for Selenium:

SELENIUM_DRIVER_ARGUMENTS = [
    "--headless",
    "--no-sandbox",
    "start-maximized",
    "enable-automation",
    "--disable-infobars",
    "--disable-xss-auditor",
    "--disable-setuid-sandbox",
    "--disable-xss-auditor",
    "--disable-web-security",
    "--disable-dev-shm-usage",
    "--disable-webgl",
    "--disable-popup-blocking",
    "--ignore-certificate-errors-spki-list",
    "--ignore-ssl-errors",
    "--ssl-insecure"
]

Thanks a lot,

Andreu Jové

AndreuJove commented 3 years ago

Dear @wkeeling,

I'm facing a new error: when using the interceptor, the crawlera headless proxy does not receive what it should receive.

Do you know something about that?

Thanks a lot,

Andreu

wkeeling commented 3 years ago

What data is missing that the proxy should be receiving? It may be worth temporarily disabling the crawlera proxy and validating that the interceptor is doing the right thing by checking it against https://httpbin.org. Once you've confirmed the interceptor is correct, add the proxy back.
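
A minimal sketch of that kind of check, with no 'proxy' key in the seleniumwire_options so the request goes straight to httpbin (the interceptor below is just an illustration based on the headers mentioned earlier):

from seleniumwire import webdriver

def interceptor(request):
    request.headers['X-Crawlera-Cookies'] = 'disable'
    request.headers['X-Crawlera-Profile'] = 'desktop'

driver = webdriver.Chrome()  # no seleniumwire_options, so no upstream proxy
driver.request_interceptor = interceptor
driver.get('https://httpbin.org/headers')

# httpbin echoes back the headers it received in the response body
request = driver.wait_for_request('httpbin.org/headers')
print(request.response.body.decode('utf-8'))

driver.quit()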

Regarding the certificate error can you try adding --ignore-certificate-errors to your list of options? I wouldn't expect that to make any difference as Selenium Wire implicitly sets it - but worth a try.

AndreuJove commented 3 years ago

Dear @wkeeling ,

Sorry for not replying; I had to work on other features.

We have solved the previous error by adding another crawlera ca.crt, but we are facing another issue related to TLS certificates:

502 Bad Gateway TlsProtocolException("Cannot establish TLS with www.farmavazquez.com:443 (sni: www.farmavazquez.com): TlsException('SSL handshake error: WantReadError()')")

Do we need to install any other certificates? The problem is in the deployment; running locally it works fine. That's why I think we are missing something to install in our Docker container.

AndreuJove commented 3 years ago

Dear @wkeeling ,

From the logs:

[seleniumwire.server] 127.0.0.1:47122: Certificate verification error for www.farmavazquez.com: self signed certificate in certificate chain (errno: 19, depth: 1)

The proxy that we are using (https://github.com/zytedata/zyte-smartproxy-headless-proxy) is configured on port 3128:

{
    'proxy': {
        'http': 'http://localhost:3128',
        'https': 'https://localhost:3128',
    }
}

Why is selenium-wire running on 47122?

Thanks a lot for your help,

Andreu Jové

wkeeling commented 3 years ago

Thanks @AndreuJove

Port 47122 is probably the port the Selenium Wire server is listening on. When you run Selenium Wire it starts the server on a random free port number.
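
If a fixed port would make it easier to see how the two proxies interact, I believe you can pin the backend to a known port with the port option (a sketch; 12345 is an arbitrary choice):

from seleniumwire import webdriver

options = {
    'port': 12345,  # pin Selenium Wire's backend instead of letting it pick a random port
    'proxy': {
        'http': 'http://localhost:3128',
        'https': 'https://localhost:3128',
    }
}

driver = webdriver.Chrome(seleniumwire_options=options)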

As mentioned previously, Selenium Wire is configured to ignore certificate errors by default. Does that message actually cause a problem loading farmavazquez.com ?

AndreuJove commented 3 years ago

Dear @wkeeling,

Thank you for your help.

Yes, that message might be causing an error on our deployment site (Scrapy Cloud), but I'm not sure if it's only this. How can we avoid all these kinds of messages?

The log also shows:

[seleniumwire.server] 127.0.0.1:47122: Invalid certificate, closing connection. Pass --ssl-insecure to disable validation.

But I guess it is not working either. To pass `--ssl-

Do we have to install any selenium-wire certificate?

Kind regards,

I hope that we can finally find a solution; your library is very powerful and we would like to use it in production.

Andreu