wkeeling / selenium-wire

Extends Selenium's Python bindings to give you the ability to inspect requests made by the browser.
MIT License

Kubernetes deployment hostname error #646

Closed: alicanyuksel closed this issue 1 year ago

alicanyuksel commented 1 year ago

Hi,

I want to deploy my API on GKE in GCP, but selenium-wire does not capture any requests there.

Some context: my stack is composed of FastAPI, Redis, Celery, a TOR proxy and selenium-wire. On my own server, all of these are dockerized and everything works very well. I use a Docker Swarm network so that my containers can communicate with each other.

selenium-wire runs in the Celery container, so I set the 'addr' option to the hostname of the Celery container, and that works well.

Now I would like to migrate my API to GCP with Kubernetes, but it does not work correctly. In the new architecture, all containers become pods. For example:

FastAPI -> pod
Celery -> pod
Redis -> pod
TOR Proxy -> pod
Selenium-grid -> pod (exposed with related ports like 4444, 4443, 4442)
selenium chrome -> pod

But when I configure selenium-wire with the name of the Celery service, which is responsible for executing the selenium-wire code, it cannot find the hostname. See the traceback below.

Another detail: my Celery deployment is exposed with a ClusterIP service, so celery-service listens on port 80 and forwards all requests to target port 3001, which is configured in the Dockerfile of my Celery image.

With Docker Swarm, the Celery hostname works, but in K8s it does not. Is there a way to handle this? I'm sure I'm missing something.

Here are the code examples and the error message.

Source code

from seleniumwire import webdriver


def get_webdriver_session(
    grid_hub_executor_addr: str,
    proxy_name: str,
    remote_selenium_wire_addr: str,
    auto_config_bool: bool = True,
    need_proxy: bool = True,
    proxy_port: int = 9050,
    proxy_protocol: str = "socks5",
    webdriver_type: str = "chrome",
) -> webdriver.Remote:
    """Init a selenium-wire webdriver.

    Args:
        grid_hub_executor_addr (str): selenium grid hub address
        proxy_name (str): tor proxy name (container name)
        remote_selenium_wire_addr (str): address of the machine running Selenium Wire.
        auto_config_bool (bool): passed to Selenium Wire as "auto_config". Defaults to True.
        need_proxy (bool): whether to route traffic through the proxy. Defaults to True.
        proxy_port (int, optional): proxy port. Defaults to 9050.
        proxy_protocol (str, optional): proxy protocol. Defaults to "socks5".
        webdriver_type (str, optional): webdriver type like chrome, firefox, edge.
        Defaults to "chrome".

    Raises:
        WebdriverNotSupported: When the user sends a webdriver type which is not allowed.

    Returns:
        webdriver: selenium driver object
    """
    capabilities = get_browser_options(
        webdriver_type=webdriver_type,
    )

    seleniumwire_options_dict = {
        "auto_config": auto_config_bool,
        "addr": remote_selenium_wire_addr,
    }
    if need_proxy:
        seleniumwire_options_dict["proxy"] = {
            "https": f"{proxy_protocol}://{proxy_name}:{proxy_port}",
            "http": f"{proxy_protocol}://{proxy_name}:{proxy_port}",
        }

    driver = webdriver.Remote(
        command_executor=grid_hub_executor_addr,
        desired_capabilities=capabilities,
        seleniumwire_options=seleniumwire_options_dict,
    )

    # maximize window (full screen)
    driver.maximize_window()

    return driver

Reproduce


SELENIUM_HUB_ADDR = "http://selenium-grid-service:4444"  # name of the selenium grid service in k8s
REMOTE_SW_ADDR = "celery-service"  # name of the celery service in k8s
TOR_PROXY_SERVICE = "tor-proxy-service"  # name of the tor proxy service in k8s

driver = get_webdriver_session(
    grid_hub_executor_addr=SELENIUM_HUB_ADDR,
    proxy_name=TOR_PROXY_SERVICE,
    remote_selenium_wire_addr=REMOTE_SW_ADDR,
)

Error

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/seleniumwire/thirdparty/mitmproxy/server/server.py", line 42, in __init__
    (config.options.listen_host, config.options.listen_port)
  File "/usr/local/lib/python3.7/dist-packages/seleniumwire/thirdparty/mitmproxy/net/tcp.py", line 624, in __init__
    self.socket.bind(self.address)
socket.gaierror: [Errno -5] No address associated with hostname

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "alican.py", line 50, in <module>
    seleniumwire_options=seleniumwire_options_dict,
  File "/usr/local/lib/python3.7/dist-packages/seleniumwire/webdriver.py", line 295, in __init__
    config = self._setup_backend(seleniumwire_options)
  File "/usr/local/lib/python3.7/dist-packages/seleniumwire/webdriver.py", line 44, in _setup_backend
    options=seleniumwire_options,
  File "/usr/local/lib/python3.7/dist-packages/seleniumwire/backend.py", line 24, in create
    backend = MitmProxy(addr, port, options)
  File "/usr/local/lib/python3.7/dist-packages/seleniumwire/server.py", line 61, in __init__
    self.master.server = ProxyServer(ProxyConfig(mitmproxy_opts))
  File "/usr/local/lib/python3.7/dist-packages/seleniumwire/thirdparty/mitmproxy/server/server.py", line 51, in __init__
    ) from e
seleniumwire.thirdparty.mitmproxy.exceptions.ServerException: Error starting proxy server: gaierror(-5, 'No address associated with hostname')
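
The traceback shows that the value given as 'addr' ends up in `socket.bind()`, so it has to be a name or IP that belongs to the machine running selenium-wire. A Kubernetes Service name normally points at a virtual ClusterIP rather than at an interface inside the pod, which may explain why the Swarm container hostname worked and the Service name does not. A minimal sketch for checking a candidate address from inside the Celery pod (the helper name `can_bind` is hypothetical and not part of selenium-wire):

```python
import socket


def can_bind(host: str, port: int = 0) -> bool:
    """Return True if a TCP socket can be bound to (host, port) on this machine.

    Port 0 asks the OS for any free port, so the check does not collide
    with services that are already listening.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((host, port))
        return True
    except OSError as exc:
        # socket.gaierror (name resolution failure) is a subclass of OSError,
        # so this also covers the error seen in the traceback.
        print(f"cannot bind to {host!r}: {exc}")
        return False
```

Calling `can_bind("celery-service")` in the pod should reproduce the failure without starting a browser session, while an address owned by the pod should return True.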

alicanyuksel commented 1 year ago

By the way, I solved the problem, so the issue can be closed. If you run into the same problem one day, do not hesitate to reopen the issue :)
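
The thread does not record how the problem was solved. One possible approach, offered only as a sketch, is to pass the pod's own IP as 'addr' instead of the Service name, since that address can be bound locally and is reachable by other pods in the cluster. The function name `selenium_wire_addr` and the `POD_IP` environment variable are assumptions; the variable would have to be injected by the deployment, for example from `status.podIP` through the Kubernetes Downward API.

```python
import os
import socket


def selenium_wire_addr(env_var: str = "POD_IP") -> str:
    """Pick an address for selenium-wire's 'addr' option inside a pod.

    Prefers an IP injected through the environment (assumed variable name),
    and otherwise falls back to resolving the pod's own hostname.
    """
    pod_ip = os.environ.get(env_var)
    if pod_ip:
        return pod_ip
    # Inside a pod, the pod's hostname usually resolves to the pod IP.
    return socket.gethostbyname(socket.gethostname())
```

With this helper, the reproduction above would pass `remote_selenium_wire_addr=selenium_wire_addr()` rather than `"celery-service"`.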