I want to deploy my API on GKE (GCP), but the problem is that selenium-wire does not capture any requests there.
Some context: my stack is composed of FastAPI, Redis, Celery, a TOR proxy and selenium-wire. On my own server everything is dockerized and works very well. I use a swarm network so that my containers can communicate with each other.
selenium-wire runs in the celery container, so I set the 'addr' option to the hostname of the celery container, and that works well.
Now I would like to migrate my API to GCP with Kubernetes, but it does not work correctly. In the new architecture, all containers become pods:
FastAPI -> pod
Celery -> pod
Redis -> pod
TOR Proxy -> pod
Selenium-grid -> pod (exposed with related ports like 4444, 4443, 4442)
selenium chrome -> pod
However, when I configure selenium-wire with the name of the celery service (celery is the pod responsible for executing the selenium-wire code), it cannot find the hostname. See the traceback below.
Another detail: my celery deployment is exposed with a ClusterIP service, so celery-service listens on port 80 and forwards all requests to target port 3001, which is configured in the Dockerfile of my celery image.
With docker swarm the celery hostname works, but in K8S it does not!
Is there any solution for this? I'm sure I'm missing something.
Here are the code example and the error message.
Source code
from seleniumwire import webdriver

# get_browser_options() is my own helper (defined elsewhere); it builds the
# capabilities for the requested browser type.


def get_webdriver_session(
    grid_hub_executor_addr: str,
    proxy_name: str,
    remote_selenium_wire_addr: str,
    auto_config_bool: bool = True,
    need_proxy: bool = True,
    proxy_port: int = 9050,
    proxy_protocol: str = "socks5",
    webdriver_type: str = "chrome",
) -> webdriver.Remote:
    """Init a selenium-wire webdriver.

    Args:
        grid_hub_executor_addr (str): selenium grid hub address
        proxy_name (str): tor proxy name (container name)
        remote_selenium_wire_addr (str): address of the machine running Selenium Wire.
        auto_config_bool (bool): value of the selenium-wire "auto_config" option.
            Defaults to True.
        need_proxy (bool): Need proxy. Defaults to True
        proxy_port (int, optional): proxy port. Defaults to 9050.
        proxy_protocol (str, optional): proxy protocol. Defaults to "socks5".
        webdriver_type (str, optional): webdriver type like chrome, firefox, edge.
            Defaults to "chrome".

    Raises:
        WebdriverNotSupported: When the user sends a webdriver type which is not allowed.

    Returns:
        webdriver.Remote: selenium driver object
    """
    capabilities = get_browser_options(
        webdriver_type=webdriver_type,
    )
    seleniumwire_options_dict = {
        "auto_config": auto_config_bool,
        "addr": remote_selenium_wire_addr,
    }
    if need_proxy:
        # Route the traffic through the TOR proxy.
        seleniumwire_options_dict["proxy"] = {
            "https": f"{proxy_protocol}://{proxy_name}:{proxy_port}",
            "http": f"{proxy_protocol}://{proxy_name}:{proxy_port}",
        }
    driver = webdriver.Remote(
        command_executor=grid_hub_executor_addr,
        desired_capabilities=capabilities,
        seleniumwire_options=seleniumwire_options_dict,
    )
    # maximize window (full screen)
    driver.maximize_window()
    return driver
Reproduce
SELENIUM_HUB_ADDR = "http://selenium-grid-service:4444"  # name of selenium grid service in k8s
REMOTE_SW_ADDR = "celery-service"  # name of celery service in k8s
TOR_PROXY_SERVICE = "tor-proxy-service"  # name of tor proxy service in k8s

driver = get_webdriver_session(
    grid_hub_executor_addr=SELENIUM_HUB_ADDR,
    proxy_name=TOR_PROXY_SERVICE,
    remote_selenium_wire_addr=REMOTE_SW_ADDR,
)
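One variant that may be worth trying is to pass the celery pod's own address as 'addr' instead of the service name. The sketch below is an untested assumption, not a confirmed fix: the helper name `selenium_wire_addr` and the `POD_IP` environment variable are hypothetical (the variable would have to be injected into the celery pod, for example through the Kubernetes Downward API). If the variable is absent, it falls back to resolving the pod's own hostname.

```python
import os
import socket


def selenium_wire_addr(env_var: str = "POD_IP") -> str:
    """Pick an address of this pod/container for selenium-wire's 'addr' option.

    `env_var` is a hypothetical variable name; it is only useful if the
    deployment actually injects the pod IP under that name.
    """
    pod_ip = os.environ.get(env_var)
    if pod_ip:
        return pod_ip
    try:
        # Inside a pod, the pod's own hostname normally resolves to the pod IP.
        return socket.gethostbyname(socket.gethostname())
    except OSError:
        # Last resort: loopback (only reachable from the same pod).
        return "127.0.0.1"
```

With this, the reproduce snippet would use `REMOTE_SW_ADDR = selenium_wire_addr()` in place of the service name.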
Error
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/seleniumwire/thirdparty/mitmproxy/server/server.py", line 42, in __init__
    (config.options.listen_host, config.options.listen_port)
  File "/usr/local/lib/python3.7/dist-packages/seleniumwire/thirdparty/mitmproxy/net/tcp.py", line 624, in __init__
    self.socket.bind(self.address)
socket.gaierror: [Errno -5] No address associated with hostname

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "alican.py", line 50, in <module>
    seleniumwire_options=seleniumwire_options_dict,
  File "/usr/local/lib/python3.7/dist-packages/seleniumwire/webdriver.py", line 295, in __init__
    config = self._setup_backend(seleniumwire_options)
  File "/usr/local/lib/python3.7/dist-packages/seleniumwire/webdriver.py", line 44, in _setup_backend
    options=seleniumwire_options,
  File "/usr/local/lib/python3.7/dist-packages/seleniumwire/backend.py", line 24, in create
    backend = MitmProxy(addr, port, options)
  File "/usr/local/lib/python3.7/dist-packages/seleniumwire/server.py", line 61, in __init__
    self.master.server = ProxyServer(ProxyConfig(mitmproxy_opts))
  File "/usr/local/lib/python3.7/dist-packages/seleniumwire/thirdparty/mitmproxy/server/server.py", line 51, in __init__
    ) from e
seleniumwire.thirdparty.mitmproxy.exceptions.ServerException: Error starting proxy server: gaierror(-5, 'No address associated with hostname')
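The first traceback fails at `self.socket.bind(self.address)`, which suggests that the 'addr' value has to be something the local process can both resolve and bind. In Kubernetes, a ClusterIP is a virtual address that is not assigned to any interface inside the pod, so a service name may not be bindable even where it resolves; whether that is the whole story here is an assumption. A minimal diagnostic sketch using only the standard library (the helper name `check_bindable` is hypothetical and not part of selenium-wire), meant to be run inside the celery pod:

```python
import socket
from typing import Tuple


def check_bindable(host: str, port: int = 0) -> Tuple[bool, str]:
    """Report whether `host` resolves and can be bound on this machine."""
    try:
        # Resolve the name to an IPv4 address first.
        infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"cannot resolve {host!r}: {exc}"
    ip = infos[0][4][0]
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # Port 0 lets the OS pick any free port.
            sock.bind((ip, port))
        except OSError as exc:
            return False, f"{host!r} resolves to {ip} but cannot be bound: {exc}"
    return True, f"{host!r} resolves to {ip} and can be bound"
```

Calling `check_bindable("celery-service")` and `check_bindable(socket.gethostname())` from the celery pod and comparing the two results should show whether the service name or the pod's own hostname is usable as a listen address.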