ultrafunkamsterdam / undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
https://github.com/UltrafunkAmsterdam/undetected-chromedriver
GNU General Public License v3.0
9.99k stars 1.16k forks

urllib3.connectionpool:Connection pool is full #458

Open ManiMozaffar opened 2 years ago

ManiMozaffar commented 2 years ago

Hi,

Whenever I use a CDP listener with the undetected driver on the new version, I get this warning after adding a few listeners: WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: localhost

If I add a CDP listener a few times in a row, with a 3-second interval between each, the warning above keeps appearing in the terminal. It ran fine on previous versions; however, the previous versions are detected by Cloudflare, so using them is not an option for me. My guess is that the connections opened for CDP are not closed after they finish.

driver.add_cdp_listener('Network.requestWillBeSent', self.get_header)

I should also mention that the get_header method usually finishes in less than a millisecond.
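For readers who want a concrete picture of such a listener, here is a minimal sketch of what a get_header callback could look like. It is not the reporter's code: it is written as a plain function rather than a method, the captured_headers store is a hypothetical name, and it assumes the callback receives the raw CDP message as a dict with "method" and "params" keys.

```python
from typing import Any, Dict

# Hypothetical store: most recent request headers seen per URL.
captured_headers: Dict[str, Dict[str, str]] = {}


def get_header(message: Dict[str, Any]) -> None:
    """Record the request headers carried by a Network.requestWillBeSent event.

    Assumption: `message` is the raw CDP message, shaped like
    {"method": "...", "params": {...}}.
    """
    if message.get("method") != "Network.requestWillBeSent":
        return  # ignore any other event type
    request = message.get("params", {}).get("request", {})
    url = request.get("url")
    if url is not None:
        captured_headers[url] = dict(request.get("headers", {}))


# Registration, as in the report above (needs a driver created with
# enable_cdp_events=True):
# driver.add_cdp_listener("Network.requestWillBeSent", get_header)
```

A callback this small does no I/O of its own, so the connection-pool warning is unlikely to come from the handler itself.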

Environment: Python 3.9.9, undetected-chromedriver 3.1.3, selenium 4.1.0

ManiMozaffar commented 2 years ago

So far I've read the source code. If I'm not wrong, the connection pool for localhost needs to be larger to solve this problem. I noticed 2 connections that could be related to the CDP event listener:

Line 1582, webdriver\remote\webdriver.py

def _get_cdp_details(self):
        import json
        import urllib3
        ## It was this:
        # http = urllib3.PoolManager()
        ## Changed to this:
        http = urllib3.PoolManager(num_pools=3000)

Line 131, webdriver\remote\remote_connection.py

def _get_connection_manager(self):

        pool_manager_init_args = {
            'timeout': self._timeout
        }
        if self._ca_certs:
            pool_manager_init_args['cert_reqs'] = 'CERT_REQUIRED'
            pool_manager_init_args['ca_certs'] = self._ca_certs

        ## It was this:
        # return urllib3.PoolManager(**pool_manager_init_args) if not self._proxy_url else \
        #     urllib3.ProxyManager(self._proxy_url, **pool_manager_init_args)
        ## Changed to this:
        return urllib3.PoolManager(**pool_manager_init_args, num_pools=30000) if not self._proxy_url else \
            urllib3.ProxyManager(self._proxy_url, **pool_manager_init_args, num_pools=30000)

However, changing both didn't help, and I still have this problem. I tried downgrading selenium, and that didn't help either. It seems this problem has existed since the add_cdp_listener method was added.

ManiMozaffar commented 2 years ago

I have fixed the problem by changing the urllib3 source code. I'm still looking forward to a solution from selenium; I could only find the 2 connections that I described above.

PePinodemrs commented 2 years ago

I have fixed the problem by changing the urllib3 source code, I'm still looking forward to a solution from selenium, I could only see 2 connections which I described above

Can you please help me with that?

avasilkov commented 2 years ago

I've encountered the same warning after enabling the CDP listener.

I think the problem is that the UC CDP event handler runs an asyncio loop that calls driver.get_log. That makes selenium send a request to chromedriver through its remote connection, which uses a PoolManager / ConnectionPool from urllib3. If the asyncio event handler fires its get_log call while selenium is executing a call of its own, two connections to the remote are needed at once. But the urllib3 docs say a connection pool has maxsize=1 by default. So urllib3 creates a new connection and executes the call, but warns the user when the extra connection cannot be returned to the pool. The fix would be either to not use the current reactor implementation or to somehow increase maxsize for the connection pools in selenium.

I haven't figured out how to do that, so instead I just poll driver.get_log('performance') myself when I need the info. It's not concurrent, though. You also need to disable the reactor, otherwise it will suck up all the events by running get_log itself. After the driver starts with CDP events enabled, run:

driver.reactor.event.set()
driver.reactor = None
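The manual polling described above could be sketched as follows. The helper name iter_cdp_events is hypothetical. It relies on each entry of the performance log carrying a "message" field, a JSON string that wraps the CDP message under its own "message" key.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_cdp_events(entries: Iterable[Dict[str, Any]], method: str) -> Iterator[Dict[str, Any]]:
    """Yield the params of every CDP event named `method` found in `entries`.

    `entries` is expected to be the list returned by
    driver.get_log("performance").
    """
    for entry in entries:
        try:
            message = json.loads(entry["message"])["message"]
        except (KeyError, TypeError, ValueError):
            continue  # skip entries that are not well-formed CDP messages
        if isinstance(message, dict) and message.get("method") == method:
            yield message.get("params", {})


# Usage with a live driver (not executed here):
# for params in iter_cdp_events(driver.get_log("performance"),
#                               "Network.requestWillBeSent"):
#     print(params["request"]["url"])
```

Because get_log drains the log on each call, whatever this loop does not consume is gone, which is why the reactor has to be disabled first.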

num_pools increases the number of pools in the pool manager, whereas we only need more connections, not more pools. Maybe somebody could try passing maxsize=2 or maxsize=10 to those PoolManager calls and see if it works.
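To make the mechanism concrete, here is a toy model of a non-blocking connection pool. It is a simplified illustration written for this thread, not urllib3's actual code, and the names ToyPool and overlapping_calls are made up. It shows why two overlapping callers on a pool with maxsize=1 lead to one discarded connection, and why a larger maxsize would avoid that.

```python
import queue


class ToyPool:
    """Simplified model of a non-blocking connection pool (not urllib3's code)."""

    def __init__(self, maxsize: int = 1) -> None:
        self._idle = queue.LifoQueue(maxsize)  # idle connections kept for reuse
        self.created = 0
        self.discarded = 0

    def get_conn(self) -> int:
        try:
            return self._idle.get(block=False)
        except queue.Empty:
            self.created += 1
            return self.created  # nothing idle: open a fresh connection

    def put_conn(self, conn: int) -> None:
        try:
            self._idle.put(conn, block=False)
        except queue.Full:
            self.discarded += 1  # a real pool would log "pool is full" here


def overlapping_calls(pool: ToyPool) -> None:
    """Two callers hold a connection at the same time, then both return it."""
    first = pool.get_conn()   # e.g. a Selenium command in flight
    second = pool.get_conn()  # e.g. the reactor's get_log call
    pool.put_conn(first)
    pool.put_conn(second)


small, larger = ToyPool(maxsize=1), ToyPool(maxsize=2)
overlapping_calls(small)
overlapping_calls(larger)
print(small.discarded, larger.discarded)  # prints: 1 0
```

In urllib3 itself, PoolManager forwards extra keyword arguments such as maxsize to the connection pools it creates, so urllib3.PoolManager(maxsize=2) would be the analogous change. Whether that silences the warning in this setup has not been confirmed in the thread.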

MarcoMobilio commented 1 year ago

I have fixed the problem by changing the urllib3 source code, I'm still looking forward to a solution from selenium, I could only see 2 connection which I described above

I encountered the same issue and resolved it using the same workaround; however, this is not a long-term solution. Do you know if there is a related issue open in Selenium? I tried to look for one but could not find it. It seems strange that nobody else is experiencing this.

pedro-peixot0 commented 1 year ago

Hey guys, after reading what you wrote I thought of another solution. Can you try it out and see if it also solves the issue for you? I just copy-pasted what I wrote in my issue:

When enable_cdp_events is set to True, a Reactor object is created and its listen function is called, starting a loop that invokes driver.get_log. Each call goes through a PoolManager / ConnectionPool object from urllib3 to reach chromedriver. By default, these pools keep a maximum of 1 connection, which is already being used by the aforementioned loop.

At some point, when we call the driver.quit() function to close the webdriver (I haven't tried to find where), it seems that this same object is accessed, surpassing the connection limit, thus throwing those errors.

The solution is quite simple; we just need to stop the loop started by the listen function from the Reactor object before calling the driver.quit() function. It can be done like this:

import undetected_chromedriver as uc
import time

driver = uc.Chrome(
    enable_cdp_events=True,
    headless=True
)

print(f"is reactor loop closed? {driver.reactor.loop.is_closed()}")
# >>> is reactor loop closed? False

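# loop.close() raises RuntimeError while the loop is still running,
# so signal the reactor to stop and retry until the close succeeds.
while not driver.reactor.loop.is_closed():
    try:
        driver.reactor.loop.close()
    except RuntimeError:
        driver.reactor.event.set()
        time.sleep(0.5)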

print(f"is reactor loop closed? {driver.reactor.loop.is_closed()}")
# >>> is reactor loop closed? True

driver.quit()
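
The same shutdown sequence can be wrapped in a small helper with a timeout, so that it cannot spin forever if the loop never stops. This is only a sketch: stop_reactor is a hypothetical name, and it assumes the reactor object exposes loop (an asyncio event loop) and event (an object with set()), as in the snippet above.

```python
import time


def stop_reactor(reactor, timeout: float = 5.0, poll: float = 0.1) -> bool:
    """Stop the reactor's listen loop and close its event loop.

    Returns True once the loop is closed, False if `timeout` seconds pass
    while the loop is still running.
    """
    deadline = time.monotonic() + timeout
    while not reactor.loop.is_closed():
        try:
            reactor.loop.close()  # raises RuntimeError while the loop is running
        except RuntimeError:
            reactor.event.set()   # ask the listen loop to exit
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll)
    return True
```

It would be called as stop_reactor(driver.reactor) immediately before driver.quit().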

Avnsx commented 1 year ago

can you try it out and see if it also solves the issue for you?

Actually this does work, but what is the point of closing the reactor loop like this?

For example, I want to spawn lots of chrome sessions which should all have cdp events enabled, so that I can keep controlling the network for each of the chrome instances.

Once the reactor loop is closed, CDP events for that chrome session are disabled too; at that point you might as well not enable CDP events at all.

In my specific use case this seems pointless, since I wanted to call driver.set_network_conditions(offline=True, latency=0, download_throughput=0*0, upload_throughput=0*0), but once the reactor loop is disabled these network conditions no longer apply and my previous chromedriver sessions connect to the internet as usual.

pedro-peixot0 commented 1 year ago

Actually this does work, but what is the point of closing the reactor loop like this?

@Avnsx You should only close the reactor loop right before quitting the driver. The errors only happen when you call driver.quit()

Avnsx commented 1 year ago

@Avnsx You should only close the reactor loop right before quitting the driver. The errors only happen when you call driver.quit()

I do not want to quit the driver. I want to keep the driver and the CDP listeners up and running so that I can limit the internet connection through driver.set_network_conditions(offline=True, latency=0, download_throughput=0*0, upload_throughput=0*0). Any ideas how to achieve this?

pedro-peixot0 commented 1 year ago

@Avnsx Could you please provide sample code that reproduces the errors you are experiencing? I am asking because I only receive the errors mentioned in this issue right after I close the driver. Maybe you are experiencing a different issue from mine. Can you send me some code so I can reproduce the issue and debug it?

kschroeder commented 1 year ago

I don't know if I would necessarily call this a solution, but it got me to the same endpoint of using CDP events while also being able to make CDP requests to the chromedriver instance. My solution was to remove enable_cdp_events and not use driver.add_cdp_listener, relying instead on BIDI-based listeners and the session that webdriver provided.

        async with self.webdriver.bidi_connection() as connection:
            self.session, self.devtools = connection.session, connection.devtools
            await self.session.execute(self.devtools.network.enable())
            request_received = self.session.listen(self.devtools.network.RequestWillBeSent)

            async with trio.open_nursery() as nursery:
                nursery.start_soon(self.request, request_received)
                return await callback()

Then, elsewhere in my code, if I need to get response body, for example, I call:

body = await self.session.execute(self.devtools.network.get_response_body(request_id))

I've run it a few times here and it fails when using enable_cdp_events, but works when I use the BIDI listener.