ultrafunkamsterdam / undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva/ Datadadome / CloudFlare IUAM)
https://github.com/UltrafunkAmsterdam/undetected-chromedriver
GNU General Public License v3.0
9.98k stars 1.16k forks source link

[nodriver] browser.update_targets() does not remove stale targets #1981

Open epicwhale opened 3 months ago

epicwhale commented 3 months ago

Browser targets could often be destroyed without emitting the cdp.target.TargetDestroyed event which nodriver seems to rely on to update the list of browser.targets. For example: when a browser context is disposed using Target.disposeBrowserContext.

Looking at the code in browser.update_targets, it does not appear to remove any targets that are no longer available in the browser. It only updates existing targets and adds new ones.

It would be make sense to remove stale targets that are present in self.targets but no longer exist in the browser.

A simple addition like below in update_targets() should keep the targets list better in sync with the browser.

    async def update_targets(self):
        ...
        targets = await self._get_targets()
        ...

        for t in targets:
            ...

        ### PROPOSED ADDITION ###
        # Prune stale targets
        self.targets = [t for t in self.targets if t.target_id in [t.target_id for t in targets]]

        ...
        await asyncio.sleep(0)
epicwhale commented 3 months ago

On closer inspection, I think the root problem here is that the self.targets list starts going out of sync with Chrome, whenever the browser session socket is closed @ https://github.com/ultrafunkamsterdam/nodriver/blob/2957c87f3319eeb05814d3169160a17cc2bff408/nodriver/core/connection.py#L416 (the close is triggered whenever a CDP command issued on the browser socket fails for me).

But when the browser connection socket is reopened later it does not re-register for the Target.setDiscoverTargets (which is done on the first browser socket connection - https://github.com/ultrafunkamsterdam/nodriver/blob/2957c87f3319eeb05814d3169160a17cc2bff408/nodriver/core/browser.py#L393 )

epicwhale commented 3 months ago

@ultrafunkamsterdam May I ask what is the rational behind closing the socket session whenever a ProtocolException is raised in the send(..) function? I believe CDP sockets can be used even after a CDP command returns an error, as long as the network connection is intact? What's the benefit of prematurely closely on the first command failing?