ultrafunkamsterdam / undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
https://github.com/UltrafunkAmsterdam/undetected-chromedriver
GNU General Public License v3.0
9.54k stars 1.14k forks

intercepting & blocking certain requests #1727

Open danibarahona opened 7 months ago

danibarahona commented 7 months ago

I'm currently trying to speed up the load of a certain webpage. My idea was to profile the page in my browser, identify the requests that take the longest to load, and then use UC to intercept & block those requests. My code is somewhat similar to this:

import undetected_chromedriver as uc

def request_filter(req):
    BLOCKED_RESOURCES = ['image', 'jpeg', 'xhr', 'x-icon']
    r_type = req['params']['type'].lower()
    r_url = req['params']['request']['url']

    if r_type in BLOCKED_RESOURCES:  # block every request of the types above
        return {"cancel": True}
    if "very.heavy.resource" in r_url:  # block the requests that go to 'very.heavy.resource'
        return {"cancel": True}

    print(req)  # let the request pass

driver = uc.Chrome(enable_cdp_events=True)

driver.add_cdp_listener("Network.requestWillBeSent", request_filter)

driver.get("https://target.website.com")

However, I'm having trouble blocking some resources, like JS scripts and the like. I wanted to ask if anyone has a clearer picture of how UC handles intercepting, inspecting & blocking requests. For example, I'm not sure that returning {'cancel': True} actually blocks a request; I just saw it suggested by ChatGPT.
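A side note on the CDP semantics here: Network.requestWillBeSent is a notification-only event, so whatever the listener returns is ignored by the browser (actual blocking needs Network.setBlockedURLs or the Fetch domain). The blocking decision itself can still be kept testable by factoring it into a pure function; a minimal sketch of the logic from the snippet above, with all names hypothetical:

```python
# The blocking *decision* from the snippet above, factored out so it can
# be tested without a browser. Returning {"cancel": True} from a CDP
# event listener has no effect, since events are fire-and-forget
# notifications, not interception points.
BLOCKED_RESOURCES = {'image', 'jpeg', 'xhr', 'x-icon'}
BLOCKED_URL_FRAGMENTS = ('very.heavy.resource',)

def should_block(resource_type: str, url: str) -> bool:
    """Return True when a request of this type/URL should be blocked."""
    if resource_type.lower() in BLOCKED_RESOURCES:
        return True
    return any(frag in url for frag in BLOCKED_URL_FRAGMENTS)
```

Keeping the policy separate from the driver wiring also makes it trivial to swap in whichever CDP mechanism ends up doing the real blocking.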

max32002 commented 7 months ago

NETWORK_BLOCKED_URLS = []
NETWORK_BLOCKED_URLS.append('*.woff')
NETWORK_BLOCKED_URLS.append('*.woff2')
NETWORK_BLOCKED_URLS.append('*.ttf')
NETWORK_BLOCKED_URLS.append('*.otf')
NETWORK_BLOCKED_URLS.append('*.ico')
driver.execute_cdp_cmd('Network.setBlockedURLs', {"urls": NETWORK_BLOCKED_URLS})
driver.execute_cdp_cmd('Network.enable', {})
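For reference, Network.setBlockedURLs matches URLs against simple wildcard patterns, where '*' matches any sequence of characters. The pattern list above can be sanity-checked locally with fnmatch, whose '*' behaves compatibly (a rough approximation, not the browser's actual matcher):

```python
from fnmatch import fnmatch

# Same pattern list as in the answer above.
BLOCKED_PATTERNS = ['*.woff', '*.woff2', '*.ttf', '*.otf', '*.ico']

def is_blocked(url: str) -> bool:
    """Approximate Network.setBlockedURLs matching with shell-style wildcards."""
    return any(fnmatch(url, pat) for pat in BLOCKED_PATTERNS)
```

This kind of offline check is handy for verifying that a new pattern catches the URLs seen in the Network tab before restarting the driver.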
shehackedyou commented 6 months ago

Very cool concept. I have rewritten this library in Go to experiment with it, and I really like the idea. More generally, a proxy wrapped transparently around the crawler could not just block requests but also function as a cache, and potentially add another layer to the cat-and-mouse game.
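The proxy-as-cache idea above can be sketched in a few lines, assuming an in-memory store keyed by URL; `fetch_fn` stands in for whatever actually performs the upstream request, and all names here are hypothetical:

```python
from typing import Callable, Dict, Optional


class CachingBlocker:
    """Toy pass-through layer: block listed URLs, serve repeats from cache."""

    def __init__(self, fetch_fn: Callable[[str], bytes], blocked: set):
        self._fetch = fetch_fn          # upstream fetcher (hypothetical)
        self._blocked = blocked         # substrings that mark blocked URLs
        self._cache: Dict[str, bytes] = {}

    def get(self, url: str) -> Optional[bytes]:
        if any(frag in url for frag in self._blocked):
            return None                 # blocked: never forwarded upstream
        if url not in self._cache:
            self._cache[url] = self._fetch(url)  # miss: fetch and store
        return self._cache[url]         # hit: no second upstream request
```

A real implementation would sit at the HTTP-proxy level and respect cache headers; this only illustrates how blocking and caching can live in the same layer in front of the crawler.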