kaliiiiiiiiii / Selenium-Driverless

undetected Selenium without usage of chromedriver
https://kaliiiiiiiiii.github.io/Selenium-Driverless/

Class with proxy usage issues. #118

Closed juhacz closed 7 months ago

juhacz commented 7 months ago

Hi

I have 100 IPv6 proxy servers at my disposal, and for a long time I have been trying to write a program based on your class that does the following.

In a loop, it sets a proxy server and connects to the selected page, which sits behind the Cloudflare preloader (security check). When the page loads (it immediately shows JSON), the cookies are read and saved in the database; the same is then done for the next proxy, and so on. The program seems simple, but after several loop iterations (6-10) various random errors come from your class. I use the development version of the class because it supports authenticated proxies. It would be nice if you could check where the problem is. Below is the program (or rather, its logic):


target_url = 'https://www.xxxx.com/api/1.0.0/products/?file.json'

async with (webdriver.Chrome(options=options, max_ws_size=2** 60, debug=True) as driver):
     driver.base_target.socket.on_closed.append(lambda code, reason: driver.quit())
     user_agent = await driver.execute_async_script("return navigator.userAgent;")
     await asyncio.sleep(2)

     while True:
         proxies = proxy_db.get_proxy_list_to_recive_cf_cookie()
         # if there is no cookie to refresh, wait
         if not proxies:
             await asyncio.sleep(60 * 4)  # non-blocking wait; time.sleep would block the event loop
             continue

         for proxy in proxies:
             try:
                 ip = proxy["ip"]
                 port = proxy["port"]

                 step = step + 1

                 await driver.set_single_proxy(
                     'http://{username}:{password}@{ip}:{port}'.format(
                         ip=ip,
                         port=port,
                         username=proxy["username"],
                         password=proxy["password"].rstrip(),
                     )
                 )

                 await driver.get(target_url, wait_load=True, timeout=loading_timeout)
                 await asyncio.sleep(5)

                 wait_step = 1
                 found = False

                 while wait_step <= 5 and not found:
                     found = await driver.find_element(By.TAG_NAME, "pre")

                     if found:
                         cookies = await driver.get_cookies()
                         for cookie in cookies:
                             if cookie["name"] == "cf_clearance":
                                 cookie_value = cookie["value"]
                                 # save the downloaded cookie in the database
                                 proxy_db.set_cookie_for_proxy(ip, port, cookie_value, user_agent)
                         break

                     await asyncio.sleep(0.5)
                     wait_step = wait_step + 1

                 await driver.delete_all_cookies()
                 await driver.clear_proxy()

             except NoSuchElementException:
                 errors = errors + 1
                 continue

             finally:
                 if timeouts_errors == 5:
                     await driver.quit()
                     exit()

One more question: in the case described, where the Cloudflare preloader with its security check comes first, what is the easiest way to wait until I am redirected to the actual page and it has loaded?

The only thing that worked for me was this:

                await driver.get(target_url, wait_load=True, timeout=loading_timeout)
                await asyncio.sleep(5)

                wait_step = 1
                found = False

                while wait_step <= 5 and not found:
                    found = await driver.find_element(By.TAG_NAME, "pre")

                    if found:
                        cookies = await driver.get_cookies()
                        for cookie in cookies:
                            if cookie["name"] == "cf_clearance":
                                cookie_value = cookie["value"]
                                # save cookie to DB
                                proxy_db.set_cookie_for_proxy(ip, port, cookie_value, user_agent)
                        break

                    await asyncio.sleep(0.5)
                    wait_step = wait_step + 1

In plain selenium I used:

myElem = WebDriverWait(driver, loading_timeout).until(
                     EC.presence_of_element_located((By.TAG_NAME, 'pre')))

Regarding the errors I receive when the script is running, for example:

Traceback (most recent call last):
  File "V:\DEV\Projects\Python\carshelper\run_driverless_3proxy.py", line 245, in <module>
    asyncio.run(main())
  File "V:\DEV\laragon\bin\python\python-3.10\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "V:\DEV\laragon\bin\python\python-3.10\lib\asyncio\base_events.py", line 646, in run_until_complete
    return future.result()
  File "V:\DEV\Projects\Python\carshelper\run_driverless_3proxy.py", line 215, in main
    found = await driver.find_element(By.TAG_NAME, "pre")
  File "V:\DEV\Projects\Python\carshelper\venv\lib\site-packages\selenium_driverless\webdriver.py", line 884, in find_element
    return await target.find_element(by=by, value=value, parent=parent, timeout=timeout)
  File "V:\DEV\Projects\Python\carshelper\venv\lib\site-packages\selenium_driverless\types\target.py", line 565, in find_element
    return await parent.find_element(by=by, value=value, timeout=timeout)
  File "V:\DEV\Projects\Python\carshelper\venv\lib\site-packages\selenium_driverless\types\webelement.py", line 245, in find_element
    elems = await self.find_elements(by=by, value=value)
  File "V:\DEV\Projects\Python\carshelper\venv\lib\site-packages\selenium_driverless\types\webelement.py", line 275, in find_elements
    return await self.execute_script("return obj.getElementsByTagName(arguments[0])",
  File "V:\DEV\Projects\Python\carshelper\venv\lib\site-packages\selenium_driverless\types\webelement.py", line 757, in execute_script
    return await self.__exec__(script, *args, max_depth=max_depth, serialization=serialization,
  File "V:\DEV\Projects\Python\carshelper\venv\lib\site-packages\selenium_driverless\types\deserialize.py", line 183, in __exec__
    base_obj_id = await self.__obj_id_for_context__(exec_context)
  File "V:\DEV\Projects\Python\carshelper\venv\lib\site-packages\selenium_driverless\types\webelement.py", line 147, in __obj_id_for_context__
    res = await self.__target__.execute_cdp_cmd("DOM.resolveNode", args)
  File "V:\DEV\Projects\Python\carshelper\venv\lib\site-packages\selenium_driverless\types\target.py", line 760, in execute_cdp_cmd
    result = await self.socket.exec(method=cmd, params=cmd_args, timeout=timeout)
  File "V:\DEV\Projects\Python\carshelper\venv\lib\site-packages\cdp_socket\socket.py", line 78, in exec
    raise e
  File "V:\DEV\Projects\Python\carshelper\venv\lib\site-packages\cdp_socket\socket.py", line 72, in exec
    res = await asyncio.wait_for(self._responses[_id], timeout=timeout)
  File "V:\DEV\laragon\bin\python\python-3.10\lib\asyncio\tasks.py", line 445, in wait_for
    return fut.result()
cdp_socket.exceptions.CDPError: {'code': -32000, 'message': 'No node with given id found'}

Thank you for your help.

kaliiiiiiiiii commented 7 months ago

@juhacz Does

found = await driver.find_element(By.TAG_NAME, "pre", timeout=5)

work for you? Generally, the error occurs because the page is still loading while the element is being searched for, so it is essentially a race condition.
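
If the built-in timeout is not enough, one possible workaround is a small polling helper that treats errors raised during a navigation as "not ready yet" and retries, roughly what `WebDriverWait(...).until(...)` does in plain Selenium. The sketch below is not part of selenium_driverless; the name `wait_until` and its parameters are hypothetical, and it only assumes that the probe is an async callable.

```python
import asyncio
import time


async def wait_until(probe, timeout=10.0, interval=0.5, ignored=(Exception,)):
    """Poll the async callable `probe` until it returns a truthy value.

    Exceptions listed in `ignored` are treated as "not ready yet" and retried.
    Raises asyncio.TimeoutError if `timeout` seconds pass without success.
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            result = await probe()
            if result:
                return result
        except ignored as exc:
            # Remember the most recent failure so it can be chained on timeout.
            last_error = exc
        if time.monotonic() >= deadline:
            raise asyncio.TimeoutError(
                f"condition not met within {timeout}s"
            ) from last_error
        await asyncio.sleep(interval)


# Possible usage inside the loop from the question (untested assumption):
#
#   found = await wait_until(
#       lambda: driver.find_element(By.TAG_NAME, "pre"),
#       timeout=loading_timeout,
#       ignored=(NoSuchElementException, CDPError),
#   )
```

Narrowing `ignored` to the exceptions seen in the traceback (`NoSuchElementException`, `cdp_socket.exceptions.CDPError`) keeps unrelated failures, such as a closed browser, from being silently retried.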