ultrafunkamsterdam / undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
https://github.com/UltrafunkAmsterdam/undetected-chromedriver
GNU General Public License v3.0

Chrome is stuck while hitting the Imperva Protected website. #1546

Open saitharun08 opened 1 year ago

saitharun08 commented 1 year ago

### **Overview**

**Using selenium-wire**

from selenium.webdriver.chrome.options import Options  # imports were not shown in the original post; these are assumed
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from seleniumwire.webdriver import Chrome  # assumed source of Chrome, since seleniumwire_options is passed

URL = 'https://example.com/home/'

def get_chrome_driver(proxy_info):
    chrome_options = Options()
    chrome_options.add_argument('--disable-popup-blocking')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument("--disk-cache-size=200000000")  # limiting the cache storage to 200 MB
    options = {
        'proxy': {
            'http': 'http://%s' % proxy_info,
            'https': 'http://%s' % proxy_info,
            'no_proxy': 'localhost,127.0.0.1'
        }
    }
    driver = Chrome(seleniumwire_options=options, options=chrome_options)
    return driver

def get_case_details(chrome_driver):
    chrome_driver.get(URL)
    accept_button = chrome_driver.find_element(By.ID, 'ContentPlaceHolder1_ButtonAccept')
    accept_button.click()
    wait = WebDriverWait(chrome_driver, 40)
    case_number_field = wait.until(EC.element_to_be_clickable((By.ID, 'ContentPlaceHolder1_TextBoxCaseNumber')))
    case_number_field.send_keys('54515')
    search_button = chrome_driver.find_element(By.ID, 'ContentPlaceHolder1_ButtonSearch')
    search_button.click()
    case_number_select = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.standardRow a')))
    case_number_select.click()
    chrome_driver.refresh()
    chrome_driver.quit()

proxy_dict = {"proxy_ip": "000.00.0.00", "proxy_port": "0000", "proxy_username": "", "proxy_password": ""}
proxy_info = '%s:%s@%s:%s' % (proxy_dict['proxy_username'], proxy_dict['proxy_password'], proxy_dict['proxy_ip'], proxy_dict['proxy_port'])
chrome_driver = get_chrome_driver(proxy_info)
get_case_details(chrome_driver)
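The proxy string above is assembled with plain `%` formatting, so empty credentials produce `:@ip:port`, and a password containing `@` or `:` would make the URL ambiguous. A minimal sketch of a more defensive builder follows. The helper name `build_proxy_url` is hypothetical and not part of selenium-wire or undetected-chromedriver, and whether a given proxy client decodes percent-encoded credentials is an assumption that should be checked.

```python
from urllib.parse import quote


def build_proxy_url(proxy: dict, scheme: str = "http") -> str:
    """Return scheme://[user:password@]host:port for a proxy_dict-style mapping.

    Hypothetical helper for illustration; the keys mirror proxy_dict in the post.
    """
    host = f"{proxy['proxy_ip']}:{proxy['proxy_port']}"
    user = proxy.get("proxy_username", "")
    password = proxy.get("proxy_password", "")
    if not user:
        # No credentials: omit the "user:password@" part entirely.
        return f"{scheme}://{host}"
    # Percent-encode credentials so '@' and ':' cannot be mistaken for separators.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"{scheme}://{credentials}@{host}"


if __name__ == "__main__":
    example = {
        "proxy_ip": "10.0.0.1",
        "proxy_port": "8080",
        "proxy_username": "user",
        "proxy_password": "p@ss:word",
    }
    print(build_proxy_url(example))
```

The result could then be used for both the `'http'` and `'https'` entries of the selenium-wire `proxy` option in place of `'http://%s' % proxy_info`.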

- Despite successfully authenticating the proxy, Chrome still encountered an issue and failed to receive a response from the website with the error: ERR_CERT_AUTHORITY_INVALID. This problem stemmed from the Selenium-wire CA certificate not functioning correctly.
- To address this issue, I included the following code while creating the Chrome instance. After implementing this, Chrome was able to access the home page successfully.
`chrome_options.add_argument('--ignore-certificate-errors')`
- When Chrome attempted to access the next page, it experienced a delay for a few minutes before eventually returning an ERR_HTTP2_PROTOCOL_ERROR. To resolve this issue, I disabled HTTP/2 using the following code.
`chrome_options.add_argument('--disable-http2')`
- With this option Chrome could still access the home page, but it received a 502 Bad Gateway error when trying to retrieve the next page.
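Because the two workarounds were added one after the other, it can help to toggle them independently and see which one changes the behaviour on the second page. The sketch below only builds the argument list; the function name `build_chrome_arguments` is hypothetical, and the base flags are the ones from the post.

```python
def build_chrome_arguments(ignore_cert_errors: bool = False,
                           disable_http2: bool = False) -> list:
    """Return the Chrome command-line flags from the post, with optional workarounds."""
    args = [
        "--disable-popup-blocking",
        "--no-sandbox",
        "--disable-dev-shm-usage",
        "--headless",
        "--disable-gpu",
        "--disk-cache-size=200000000",  # limit the cache storage to 200 MB
    ]
    if ignore_cert_errors:
        # Workaround used for ERR_CERT_AUTHORITY_INVALID.
        args.append("--ignore-certificate-errors")
    if disable_http2:
        # Workaround used for ERR_HTTP2_PROTOCOL_ERROR.
        args.append("--disable-http2")
    return args


if __name__ == "__main__":
    for argument in build_chrome_arguments(ignore_cert_errors=True, disable_http2=True):
        print(argument)
```

Each returned string would be passed to `chrome_options.add_argument(...)` exactly as in the code above.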

**Using undetected-chromedriver**

- I switched to using undetected-chromedriver directly due to issues with the selenium-wire package. To authenticate the proxy, I utilized uc.ChromeOptions. Please take a look at the code snippet below.

import undetected_chromedriver as uc

def get_chrome_driver(proxy_dict):
    chrome_options = uc.ChromeOptions()
    chrome_options.add_argument('--disable-popup-blocking')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument("--disk-cache-size=200000000")  # limiting the cache storage to 200 MB

    # authenticating the proxy
    chrome_options.add_argument(f"--proxy-server={proxy_dict['proxy_ip']}:{proxy_dict['proxy_port']}")
    chrome_options.add_argument(f"--proxy-auth={proxy_dict['proxy_username']}:{proxy_dict['proxy_password']}")
    chrome_options.add_argument("--proxy-bypass-list=<-loopback>")
    driver = uc.Chrome(options=chrome_options)
    return driver

- With the code above, Chrome could access the home page without any certificate errors.
- However, Chrome still had a problem when trying to access the search page. It got stuck for a few minutes and eventually displayed an ERR_HTTP2_PROTOCOL_ERROR.

---

### **Investigation**

- I carefully examined the code used by undetected-chromedriver, and in both of the cases above it got stuck at one specific line: `self._sock.recv_into(b)` in the 'client.py' file of Python's standard library. This line is responsible for fetching content from Chrome, including responses from the website.
- This part of the code hangs only when we visit the website protected by Imperva. It behaves as expected when accessing other websites or clicking through to other pages.
- Another possibility is that the website keeps the browser in an infinite loop: JavaScript on the page may contain loops or code that never completes, causing the browser to hang. I was not able to find this kind of code on the website, and most of the JS was obfuscated and not readable.
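A hang at `recv_into` is consistent with a blocking socket read that is waiting for a reply which never arrives. The following is a minimal standard-library sketch of that behaviour, not the code path used by Selenium or undetected-chromedriver: `read_with_timeout` is a hypothetical helper, and a local socket pair stands in for the connection to the browser.

```python
import socket


def read_with_timeout(sock: socket.socket, timeout: float, size: int = 4096):
    """Return the bytes received, or None if nothing arrives within `timeout` seconds."""
    buffer = bytearray(size)
    sock.settimeout(timeout)  # without a timeout, recv_into blocks until data arrives
    try:
        received = sock.recv_into(buffer)
    except socket.timeout:
        return None
    return bytes(buffer[:received])


if __name__ == "__main__":
    client, server = socket.socketpair()
    try:
        # The peer stays silent, so the read gives up after the timeout.
        print(read_with_timeout(client, 0.2))
        # Once the peer answers, the same read returns immediately.
        server.sendall(b"response")
        print(read_with_timeout(client, 0.2))
    finally:
        client.close()
        server.close()
```

On the Selenium side, `driver.set_page_load_timeout(seconds)` is the usual way to bound a page load so that a stalled navigation raises an exception instead of blocking indefinitely. Whether it helps with this particular site is untested here.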

---

### **Environment Used**

I ran everything inside a Docker image; the specifications are listed below.

- Image: ubuntu:22.10
- Chrome version: 116.0.5845.110
- requests==2.31.0
- selenium==4.11.2
- selenium-wire==5.1.0
- undetected_chromedriver==3.5.3

jdholtz commented 1 year ago

Just to note, I did fix the invalid certificate error between selenium-wire and UC in #1503

saitharun08 commented 1 year ago

> Just to note, I did fix the invalid certificate error between selenium-wire and UC in #1503

Yes, it did resolve the invalid certificate error. However, Chrome continues to encounter issues when trying to retrieve the next page after initially landing on the home page.

Thank you for the response.

fynn3003 commented 8 months ago

I also encounter the ERR_HTTP2_PROTOCOL_ERROR. @saitharun08, have you found a solution?