realdronos opened this issue 2 years ago
No it's not:
--start-maximized
though, because it looks less suspicious to have a maximized viewport... unfortunately you spelled it wrong.

EDIT: My test code, which scraped those 1000 items without a hitch:
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

import undetected_chromedriver as uc

if __name__ == "__main__":
    options = uc.ChromeOptions()
    options.headless = True  # you're lucky headless works for this site... for now
    driver = uc.Chrome(options=options)
    wait = WebDriverWait(driver, 5)

    url = "https://spb.vseinstrumenti.ru/instrument/shurupoverty/akkumulyatornye-dreli/"
    x_items = '//div[@class="listing-grid"][1]//div[contains(@class, "product-tile grid-item")]'
    x_item_infos = '//div[@class="column-right"]'
    x_item_available = './/ul[contains(@class, "product-delivery")]/li[1]/span/span[1]'
    x_item_href = './/div[@class="image"]/a'
    x_item_name = './/div[@class="title"]'

    no_page = 1
    page = ""
    while True:
        driver.get(f"{url}{page}")
        try:
            items = wait.until(EC.presence_of_all_elements_located((By.XPATH, x_items)))
        except TimeoutException:
            break
        for i, (href, name) in enumerate(
            [
                (
                    item.find_element(By.XPATH, x_item_href).get_attribute("href"),
                    item.find_element(By.XPATH, x_item_name).text,
                )
                for item in items
            ]
        ):
            driver.get(href)
            infos = wait.until(EC.presence_of_element_located((By.XPATH, x_item_infos)))
            available = infos.find_elements(By.XPATH, x_item_available)
            available = available[0].text if available else "not available"
            print(f"{no_page}-{i} - {name} - {available}")
        no_page += 1
        page = f"page{no_page}/"
    driver.quit()
Thanks for the XPath hints, but I still get a white screenshot with headless True when the script starts on the server, and none of those prints (print(f"{no_page}-{i} - {name} - {available}")) appear. With headless False I get this error:
Traceback (most recent call last):
File "/root/venv/bin/parse/_Parse_Vseinstrumenti_test.py", line 10, in <module>
driver = uc.Chrome(options=options)
File "/usr/local/lib/python3.8/site-packages/undetected_chromedriver/__init__.py", line 401, in __init__
super(Chrome, self).__init__(
File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 70, in __init__
super(WebDriver, self).__init__(DesiredCapabilities.CHROME['browserName'], "goog",
File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/chromium/webdriver.py", line 93, in __init__
RemoteWebDriver.__init__(
File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 269, in __init__
self.start_session(capabilities, browser_profile)
File "/usr/local/lib/python3.8/site-packages/undetected_chromedriver/__init__.py", line 589, in start_session
super(selenium.webdriver.chrome.webdriver.WebDriver, self).start_session(
File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 360, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 425, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: cannot connect to chrome at 127.0.0.1:38899
from chrome not reachable
Your Chrome is not reachable; this has nothing to do with this script. You should update your Chrome... Google bumped the Chrome major version to 103.
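On an Ubuntu server (as described later in this thread), checking that the browser and driver major versions match, and upgrading Chrome, looks roughly like this (a sketch; it assumes the official google-chrome-stable package and Google's apt repository are already set up):

```shell
# Both major versions should match (e.g. 103 == 103)
google-chrome --version
chromedriver --version

# Upgrade Chrome in place (assumes Google's apt repo is configured)
sudo apt-get update
sudo apt-get install --only-upgrade google-chrome-stable
```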
As I wrote, the versions on the server are:
undetected-chromedriver - 3.1.5.post4
selenium - 4.1.3
Google Chrome - 103.0.5060.53
chromedriver - https://chromedriver.storage.googleapis.com/index.html?path=103.0.5060.53/
Another website works fine.
Your env is broken... Are you sure you used the exact same script as provided? This site seems barely protected: I just scraped those 1000 items again and I'm still not detected. And by the way, I edited my code to use more efficient explicit waits.
EDIT: So, this is not an issue with UC... please close this issue; it has nothing to do with UC.
Hello, I have an Ubuntu server with Python, UC, and Google Chrome. Until the latest browser update, the script on the server was working fine. After the update, the script began to give this error:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[6]/div[3]/div/div/div/div/main/div/div[2]/div[3]/div/div/div/ul/li[3]"} (Session info: headless chrome=103.0.5060.53)
I set headless to False and took a screenshot: a white screen. I suspect that the site has begun to detect UC. As a solution I want to downgrade Chrome to a previous version, so the question is: where can I find previous versions of UC, and how do I install the correct version?
Current versions:
undetected-chromedriver - 3.1.5.post4
selenium - 4.1.3
Google Chrome - 103.0.5060.53
chromedriver - https://chromedriver.storage.googleapis.com/index.html?path=103.0.5060.53/
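For reference, older undetected-chromedriver releases can be installed straight from PyPI by pinning the version. A sketch (the version shown is simply the one already on this server; substitute whichever release you need from the project's PyPI release history):

```shell
# Install a specific release of undetected-chromedriver
pip install undetected-chromedriver==3.1.5.post4

# Or force the pin over an already-installed copy
pip install --force-reinstall undetected-chromedriver==3.1.5.post4
```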
P.S. On my local computer the script starts and runs without any problems.
Code for test: