JimKarvo opened this issue 4 months ago
I tried your code on my Ubuntu system and it works fine for me. If you still have no luck, try this:
```python
from botasaurus.browser import browser, Driver
import time

@browser(add_arguments=['--no-sandbox'])
def scrape_heading_task(driver: Driver, data):
    driver.google_get("https://gitlab.com/users/sign_in")
    time.sleep(2)  # give the Cloudflare Turnstile widget time to load

    # The Turnstile checkbox is rendered inside an iframe
    iframe = driver.select_iframe("#turnstile-wrapper iframe")
    checkbox = iframe.select('label', None)  # second argument is the wait
    if checkbox:
        checkbox.click()

    driver.prompt()  # pause the script until you press Enter
    driver.save_screenshot()
    heading = driver.get_text("h1")
    return heading

# Initiate the web scraping task
scrape_heading_task()
```
If necessary, you may need to use proxies to access the site.
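If you route traffic through a proxy, a small helper can build the proxy URL and percent-encode the credentials. This is a minimal sketch. `build_proxy_url` is a hypothetical helper, and the code assumes the `http://username:password@host:port` form that the botasaurus README shows for the `proxy` argument of `@browser`. Check the docs for your installed version before relying on it.

```python
from typing import Optional
from urllib.parse import quote


def build_proxy_url(host: str, port: int,
                    username: Optional[str] = None,
                    password: Optional[str] = None,
                    scheme: str = "http") -> str:
    """Build scheme://user:pass@host:port (hypothetical helper, not part of botasaurus)."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    auth = ""
    if username is not None:
        # Percent-encode credentials so characters like '@' or ':' don't break the URL
        auth = quote(username, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"{scheme}://{auth}{host}:{port}"


if __name__ == "__main__":
    proxy = build_proxy_url("proxy.example.com", 8080, "user", "p@ss")
    print(proxy)  # http://user:p%40ss@proxy.example.com:8080
    # Assumed usage with botasaurus (check your version's docs):
    # @browser(proxy=proxy, add_arguments=['--no-sandbox'])
```

Encoding the credentials matters because a raw `@` in a password would otherwise be read as the start of the host part.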
It still doesn't work on my Ubuntu server (no GUI).
The server has the same IP as my Windows machine, and on Windows the script works without any problems.
On Linux I tried this:
```python
from botasaurus.browser import browser, Driver
import time

@browser(add_arguments=['--no-sandbox'])
def scrape_heading_task(driver: Driver, data):
    driver.google_get("https://gitlab.com/users/sign_in")
    time.sleep(10)  # wait for the Turnstile challenge to render

    iframe = driver.select_iframe("#turnstile-wrapper iframe")
    driver.save_screenshot()  # first screenshot: before clicking

    checkbox = iframe.select('label', None)
    if checkbox:
        print("detected checkbox")
        checkbox.click()
        time.sleep(1)
        driver.save_screenshot()  # second screenshot: right after clicking

    driver.prompt()
    driver.save_screenshot()
    heading = driver.get_text("h1")
    return heading

# Initiate the web scraping task
scrape_heading_task()
```
It seems the checkbox isn't clicked (see the second screenshot). If I increase the sleep from 10 to 30 seconds, the Turnstile widget disappears!
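Instead of guessing a fixed delay (10 vs 30 seconds), one option is to poll until the element appears or a deadline passes. This is a minimal, library-agnostic sketch. `wait_until` is a hypothetical helper and not part of botasaurus. The usage comment only reuses the `iframe.select('label', None)` call from the script above.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(probe: Callable[[], Optional[T]],
               timeout: float = 30.0,
               interval: float = 0.5) -> Optional[T]:
    """Call `probe` repeatedly until it returns a truthy value or `timeout` seconds pass.

    Returns the first truthy result, or None if the deadline is reached.
    Exceptions raised by `probe` are not caught.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        # Sleep for the polling interval, but never past the deadline
        time.sleep(max(0.0, min(interval, deadline - time.monotonic())))


if __name__ == "__main__":
    calls = {"n": 0}

    def fake_probe():
        # Simulates an element that only appears on the third check
        calls["n"] += 1
        return "label" if calls["n"] >= 3 else None

    print(wait_until(fake_probe, timeout=5, interval=0.01))  # label
    # Assumed usage inside the botasaurus task:
    # checkbox = wait_until(lambda: iframe.select('label', None), timeout=30)
```

Polling returns as soon as the widget is ready. It also gives up cleanly with `None` if the widget never appears, so the script does not depend on a single hard-coded sleep.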
It seems Cloudflare can detect Botasaurus. The IP isn't banned, and it isn't an OS-related problem: I see the same behavior on Windows 11 and on the Ubuntu server.
If I omit the "wait" parameter, I get a different error (something like "id" not found).
The script:
The log: