omkarcloud / botasaurus

The All in One Framework to build Awesome Scrapers.
https://www.omkar.cloud/botasaurus/
MIT License
1.15k stars, 103 forks

Not working on some websites #35

Closed. bigcharl closed this issue 5 months ago

bigcharl commented 5 months ago

Hello, I saw this mentioned in a post on undetected-chromedriver and decided to check it out, but I couldn't get past one particular website, and I imagine other sites using the same tech would have the same issue.

On bet365.com the main page doesn't load, and other pages are very inconsistent: they load about 1 time in 20. So it can work, I just need to find the pattern that makes it consistent. The code that has occasionally succeeded for me is the one from the example:

from botasaurus import *
from botasaurus.create_stealth_driver import create_stealth_driver

@browser(
    create_driver=create_stealth_driver(
        start_url="https://www.bet365.com/#/AC/B151/C1/D50/E3/F163/",
        wait=8, # it seems like the wait doesn't matter
    ),
)
def scrape_heading_task(driver: AntiDetectDriver, data):
    driver.prompt()
    heading = driver.text('h1')
    return heading

scrape_heading_task()

By the way, with undetected-chromedriver this website is only accessible through a workaround that disconnects from the driver and then reconnects, so I imagine botasaurus would need something similar.
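To put a number on the inconsistency described above (roughly 1 in 20 loads succeeding), a small harness can count how often a load attempt works before and after changing a setting such as `wait`. This is only a minimal sketch: `measure_success_rate` and `fake_attempt` are hypothetical names, not part of botasaurus, and the stub stands in for a real function that opens the page and checks whether the expected content appeared.

```python
import time
from typing import Callable, Tuple


def measure_success_rate(
    attempt: Callable[[], bool], tries: int = 20, pause: float = 0.0
) -> Tuple[int, int]:
    """Call `attempt` `tries` times and return (successes, tries).

    An attempt counts as a failure if it returns False or raises.
    """
    successes = 0
    for _ in range(tries):
        try:
            if attempt():
                successes += 1
        except Exception:
            # A crashed attempt (e.g. a timeout) is treated as a failed load.
            pass
        if pause:
            time.sleep(pause)  # optional delay between attempts
    return successes, tries


# Hypothetical stand-in for a real page load: succeeds on every 20th call.
calls = {"n": 0}


def fake_attempt() -> bool:
    calls["n"] += 1
    return calls["n"] % 20 == 0


result = measure_success_rate(fake_attempt, tries=20)
print(result)
```

In practice `attempt` would wrap the scraping function and return True only when the page actually rendered; comparing the counts across settings should show whether a change really helps or the success was just luck.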

Chetan11-dev commented 5 months ago

It seems to work when connecting from Great Britain through an IPRoyal residential proxy:

from botasaurus import *
from botasaurus.create_stealth_driver import *

@browser(
    user_agent=bt.UserAgent.REAL,
    window_size=bt.WindowSize.REAL,
    proxy="http://username:password_country-gb@geo.iproyal.com:12321",
    create_driver=create_stealth_driver(
        start_url="https://www.bet365.com/#/AC/B151/C1/D50/E3/F163/",
        wait=12,
    ),
)
def scrape_heading_task(driver: AntiDetectDriver, data):
    driver.prompt()
    heading = driver.title
    print(heading)
    return heading

scrape_heading_task()

Also, kindly don't do anything that is unethical or against the TOS of bet365.com.
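The proxy string in the example above hardcodes the credentials and the country. A small helper can assemble the same URL from separate parts so the country is easy to change and the credentials can come from elsewhere. This is a sketch under assumptions: `build_proxy_url` is a hypothetical helper, and the `_country-<code>` password suffix, host, and port are simply copied from the example rather than taken from IPRoyal's documentation.

```python
from urllib.parse import quote


def build_proxy_url(
    username: str,
    password: str,
    country: str,
    host: str = "geo.iproyal.com",  # host taken from the example above
    port: int = 12321,              # port taken from the example above
) -> str:
    """Build a proxy URL of the form used in the example.

    The country code is appended to the password as `_country-<code>`,
    mirroring the string shown in the thread (an assumption about the format).
    """
    user = quote(username, safe="")
    # Percent-encode so characters like '@' or ':' cannot break the URL.
    secret = quote(f"{password}_country-{country.lower()}", safe="")
    return f"http://{user}:{secret}@{host}:{port}"


proxy = build_proxy_url("username", "password", "gb")
print(proxy)
```

The resulting string can then be passed as the `proxy=` argument shown in the example.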