omkarcloud / botasaurus

The All in One Framework to build Awesome Scrapers.
https://www.omkar.cloud/botasaurus/
MIT License
1.29k stars 121 forks

NameError: name 'AntiDetectDriver' is not defined #155

Open yuke2002 opened 1 month ago

yuke2002 commented 1 month ago

NameError                                 Traceback (most recent call last)
in <cell line: 3>()
      2
      3 @browser(cache=True)
----> 4 def scrape_heading_task(driver: AntiDetectDriver, link):
      5     driver.get(link)
      6     heading = driver.get_text("h1")

NameError: name 'AntiDetectDriver' is not defined

JimKarvo commented 1 month ago

Please post your code

yuke2002 commented 1 month ago

from botasaurus.browser import browser, Driver
from botasaurus import *

@browser()
def scrape_places_links(driver: AntiDetectDriver, query):

    # Visit Google Maps
    def visit_google_maps():
        query = "restaurants in delhi"
        encoded_query = urllib.parse.quote_plus(query)
        url = f'https://www.google.com/maps/search/{encoded_query}'
        driver.get(url)

        # Accept Cookies for European users
        if driver.is_in_page("https://consent.google.com/"):
            agree_button_selector = 'form:nth-child(2) > div > div > button'
            driver.click(agree_button_selector)
            driver.google_get(url)

    visit_google_maps()

ZainZ01 commented 1 month ago

I am having the same error, the code is as below:

from botasaurus import *

@browser
def get_details(driver: AntiDetectDriver, data, sections = True):
    driver.get(my_url)
    return my_results

The error is as below:

in def get_details(driver: AntiDetectDriver, data, sections = True):

NameError: name 'AntiDetectDriver' is not defined

Halone228 commented 1 month ago

Which version of botasaurus are you using? Try pip freeze to see the installed version.
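As an alternative to scanning the full pip freeze output, the installed version can be read from Python itself. The following is a minimal sketch using only the standard library; the helper name installed_version is made up for illustration, and the package names are the ones that appear in the pip freeze listings in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names taken from the pip freeze output posted in this thread.
    for name in ("botasaurus", "botasaurus-driver"):
        print(name, installed_version(name))
```

A result of None means the package is not installed in the interpreter that ran the script, which is worth ruling out in notebook environments where several interpreters may coexist.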

TimidBee commented 1 month ago

I've got the same issue. Here's the output of pip freeze

anyio==4.4.0 attrs==23.2.0 beautifulsoup4==4.12.3 bota==4.0.64 botasaurus==4.0.47 botasaurus-server==4.0.51 botasaurus_api==4.0.4 botasaurus_driver==4.0.55 botasaurus_proxy_authentication==1.0.16 botasaurus_requests==4.0.31 bottle==0.12.25 Brotli==1.1.0 cachetools==5.4.0 casefy==0.1.7 certifi==2024.7.4 cffi==1.16.0 charset-normalizer==3.3.2 click==8.1.7 close_chrome==4.0.40 colorama==0.4.6 Deprecated==1.2.14 et-xmlfile==1.1.0 gevent==24.2.1 geventhttpclient==2.3.1 google-auth==2.32.0 greenlet==3.0.3 h11==0.14.0 httpcore==1.0.5 httpx==0.27.0 idna==3.7 javascript_fixes==1.1.29 joblib==1.4.2 kubernetes==30.1.0 lxml==5.2.2 markdown-it-py==3.0.0 mdurl==0.1.2 ndjson==0.3.1 oauthlib==3.2.2 openpyxl==3.1.5 outcome==1.3.0.post0 psutil==6.0.0 pyasn1==0.6.0 pyasn1_modules==0.4.0 pycparser==2.22 Pygments==2.18.0 PySocks==1.7.1 python-dateutil==2.9.0.post0 PyVirtualDisplay==3.0 PyYAML==6.0.1 requests==2.32.3 requests-oauthlib==2.0.0 rich==13.7.1 rsa==4.9 selenium==4.23.0 six==1.16.0 sniffio==1.3.1 sortedcontainers==2.4.0 soupsieve==2.5 SQLAlchemy==2.0.31 trio==0.26.0 trio-websocket==0.11.1 typing_extensions==4.9.0 undetected-chromedriver==3.5.5 Unidecode==1.3.8 urllib3==2.2.2 websocket-client==1.8.0 websockets==12.0 wrapt==1.16.0 wsproto==1.2.0 XlsxWriter==3.2.0 zope.event==5.0 zope.interface==6.4.post2

Halone228 commented 1 month ago

You have the latest version of Botasaurus, in which AntiDetectDriver was renamed to Driver, among other changes.
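For code that has to run against both old and new installs, one option is to try the new name first and fall back to the old one. The sketch below is not part of Botasaurus: import_first is a hypothetical helper, and the candidate locations are assumptions. botasaurus.browser.Driver is the import used in yuke2002's snippet above; a top-level botasaurus.AntiDetectDriver in older releases is only inferred from the star-import code in this thread.

```python
import importlib


def import_first(candidates):
    """Return the first attribute importable from a list of (module, name) pairs."""
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # Module not available in this install; try the next one.
        if hasattr(module, attr_name):
            return getattr(module, attr_name)
    raise ImportError(f"none of {candidates!r} could be imported")


# Assumed locations (see the note above); adjust to the installed version.
DRIVER_CANDIDATES = [
    ("botasaurus.browser", "Driver"),
    ("botasaurus", "AntiDetectDriver"),
]
```

Calling import_first(DRIVER_CANDIDATES) would then give a single name to use in type annotations. Note that this only papers over the rename; it does not make the two classes expose the same methods.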

TimidBee commented 1 month ago

That explains a lot, thanks!

ZainZ01 commented 1 month ago

@Halone228
> you have the latest version of Botasaurus, where the name was changed from AntiDetectDriver to Driver, etc

Thanks for the reply, that explains it. Does this mean that Driver and AntiDetectDriver are now the same, that is, every Driver is an AntiDetectDriver? From what I saw earlier, Driver lacked some methods that AntiDetectDriver had, for instance get_element_or_none. Is there a list of the changes made in this version? Thanks :)

ZainZ01 commented 1 month ago

Actually it still doesn't work:

my code:

from botasaurus import *

@browser
def get_details(driver: Driver, data, sections = True):
    driver.get(my_url)
    return my_results

the error:

in def get_details(driver: Driver, data, sections = True):

NameError: name 'Driver' is not defined

response of pip freeze:

beautifulsoup4==4.12.3
bleach==6.0.0
bota==4.0.64
botasaurus==4.0.47
botasaurus-api==4.0.4
botasaurus-driver==4.0.55
botasaurus-proxy-authentication==1.0.16
botasaurus-requests==4.0.31
Brotli==1.1.0
bs4==0.0.1
cachetools==5.3.0

Halone228 commented 1 month ago

from README.md

from botasaurus.browser import browser, Driver

@browser
def scrape_heading_task(driver: Driver, data):
    # Visit the Omkar Cloud website
    driver.get("https://www.omkar.cloud/")

    # Retrieve the heading element's text
    heading = driver.get_text("h1")

    # Save the data as a JSON file in output/scrape_heading_task.json
    return {
        "heading": heading
    }

# Initiate the web scraping task
scrape_heading_task()

You need to import Driver from botasaurus.browser. For more information, please read the docs again.
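The tracebacks in this thread point at the def line rather than at a line inside the function because, on Python versions before 3.14, parameter annotations are evaluated when the function is defined. A name that was never imported therefore fails immediately. The sketch below demonstrates this with plain Python and no Botasaurus import; defines_cleanly is a hypothetical helper written for this illustration.

```python
import sys


def defines_cleanly(source: str) -> bool:
    """Execute `source` in a fresh namespace; report whether it avoided a NameError."""
    try:
        exec(source, {})
        return True
    except NameError:
        return False


# The annotation names a class that was never imported, as in the thread.
EAGER = "def task(driver: Driver, data):\n    return data\n"
# Quoting the annotation stores it as a plain string, so no name lookup happens.
QUOTED = "def task(driver: 'Driver', data):\n    return data\n"

if __name__ == "__main__":
    print("Python", sys.version_info[:2])
    print("unquoted annotation defines cleanly:", defines_cleanly(EAGER))
    print("quoted annotation defines cleanly:", defines_cleanly(QUOTED))
```

Quoting the annotation only hides the symptom. The actual fix for the error reported here is the explicit import from botasaurus.browser shown in the README snippet.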

ZainZ01 commented 1 month ago

As I understand it, botasaurus.browser.Driver is not the same as AntiDetectDriver, based on the following code (https://github.com/omkarcloud/botasaurus/blob/master/docs/docs/google-maps-scraping-tutorial.md):

@browser(
    data=["restaurants in delhi"],
)
def scrape_places_links(driver: AntiDetectDriver, query):

    # Visit Google Maps
    def visit_google_maps():
        encoded_query = urllib.parse.quote_plus(query)
        url = f'https://www.google.com/maps/search/{encoded_query}'
        driver.get(url)

        # Accept Cookies for European users
        if driver.is_in_page("https://consent.google.com/"):
            agree_button_selector = 'form:nth-child(2) > div > div > button'
            driver.click(agree_button_selector)
            driver.google_get(url)

    visit_google_maps()

# Visit an individual place and extract data
def scrape_place_data():
    driver.get(link)

    # Accept Cookies for European users
    if driver.is_in_page("https://consent.google.com/"):
        agree_button_selector = 'form:nth-child(2) > div > div > button'
        driver.click(agree_button_selector)
        driver.get(link)

    # Extract title
    title_selector = 'h1'
    title = driver.text(title_selector)

    # Extract rating
    rating_selector = "div.F7nice > span"
    rating = driver.text(rating_selector)

    # Extract reviews count
    reviews_selector = "div.F7nice > span:last-child"
    reviews_text = driver.text(reviews_selector)
    reviews = int(''.join(filter(str.isdigit, reviews_text))) if reviews_text else None

    # Extract website link
    website_selector = "a[data-item-id='authority']"
    website = driver.link(website_selector)

    # Extract phone number
    phone_xpath = "//button[starts-with(@data-item-id,'phone')]"
    phone_element = driver.get_element_or_none(phone_xpath)
    phone = phone_element.get_attribute("data-item-id").replace("phone:tel:", "") if phone_element else None

    return {
        "title": title,
        "phone": phone,
        "website": website,
        "reviews": reviews,
        "rating": rating,
        "link": link,
    }

AntiDetectDriver has the method get_element_or_none(xpath), which the tutorial calls as driver.get_element_or_none(phone_xpath), while Driver does not.
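A quick way to confirm which tutorial methods a given driver object still exposes is to probe it with getattr before porting the code. This is a minimal sketch: missing_methods is a hypothetical helper, and the two classes are stand-ins invented for the example, not the real Botasaurus classes. The method names are the ones used in the tutorial code above.

```python
def missing_methods(obj, names):
    """Return the names in `names` that are not callable attributes of `obj`."""
    return [name for name in names if not callable(getattr(obj, name, None))]


class OldStyleDriver:
    """Hypothetical stand-in that exposes the method used in the tutorial."""

    def get_element_or_none(self, xpath):
        return None

    def get_text(self, selector):
        return ""


class NewStyleDriver:
    """Hypothetical stand-in that lacks get_element_or_none."""

    def get_text(self, selector):
        return ""


# Method names taken from the tutorial snippet quoted in this thread.
TUTORIAL_METHODS = ["get_element_or_none", "get_text"]

if __name__ == "__main__":
    print(missing_methods(OldStyleDriver(), TUTORIAL_METHODS))
    print(missing_methods(NewStyleDriver(), TUTORIAL_METHODS))
```

Inside a real @browser task, passing the driver argument to missing_methods would list the calls that need a replacement in the installed version.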