omkarcloud / botasaurus

The All in One Framework to build Awesome Scrapers.
https://www.omkar.cloud/botasaurus/
MIT License

not work on VPS #41

Closed Amatefinde closed 5 months ago

Amatefinde commented 5 months ago

When I try to run the hello world script on a VPS (Ubuntu 22):

from botasaurus import *

@browser(headless=True)
def scrape_heading_task(driver: AntiDetectDriver, data):
    # Navigate to the Omkar Cloud website
    driver.get("https://www.omkar.cloud/")

    # Retrieve the heading element's text
    heading = driver.text("h1")

    # Save the data as a JSON file in output/scrape_heading_task.json
    return {"heading": heading}

if __name__ == "__main__":
    # Initiate the web scraping task
    scrape_heading_task()

I get this error:

(venv) root@1941865-hj59931:~/realm_of_python/sandbox# python botasaurus_collector.py
Running
[INFO] Downloading Chrome Driver. This is a one-time process. Download in progress...
Traceback (most recent call last):
  File "/root/realm_of_python/sandbox/botasaurus_collector.py", line 18, in <module>
    scrape_heading_task()
  File "/root/realm_of_python/sandbox/venv/lib/python3.11/site-packages/botasaurus/decorators.py", line 501, in wrapper_browser
    current_result = run_task(data_item, False, 0)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/realm_of_python/sandbox/venv/lib/python3.11/site-packages/botasaurus/decorators.py", line 399, in run_task
    driver = create_selenium_driver(options, desired_capabilities)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/realm_of_python/sandbox/venv/lib/python3.11/site-packages/botasaurus/create_driver_utils.py", line 221, in create_selenium_driver
    driver = AntiDetectDriver(
             ^^^^^^^^^^^^^^^^^
  File "/root/realm_of_python/sandbox/venv/lib/python3.11/site-packages/botasaurus/anti_detect_driver.py", line 33, in __init__
    super().__init__(*args, **kwargs)
  File "/root/realm_of_python/sandbox/venv/lib/python3.11/site-packages/selenium/webdriver/chrome/webdriver.py", line 69, in __init__
    super().__init__(DesiredCapabilities.CHROME['browserName'], "goog",
  File "/root/realm_of_python/sandbox/venv/lib/python3.11/site-packages/selenium/webdriver/chromium/webdriver.py", line 92, in __init__
    super().__init__(
  File "/root/realm_of_python/sandbox/venv/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py", line 272, in __init__
    self.start_session(capabilities, browser_profile)
  File "/root/realm_of_python/sandbox/venv/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py", line 364, in start_session
    response = self.execute(Command.NEW_SESSION, parameters)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/realm_of_python/sandbox/venv/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py", line 429, in execute
    self.error_handler.check_response(response)
  File "/root/realm_of_python/sandbox/venv/lib/python3.11/site-packages/selenium/webdriver/remote/errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.SessionNotCreatedException: Message: session not created: Chrome failed to start: exited normally.
  (session not created: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /opt/google/chrome/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Stacktrace:
#0 0x55ba42034fb3 <unknown>
#1 0x55ba41d084a7 <unknown>
#2 0x55ba41d3bc93 <unknown>
#3 0x55ba41d3810c <unknown>
#4 0x55ba41d7aac6 <unknown>
#5 0x55ba41d71713 <unknown>
#6 0x55ba41d4418b <unknown>
#7 0x55ba41d44f7e <unknown>
#8 0x55ba41ffa8d8 <unknown>
#9 0x55ba41ffe800 <unknown>
#10 0x55ba42008cfc <unknown>
#11 0x55ba41fff418 <unknown>
#12 0x55ba41fcc42f <unknown>
#13 0x55ba420234e8 <unknown>
#14 0x55ba420236b4 <unknown>
#15 0x55ba42034143 <unknown>
#16 0x7f1c63bcaac3 <unknown>

Is there any advice?

Chetan11-dev commented 5 months ago

Could you test:

from botasaurus import *

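def add_arguments(data, options):
    # Use /tmp instead of /dev/shm for shared memory, which is often small on a VPS
    options.add_argument('--disable-dev-shm-usage')
    # Chrome will not start as root unless its sandbox is disabled
    options.add_argument('--no-sandbox')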

@browser(headless=True, add_arguments=add_arguments)
def scrape_heading_task(driver: AntiDetectDriver, data):
    # Navigate to the Omkar Cloud website
    driver.get("https://www.omkar.cloud/")

    # Retrieve the heading element's text
    heading = driver.text("h1")

    # Save the data as a JSON file in output/scrape_heading_task.json
    return {"heading": heading}

if __name__ == "__main__":
    # Initiate the web scraping task
    scrape_heading_task()

Amatefinde commented 5 months ago

Oh, wow, it works really nicely. Thank you very much!
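
For anyone landing here later: the two flags from the reply probably help for separate reasons. The shell prompt in the traceback shows the script running as root, and Chrome refuses to start as root unless `--no-sandbox` is passed. `--disable-dev-shm-usage` makes Chrome use /tmp instead of /dev/shm, which can be very small on a VPS or in a container. The sketch below is only an illustration of adding each flag when the environment seems to need it. The helper names (`running_as_root`, `shm_is_small`, `vps_flags`) and the 512 MiB threshold are assumptions made up for this example, not part of botasaurus; only the `add_arguments(data, options)` callback shape comes from the thread.

```python
import os
import shutil

# Hypothetical threshold: below this, treat /dev/shm as too small for Chrome.
MIN_SHM_BYTES = 512 * 1024 * 1024


def running_as_root():
    """True when the current process has effective UID 0 (Unix only)."""
    return hasattr(os, "geteuid") and os.geteuid() == 0


def shm_is_small(path="/dev/shm", min_bytes=MIN_SHM_BYTES):
    """True when the shared-memory mount is missing or smaller than min_bytes."""
    try:
        return shutil.disk_usage(path).total < min_bytes
    except OSError:
        # No such mount (or it is unreadable): be conservative.
        return True


def vps_flags(as_root, small_shm):
    """Pick the Chrome flags from this thread that the environment calls for."""
    flags = []
    if as_root:
        flags.append("--no-sandbox")
    if small_shm:
        flags.append("--disable-dev-shm-usage")
    return flags


def add_arguments(data, options):
    """Same callback signature as in the reply above."""
    for flag in vps_flags(running_as_root(), shm_is_small()):
        options.add_argument(flag)
```

It would be passed the same way as in the reply, `@browser(headless=True, add_arguments=add_arguments)`. Note that `--no-sandbox` removes a layer of isolation, so running the scraper as a non-root user is the safer choice where that is possible.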