omkarcloud / botasaurus

The All in One Framework to build Awesome Scrapers.
https://www.omkar.cloud/botasaurus/
MIT License
1.14k stars, 103 forks

Exception: Failed to connect to Chrome URL: http://127.0.0.1:37491/json/version. #133

Closed: JimKarvo closed this issue 3 weeks ago

JimKarvo commented 3 weeks ago

Hello,

I am trying to run any of the example code on a headless Linux server (Ubuntu Server).

The main problem I am facing is that the script cannot communicate with the browser:

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    print(scrape_heading_task())
  File "/usr/local/lib/python3.8/dist-packages/botasaurus/browser_decorator.py", line 261, in wrapper_browser
    current_result = run_task(data_item, False, 0)
  File "/usr/local/lib/python3.8/dist-packages/botasaurus/browser_decorator.py", line 157, in run_task
    driver = Driver(headless=evaluated_headless, proxy=evaluated_proxy, profile=evaluated_profile, tiny_profile=tiny_profile, block_images=block_images, block_images_and_css=block_images_and_css, wait_for_complete_page_load=wait_for_complete_page_load, extensions=evaluated_extensions, arguments=args, user_agent=evaluated_user_agent, window_size=evaluated_window_size, lang=evaluated_lang, beep=beep)
  File "/usr/local/lib/python3.8/dist-packages/botasaurus_driver/driver.py", line 1026, in __init__
    self._browser: Browser = self._run(start(self.config))
  File "/usr/local/lib/python3.8/dist-packages/botasaurus_driver/core/util.py", line 89, in start
    return Browser.create(config)
  File "/usr/local/lib/python3.8/dist-packages/botasaurus_driver/core/browser.py", line 97, in create
    instance.start()
  File "/usr/local/lib/python3.8/dist-packages/botasaurus_driver/core/browser.py", line 253, in start
    self.info = ensure_chrome_is_alive(chrome_url)
  File "/usr/local/lib/python3.8/dist-packages/botasaurus_driver/core/browser.py", line 51, in ensure_chrome_is_alive
    raise Exception(f"Failed to connect to Chrome URL: {url}.")
Exception: Failed to connect to Chrome URL: http://127.0.0.1:42377/json/version.

The same code works like a charm on a Windows PC. Does botasaurus not support Linux servers?
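For anyone debugging the same error: the URL in the exception is Chrome's local DevTools endpoint, so the message means nothing answered on that port. A minimal sketch for probing such an endpoint by hand, using only the standard library. The helper names chrome_version_url and probe_chrome are made up for illustration and are not part of botasaurus, and the port has to be taken from your own error message, since it changes on every run.

```python
import json
import urllib.error
import urllib.request


def chrome_version_url(port, host="127.0.0.1"):
    """Build the DevTools version URL that appears in the error message."""
    return f"http://{host}:{port}/json/version"


def probe_chrome(port, timeout=2.0):
    """Return the parsed /json/version payload, or None if nothing answers."""
    try:
        with urllib.request.urlopen(chrome_version_url(port), timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not alive".
        return None


if __name__ == "__main__":
    # 42377 is the port from the traceback above; replace it with your own.
    info = probe_chrome(42377)
    print("Chrome is reachable" if info else "Nothing is listening on that port")
```

If the probe returns None while the script is running, Chrome most likely exited during startup, which on a headless server is often a missing display or sandbox problem.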

Chetan11-dev commented 3 weeks ago

Does this work?

from botasaurus.browser import browser, Driver

@browser(add_arguments=['--no-sandbox'])
def scrape_heading_task(driver: Driver, data):
    # Visit the Omkar Cloud website
    driver.get("https://www.omkar.cloud/")

    # Retrieve the heading element's text
    heading = driver.get_text("h1")

    # Save the data as a JSON file in output/scrape_heading_task.json
    return {
        "heading": heading
    }

# Initiate the web scraping task
scrape_heading_task()

JimKarvo commented 3 weeks ago

Hello @Chetan11-dev, I tried it on 2 different machines, but got the same behaviour:

  File "/usr/local/lib/python3.11/dist-packages/botasaurus_driver/core/browser.py", line 51, in ensure_chrome_is_alive
    raise Exception(f"Failed to connect to Chrome URL: {url}.")
Exception: Failed to connect to Chrome URL: http://127.0.0.1:44799/json/version.

JimKarvo commented 3 weeks ago

Solution found.

On Ubuntu servers, a virtual display has to be installed. First, install Xvfb:

sudo apt-get update
sudo apt-get install xvfb

Start Xvfb on a specified display. For example, to start it on display :99, run:

Xvfb :99 -screen 0 1024x768x16 &
export DISPLAY=:99

Then the script works!
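The same steps can also be driven from Python, so the virtual display only lives as long as the scrape. This is a rough sketch under a few assumptions: Xvfb is already installed, display :99 is free, and a fixed one-second pause is enough for the X server to start. The names xvfb_command and virtual_display are hypothetical helpers, not part of botasaurus.

```python
import os
import shutil
import subprocess
import time
from contextlib import contextmanager


def xvfb_command(display=":99", width=1024, height=768, depth=16):
    """Build the same Xvfb command line as the shell example."""
    return ["Xvfb", display, "-screen", "0", f"{width}x{height}x{depth}"]


@contextmanager
def virtual_display(display=":99", startup_wait=1.0):
    """Start Xvfb, point DISPLAY at it, and clean up on exit."""
    if shutil.which("Xvfb") is None:
        raise RuntimeError("Xvfb not found; install it with: sudo apt-get install xvfb")
    proc = subprocess.Popen(xvfb_command(display))
    previous = os.environ.get("DISPLAY")
    os.environ["DISPLAY"] = display
    try:
        time.sleep(startup_wait)  # crude wait for the X server to come up
        yield proc
    finally:
        proc.terminate()
        proc.wait()
        # Restore whatever DISPLAY was set before, if anything.
        if previous is None:
            os.environ.pop("DISPLAY", None)
        else:
            os.environ["DISPLAY"] = previous


# Usage (assumes scrape_heading_task is defined as in the earlier comment):
#     with virtual_display():
#         scrape_heading_task()
```

Setting DISPLAY inside the process before the browser is launched should have the same effect as the export in the shell, because child processes inherit the environment.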

Chetan11-dev commented 3 weeks ago

Also, if you run the script as VM=true python main.py, it will work as well.
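When the script is started from another Python program rather than from a shell, the VM=true prefix can be reproduced by passing an environment to the child process. A small sketch; vm_env and run_with_vm_flag are illustrative names, and main.py is the script name used in the comment above. What botasaurus does internally when it sees VM=true is not shown in this thread, so the sketch only sets the variable.

```python
import os
import subprocess
import sys


def vm_env(base=None):
    """Copy an environment mapping and add VM=true to the copy."""
    env = dict(os.environ if base is None else base)
    env["VM"] = "true"
    return env


def run_with_vm_flag(script="main.py"):
    """Run the script in a child process, like `VM=true python main.py`."""
    return subprocess.run([sys.executable, script], env=vm_env(), check=False)


if __name__ == "__main__":
    result = run_with_vm_flag()
    print("exit code:", result.returncode)
```

Copying the environment rather than mutating os.environ keeps the flag scoped to the child process.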