Open VaradDeshmukh97 opened 7 months ago
Hey @VaradDeshmukh97, great to see you around here again! Your adventures in the cloud sound like quite the saga. I'm on it - will get back to you with a strategy to conquer this error shortly!
When I run this,
from llama_index.readers.web import WholeSiteReader
from selenium import webdriver

def load_websites(prefix, base_url, max_depth=1):
    '''
    Recursively searches and loads a website into LlamaIndex Document objects.
    Args:
        prefix = url of the website
        base_url = url of the page from where to start the scraping
    '''
    driver = webdriver.Chrome()
    scraper = WholeSiteReader(
        prefix=prefix,
        max_depth=max_depth,
        driver=driver
    )
    websites = scraper.load_data(
        base_url=base_url
    )
    return websites
I get the following error:
SessionNotCreatedException: Message: session not created: Chrome failed to start: exited normally.
(session not created: DevToolsActivePort file doesn't exist)
(The process started from chrome location /teamspace/studios/this_studio/.cache/selenium/chrome/linux64/123.0.6312.122/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Stacktrace:
#0 0x55b5d0b0f863 <unknown>
#1 0x55b5d08058c6 <unknown>
#2 0x55b5d0839d34 <unknown>
#3 0x55b5d0835d3d <unknown>
#4 0x55b5d087eaed <unknown>
#5 0x55b5d0872343 <unknown>
#6 0x55b5d0843593 <unknown>
#7 0x55b5d0843f5e <unknown>
#8 0x55b5d0ad384b <unknown>
#9 0x55b5d0ad77a5 <unknown>
#10 0x55b5d0ac1571 <unknown>
#11 0x55b5d0ad8332 <unknown>
#12 0x55b5d0aa687f <unknown>
#13 0x55b5d0afe728 <unknown>
#14 0x55b5d0afe8fb <unknown>
#15 0x55b5d0b0e9b4 <unknown>
#16 0x7ff2fb9b8609 start_thread
The error you're encountering suggests a compatibility issue between Chrome and ChromeDriver or a problem with locating the Chrome executable in your cloud environment. Here's how to address these issues:
Check Chrome and ChromeDriver Compatibility: Ensure the versions of Chrome and ChromeDriver you're using are compatible. Mismatches can lead to the SessionNotCreatedException.
Use Remote WebDriver: In cloud environments, it's common to use a Remote WebDriver setup. This involves configuring your code to use webdriver.Remote() instead of webdriver.Chrome(), specifying the URL of the Selenium Grid or cloud-based testing service and the desired capabilities.
Specify Chrome Binary Location: If the Chrome binary is in a non-standard location, you may need to explicitly set its path using ChromeOptions. This is done by setting opt.binary_location = '/path/to/chrome' before initializing the WebDriver.
Here's a snippet for using a Remote WebDriver, which you should adjust based on your environment:
from selenium import webdriver

def setup_driver(self):
    # Placeholder URL: point this at your Selenium Grid hub or cloud Selenium endpoint
    selenium_grid_url = "http://your.selenium.grid.url:4444/wd/hub"
    opt = webdriver.ChromeOptions()
    # Specify binary location if necessary
    # opt.binary_location = '/path/to/chrome'
    # Selenium 4 takes the options object directly; the desired_capabilities
    # argument was deprecated and is rejected by recent 4.x releases
    return webdriver.Remote(command_executor=selenium_grid_url, options=opt)
Replace "http://your.selenium.grid.url:4444/wd/hub" with your Selenium Grid hub URL or the endpoint of your cloud-based Selenium service. This setup requires the remote environment to have the correct Chrome and ChromeDriver versions installed. If you're still facing issues, please share more details about your cloud setup, including the versions of Chrome and ChromeDriver and the location of the Chrome executable.
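The compatibility check mentioned above can be sketched with the standard library alone. This is a minimal sketch, not part of Selenium or LlamaIndex: the helper names are hypothetical, and the executable names `google-chrome` and `chromedriver` are assumptions that may differ in your environment. It compares the major versions reported by `--version`, since ChromeDriver is generally expected to match Chrome's major version.

```python
import re
import shutil
import subprocess


def major_version(version_output):
    """Extract the major version from text such as 'Google Chrome 123.0.6312.122'."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def installed_major(executable):
    """Return the major version reported by `<executable> --version`, or None if not found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, timeout=30
    )
    return major_version(result.stdout)


def versions_match(chrome="google-chrome", driver="chromedriver"):
    """True only when both binaries are found and report the same major version."""
    # Assumption: the default executable names above; pass your own if they differ.
    chrome_major = installed_major(chrome)
    driver_major = installed_major(driver)
    return chrome_major is not None and chrome_major == driver_major
```

Calling `versions_match()` in the cloud notebook before creating the driver should show whether a version mismatch is a plausible cause; a result of `False` can also simply mean that one of the binaries is not on PATH.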
Hey @VaradDeshmukh97, did you find any solution or workaround for this issue? I am encountering the same problem and would really appreciate your help!
Hi @suryaanshrai, I got the same errors, and this solution works for me. You can edit the function and add these parameters:
def setup_driver(self):
    """
    Sets up the Selenium WebDriver for Chrome.

    Returns:
        WebDriver: An instance of Chrome WebDriver.
    """
    try:
        import chromedriver_autoinstaller
    except ImportError:
        raise ImportError("Please install chromedriver_autoinstaller")
    opt = webdriver.ChromeOptions()
    opt.add_argument("--start-maximized")
    opt.add_argument('--headless')
    opt.add_argument('--no-sandbox')
    opt.add_argument('--disable-dev-shm-usage')
    chromedriver_autoinstaller.install()
    return webdriver.Chrome(options=opt)
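If you would rather not edit the library function, the same flags can probably be applied from your own code, because the question already passes a `driver` argument to WholeSiteReader. The following is a rough sketch under that assumption: `build_headless_driver` and `HEADLESS_FLAGS` are hypothetical names, and the `binary_location` parameter is only relevant when Chrome lives in a non-standard path.

```python
# Flags taken from the reply above; typically needed on machines without a display.
HEADLESS_FLAGS = ("--headless", "--no-sandbox", "--disable-dev-shm-usage")


def build_headless_driver(binary_location=None):
    """Create a Chrome WebDriver configured to start without a display."""
    # Imported lazily so this module can be loaded even where Selenium is missing.
    from selenium import webdriver

    opt = webdriver.ChromeOptions()
    for flag in HEADLESS_FLAGS:
        opt.add_argument(flag)
    if binary_location:
        # Assumption: only set this when Chrome is not in a standard location.
        opt.binary_location = binary_location
    return webdriver.Chrome(options=opt)


def load_websites(prefix, base_url, max_depth=1):
    """Same loader as in the question, but using the headless driver."""
    from llama_index.readers.web import WholeSiteReader

    scraper = WholeSiteReader(
        prefix=prefix,
        max_depth=max_depth,
        driver=build_headless_driver(),
    )
    return scraper.load_data(base_url=base_url)
```

This keeps the installed package untouched, so the change survives a reinstall or upgrade of the reader.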
Question Validation
Question
I am using WholeSiteReader() for loading websites into LlamaIndex Document objects. On my local system it works fine, but when I run my notebook on the Cloud, it says
ValueError: No chrome executable found on PATH
I understand that the driver needs to be given the path to the Chrome Executable, but I am unable to work it out. Any help will be appreciated. Thanks!
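Since the error complains that no Chrome executable is on PATH, one quick check is to see what the notebook process can actually find. Below is a small standard-library sketch; `find_chrome_binary` is a hypothetical helper, and the candidate names are assumptions about common Linux installs rather than anything Selenium or LlamaIndex defines.

```python
import os
import shutil

# Assumption: common executable names on Linux; adjust for your image.
CANDIDATE_NAMES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
    "chrome",
)


def find_chrome_binary(candidates=CANDIDATE_NAMES, extra_dirs=()):
    """Return the first Chrome-like executable found on PATH or in extra_dirs, else None."""
    # Skip empty entries so an unset PATH does not add the current directory.
    dirs = [d for d in (os.environ.get("PATH", ""), *extra_dirs) if d]
    search_path = os.pathsep.join(dirs)
    for name in candidates:
        found = shutil.which(name, path=search_path)
        if found:
            return found
    return None
```

If this returns a path, it can be assigned to `binary_location` on a `ChromeOptions` object before the driver is created; if it returns `None`, Chrome is likely not installed in the cloud environment at all, or sits in a directory you would need to pass through `extra_dirs`.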