unclecode / crawl4ai

🔥🕷️ Crawl4AI: Open-source LLM Friendly Web Crawler & Scrapper
Apache License 2.0
1.5k stars 146 forks

AttributeError: 'NoneType' object has no attribute 'split' #48

Closed vyokky closed 2 days ago

vyokky commented 3 weeks ago

from crawl4ai import WebCrawler

# Create an instance of WebCrawler

crawler = WebCrawler(verbose=False)


AttributeError                            Traceback (most recent call last)
Input In [2], in <cell line: 4>()
      1 from crawl4ai import WebCrawler
      3 # Create an instance of WebCrawler
----> 4 crawler = WebCrawler(verbose=False)

File ~\dev\lib\site-packages\crawl4ai\web_crawler.py:27, in WebCrawler.__init__(self, crawler_strategy, always_by_pass_cache, verbose)
     19 def __init__(
     20     self,
     21     # db_path: str = None,
   (...)
     25 ):
     26     # self.db_path = db_path
---> 27     self.crawler_strategy = crawler_strategy or LocalSeleniumCrawlerStrategy(verbose=verbose)
     28     self.always_by_pass_cache = always_by_pass_cache
     30     # Create the .crawl4ai folder in the user's home directory if it doesn't exist

File ~\dev\lib\site-packages\crawl4ai\crawler_strategy.py:139, in LocalSeleniumCrawlerStrategy.__init__(self, use_cached_html, js_code, **kwargs)
    122 self.hooks = {
    123     'on_driver_created': None,
    124     'on_user_agent_updated': None,
   (...)
    127     'before_return_html': None
    128 }
    130 # chromedriver_autoinstaller.install()
    131 # import chromedriver_autoinstaller
    132 # crawl4ai_folder = os.path.join(Path.home(), ".crawl4ai")
   (...)
    135 # chromedriver_path = chromedriver_autoinstaller.utils.download_chromedriver()
    136 # self.service = Service(chromedriver_autoinstaller.install())
--> 139 chromedriver_path = ChromeDriverManager().install()
    140 self.service = Service(chromedriver_path)
    141 self.service.log_path = "NUL"

File ~\dev\lib\site-packages\webdriver_manager\chrome.py:40, in ChromeDriverManager.install(self)
     39 def install(self) -> str:
---> 40     driver_path = self._get_driver_binary_path(self.driver)
     41     os.chmod(driver_path, 0o755)
     42     return driver_path

File ~\dev\lib\site-packages\webdriver_manager\core\manager.py:40, in DriverManager._get_driver_binary_path(self, driver)
     37     return binary_path
     39 os_type = self.get_os_type()
---> 40 file = self._download_manager.download_file(driver.get_driver_download_url(os_type))
     41 binary_path = self._cache_manager.save_file_to_cache(driver, file)
     42 return binary_path

File ~\dev\lib\site-packages\webdriver_manager\drivers\chrome.py:32, in ChromeDriver.get_driver_download_url(self, os_type)
     31 def get_driver_download_url(self, os_type):
---> 32     driver_version_to_download = self.get_driver_version_to_download()
     33     # For Mac ARM CPUs after version 106.0.5249.61 the format of OS type changed
     34     # to more unified "mac_arm64". For newer versions, it'll be "mac_arm64"
     35     # by default, for lower versions we replace "mac_arm64" to old format - "mac64_m1".
     36     if version.parse(driver_version_to_download) < version.parse("106.0.5249.61"):

File ~\dev\lib\site-packages\webdriver_manager\core\driver.py:48, in Driver.get_driver_version_to_download(self)
     45 if self._driver_version_to_download:
     46     return self._driver_version_to_download
---> 48 return self.get_latest_release_version()

File ~\dev\lib\site-packages\webdriver_manager\drivers\chrome.py:64, in ChromeDriver.get_latest_release_version(self)
     62     return determined_browser_version
     63 # Remove the build version (the last segment) from determined_browser_version for version < 113
---> 64 determined_browser_version = ".".join(determined_browser_version.split(".")[:3])
     65 latest_release_url = (
     66     self._latest_release_url
     67     if (determined_browser_version is None)
     68     else f"{self._latest_release_url}_{determined_browser_version}"
     69 )
     70 resp = self._http_client.get(url=latest_release_url)

AttributeError: 'NoneType' object has no attribute 'split'
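
Editor's note: the last frame suggests that `determined_browser_version` was None when `.split` was called on it, which would be the case if no installed browser version could be detected. The sketch below reproduces that failure mode in isolation and shows a guarded variant. `trim_build_segment` is a hypothetical helper written for illustration; it is not part of webdriver_manager, and the version string is an arbitrary example.

```python
from typing import Optional


def trim_build_segment(browser_version: Optional[str]) -> Optional[str]:
    """Keep the first three dot-separated segments of a version string.

    Hypothetical helper for illustration only; not part of webdriver_manager.
    Returns None instead of raising when no version was detected.
    """
    if browser_version is None:
        return None
    return ".".join(browser_version.split(".")[:3])


# Unguarded form, as in the last traceback frame: fails when the value is None.
detected_version = None
try:
    ".".join(detected_version.split(".")[:3])
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
print(trim_build_segment("112.0.5615.49"))
print(trim_build_segment(None))
```

A guard like this only avoids the crash; the driver still cannot be matched to a browser until the browser version can actually be detected.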

unclecode commented 3 weeks ago

@vyokky Please provide some additional information:
1/ Are you using a virtual environment, Docker, or a conda environment?
2/ On your Mac, create a new virtual environment and run the following:

pip install --upgrade webdriver-manager

3/ Then run this code snippet to check the chromedriver path:

from webdriver_manager.chrome import ChromeDriverManager
chromedriver_path = ChromeDriverManager().install()
print(chromedriver_path)

Finally, please share the result with me.
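
Editor's note: one possible cause, which this thread does not confirm, is that no Chrome or Chromium executable can be found on the machine. As a complementary check, the sketch below looks for a browser executable on PATH using only the standard library. The executable names in `CANDIDATE_BROWSERS` are assumptions and vary by platform and install method; on Windows in particular, Chrome is often not on PATH, so a "not found" result there is inconclusive.

```python
import shutil
from typing import Dict, Optional, Sequence

# Assumed executable names; adjust for your system and install method.
CANDIDATE_BROWSERS = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
    "chrome",
)


def find_browsers(names: Sequence[str] = CANDIDATE_BROWSERS) -> Dict[str, Optional[str]]:
    """Map each candidate executable name to its resolved path, or None if absent."""
    return {name: shutil.which(name) for name in names}


found = find_browsers()
for name, path in found.items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If every entry is reported as not found, installing a browser (or putting it on PATH) before retrying the snippet above is a reasonable next step.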

Chi-chicken commented 2 weeks ago

Hi, I'm using a Python virtual environment (tested on Python 3.10 and 3.9) to run crawl4ai, but I'm encountering the same issue. Even after upgrading webdriver-manager, the problem persists. Please help me resolve this. Thank you very much!

unclecode commented 1 week ago

@Chi-chicken May I know which operating system you are using? Linux?

andshen-github commented 1 week ago

I'm using Ubuntu 22.04 and got the same error.

andshen-github commented 1 week ago

@unclecode Can you help?

unclecode commented 1 week ago

@andshen-github Please check issue #54 (https://github.com/unclecode/crawl4ai/issues/54); it explains how to install it on Linux.