oseymour / ScraperFC

Python package for scraping soccer data from a variety of sources
GNU General Public License v3.0

Issue with Selenium Driver? #30

Closed aegonwolf closed 9 months ago

aegonwolf commented 10 months ago

Question: Hi there,

Could there be an issue related to the Selenium driver? Calling scraper = sfc.FBRef() (or any of the scrapers) yields ValueError: There is no such driver by url https://chromedriver.storage.googleapis.com/115.0.5790/chromedriver_win32.zip

This may be solved by the approach described here: https://stackoverflow.com/questions/75281458/selenium-chromedrivermanager-doesnt-downloads-the-latest-version-of-chromedrive

Full traceback:


ValueError                                Traceback (most recent call last)
Cell In [1], line 4
      1 import ScraperFC as sfc
      2 import traceback
----> 4 scraper = sfc.FBRef() # initialize the FBRef scraper
      5 try:
      6     # scrape the table
      7     lg_table = scraper.scrape_league_table(year=2023, league='EPL')

File ~\anaconda3\envs\soccerdata\lib\site-packages\ScraperFC\FBRef.py:44, in FBRef.__init__(self)
     41         options.add_experimental_option('prefs', prefs)
     42         options.add_argument('--log-level=3')
     43         self.driver = webdriver.Chrome(
---> 44             service=ChromeService(ChromeDriverManager().install()),
     45             options=options
     46         )
     47 #         elif driver == 'firefox':
     48 #             from selenium.webdriver.chrome.service import Service as FirefoxService
     49 #             self.driver = webdriver.Firefox(service=FirefoxService(GeckoDriverManager().install()))
     51         self.stats_categories = {
     52             'standard': {'url': 'stats', 'html': 'standard',},
     53             'goalkeeping': {'url': 'keepers', 'html': 'keeper',},
   (...)
     62             'misc': {'url': 'misc', 'html': 'misc',},
     63         }

File ~\anaconda3\envs\soccerdata\lib\site-packages\webdriver_manager\chrome.py:39, in ChromeDriverManager.install(self)
     38 def install(self) -> str:
---> 39     driver_path = self._get_driver_path(self.driver)
     40     os.chmod(driver_path, 0o755)
     41     return driver_path

File ~\anaconda3\envs\soccerdata\lib\site-packages\webdriver_manager\core\manager.py:30, in DriverManager._get_driver_path(self, driver)
     27 if binary_path:
     28     return binary_path
---> 30 file = self._download_manager.download_file(driver.get_url())
     31 binary_path = self.driver_cache.save_file_to_cache(driver, file)
     32 return binary_path

File ~\anaconda3\envs\soccerdata\lib\site-packages\webdriver_manager\core\download_manager.py:28, in WDMDownloadManager.download_file(self, url)
     26 def download_file(self, url: str) -> File:
     27     log(f"About to download new driver from {url}")
---> 28     response = self._http_client.get(url)
     29     return File(response)

File ~\anaconda3\envs\soccerdata\lib\site-packages\webdriver_manager\core\http.py:33, in WDMHttpClient.get(self, url, **kwargs)
     31 def get(self, url, **kwargs) -> Response:
     32     resp = requests.get(url=url, verify=self._ssl_verify, stream=True, **kwargs)
---> 33     self.validate_response(resp)
     34     if wdm_progress_bar():
     35         show_download_progress(resp)

File ~\anaconda3\envs\soccerdata\lib\site-packages\webdriver_manager\core\http.py:16, in HttpClient.validate_response(resp)
     14 status_code = resp.status_code
     15 if status_code == 404:
---> 16     raise ValueError(f"There is no such driver by url {resp.url}")
     17 elif status_code == 401:
     18     raise ValueError(f"API Rate limit exceeded. You have to add GH_TOKEN!!!")

ValueError: There is no such driver by url https://chromedriver.storage.googleapis.com/115.0.5790/chromedriver_win32.zip

Many thanks for this great tool!

oseymour commented 10 months ago

Hey @aegonwolf! This has been a tricky one to debug. I've never gotten this error, but I know a lot of people are getting it right now. I'm running the latest versions of webdriver-manager and Selenium.

I think the problem lies with webdriver-manager and something that changed with the chromedrivers after Chrome v115. To fix it, either specify the driver version to match your Chrome version, as in some of the solutions in the link you included, or try the newest version of ScraperFC, v2.8.0. I found out that Selenium will automatically download and manage the chromedrivers, so ScraperFC no longer uses webdriver-manager.

Let me know if v2.8.0 fixes this.
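
As a rough illustration of the Chrome v115 cutoff mentioned above, here is a minimal sketch that checks whether a given Chrome version would still be expected in the legacy chromedriver.storage.googleapis.com bucket that appears in the traceback. The helper names and the constant are hypothetical, and the cutoff of 115 is an assumption based on ChromeDriver moving to the Chrome for Testing distribution at that release.

```python
import re

# Assumption: ChromeDriver builds for Chrome 115 and newer are published via
# Chrome for Testing, not the legacy bucket seen in the traceback above.
FIRST_CHROME_FOR_TESTING_MAJOR = 115


def chrome_major(version: str) -> int:
    """Return the major version from a string such as '116.0.5845.140'."""
    match = re.match(r"\s*(\d+)\.", version)
    if match is None:
        raise ValueError(f"Unrecognized Chrome version: {version!r}")
    return int(match.group(1))


def expected_in_legacy_bucket(version: str) -> bool:
    """True if a driver for this Chrome version should exist in the old bucket."""
    return chrome_major(version) < FIRST_CHROME_FOR_TESTING_MAJOR
```

For the version in the error message, expected_in_legacy_bucket("115.0.5790") is False, which is consistent with the 404 that webdriver-manager reports.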

aegonwolf commented 10 months ago

Hey, thanks a lot.

One of the first things I tried was updating to the latest version, but I still got the error.

I'll try a new clean environment and install everything from scratch

aegonwolf commented 10 months ago

Hmm, unfortunately I still get this with a brand-new environment, so that did not fix it. @oseymour

oseymour commented 10 months ago

@aegonwolf can you try running ChromeDriverManager().install() in a cell?

aegonwolf commented 10 months ago

Will try that and report back! Sorry for the late reply.

oseymour commented 10 months ago

No worries! Can you also try running this code when you get the chance?

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--incognito')
driver = webdriver.Chrome(options=options)

aegonwolf commented 10 months ago

@oseymour Sure thing. ChromeDriverManager().install() doesn't work for me: I pip installed ChromeDriverManager but I can't import it. How do I import it?

For the other code I get:


SessionNotCreatedException                Traceback (most recent call last)
Cell In [7], line 6
      4 options = Options()
      5 options.add_argument('--incognito')
----> 6 driver = webdriver.Chrome(options=options)

File ~\anaconda3\envs\soccerdata\lib\site-packages\selenium\webdriver\chrome\webdriver.py:69, in WebDriver.__init__(self, executable_path, port, options, service_args, desired_capabilities, service_log_path, chrome_options, service, keep_alive)
     66 if not service:
     67     service = Service(executable_path, port, service_args, service_log_path)
---> 69 super().__init__(DesiredCapabilities.CHROME['browserName'], "goog",
     70                  port, options,
     71                  service_args, desired_capabilities,
     72                  service_log_path, service, keep_alive)

File ~\anaconda3\envs\soccerdata\lib\site-packages\selenium\webdriver\chromium\webdriver.py:92, in ChromiumDriver.__init__(self, browser_name, vendor_prefix, port, options, service_args, desired_capabilities, service_log_path, service, keep_alive)
     89 self.service.start()
     91 try:
---> 92     super().__init__(
     93         command_executor=ChromiumRemoteConnection(
     94             remote_server_addr=self.service.service_url,
     95             browser_name=browser_name, vendor_prefix=vendor_prefix,
     96             keep_alive=keep_alive, ignore_proxy=_ignore_proxy),
     97         options=options)
     98 except Exception:
     99     self.quit()

File ~\anaconda3\envs\soccerdata\lib\site-packages\selenium\webdriver\remote\webdriver.py:272, in WebDriver.__init__(self, command_executor, desired_capabilities, browser_profile, proxy, keep_alive, file_detector, options)
    270 self._authenticator_id = None
    271 self.start_client()
--> 272 self.start_session(capabilities, browser_profile)

File ~\anaconda3\envs\soccerdata\lib\site-packages\selenium\webdriver\remote\webdriver.py:364, in WebDriver.start_session(self, capabilities, browser_profile)
    362 w3c_caps = _make_w3c_caps(capabilities)
    363 parameters = {"capabilities": w3c_caps}
--> 364 response = self.execute(Command.NEW_SESSION, parameters)
    365 if 'sessionId' not in response:
    366     response = response['value']

File ~\anaconda3\envs\soccerdata\lib\site-packages\selenium\webdriver\remote\webdriver.py:429, in WebDriver.execute(self, driver_command, params)
    427 response = self.command_executor.execute(driver_command, params)
    428 if response:
--> 429     self.error_handler.check_response(response)
    430     response['value'] = self._unwrap_value(
    431         response.get('value', None))
    432 return response

File ~\anaconda3\envs\soccerdata\lib\site-packages\selenium\webdriver\remote\errorhandler.py:243, in ErrorHandler.check_response(self, response)
    241     alert_text = value['alert'].get('text')
    242     raise exception_class(message, screen, stacktrace, alert_text)  # type: ignore[call-arg]  # mypy is not smart enough here
--> 243 raise exception_class(message, screen, stacktrace)

SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 97
Current browser version is 116.0.5845.140 with binary path C:\Program Files (x86)\Google\Chrome\Application\chrome.exe
Stacktrace:
Backtrace:
	Ordinal0 [0x010DFDC3+2555331]
	Ordinal0 [0x010777F1+2127857]
	Ordinal0 [0x00F72E08+1060360]
	Ordinal0 [0x00F919CA+1186250]
	Ordinal0 [0x00F8D825+1169445]
	Ordinal0 [0x00F8AFC1+1159105]
	Ordinal0 [0x00FBC22F+1360431]
	Ordinal0 [0x00FBBE9A+1359514]
	Ordinal0 [0x00FB7976+1341814]
	Ordinal0 [0x00F936B6+1193654]
	Ordinal0 [0x00F94546+1197382]
	GetHandleVerifier [0x01279622+1619522]
	GetHandleVerifier [0x0132882C+2336844]
	GetHandleVerifier [0x011723E1+541697]
	GetHandleVerifier [0x01171443+537699]
	Ordinal0 [0x0107D18E+2150798]
	Ordinal0 [0x01081518+2168088]
	Ordinal0 [0x01081660+2168416]
	Ordinal0 [0x0108B330+2208560]
	BaseThreadInitThunk [0x762400C9+25]
	RtlGetAppContainerNamedObjectPath [0x77887B1E+286]
	RtlGetAppContainerNamedObjectPath [0x77887AEE+238]
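
The key information in this exception is the pair of versions in its message. As a small sketch, the helper below (a hypothetical name, not part of Selenium) pulls both major versions out of a message with this wording so they can be compared directly; it assumes the message text keeps the phrasing shown above.

```python
import re


def parse_version_mismatch(message: str):
    """Return (driver_supported_major, browser_major), or None if not found.

    Assumes the wording of the SessionNotCreatedException message shown in
    the traceback above; other ChromeDriver versions may phrase it differently.
    """
    supported = re.search(r"only supports Chrome version (\d+)", message)
    current = re.search(r"Current browser version is (\d+)", message)
    if supported is None or current is None:
        return None
    return int(supported.group(1)), int(current.group(1))


message = (
    "session not created: This version of ChromeDriver only supports "
    "Chrome version 97 Current browser version is 116.0.5845.140 with "
    "binary path C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe"
)
mismatch = parse_version_mismatch(message)
```

Here the driver that was launched targets Chrome 97 while the installed browser is Chrome 116, so the two cannot start a session together.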

oseymour commented 10 months ago

To import ChromeDriverManager: from webdriver_manager.chrome import ChromeDriverManager.

Yeah, it's an issue with Selenium. Beyond updating Selenium I don't know how to fix it. I've never gotten this error. Have you tried looking at the Selenium issues or searching Stack Overflow for the error above?
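
One possible explanation, not confirmed in this thread, is that an older chromedriver binary is sitting on the PATH and being picked up instead of a freshly downloaded one. The sketch below uses only the standard library to look for such a binary and ask it for its version; the function name is hypothetical, and the default name "chromedriver" is an assumption about how the binary is called on your system.

```python
import shutil
import subprocess


def find_chromedriver_on_path(name: str = "chromedriver"):
    """Return (path, version_output) for a driver found on PATH.

    Returns (None, None) when no such executable is found, and
    (path, None) when the executable exists but could not be run.
    """
    path = shutil.which(name)
    if path is None:
        return None, None
    try:
        # `chromedriver --version` prints the driver version and exits.
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return path, None
    return path, result.stdout.strip()
```

If this reports a driver whose version is far behind the installed Chrome, removing or replacing that file is worth trying before reinstalling packages.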

aegonwolf commented 10 months ago

Yeah, though since I've reinstalled everything in a new environment, it should be the most up to date.

Thanks a lot, I'll keep debugging. I'm sure I'll find the issue. Maybe I'll try it outside of conda.

aegonwolf commented 10 months ago

ChromeDriverManager().install() gives:

ValueError                                Traceback (most recent call last)
Cell In [11], line 2
      1 # import ChromeDriverManager
----> 2 ChromeDriverManager().install()

File ~\anaconda3\envs\soccerdata\lib\site-packages\webdriver_manager\chrome.py:39, in ChromeDriverManager.install(self)
     38 def install(self) -> str:
---> 39     driver_path = self._get_driver_path(self.driver)
     40     os.chmod(driver_path, 0o755)
     41     return driver_path

File ~\anaconda3\envs\soccerdata\lib\site-packages\webdriver_manager\core\manager.py:30, in DriverManager._get_driver_path(self, driver)
     27 if binary_path:
     28     return binary_path
---> 30 file = self._download_manager.download_file(driver.get_url())
     31 binary_path = self.driver_cache.save_file_to_cache(driver, file)
     32 return binary_path

File ~\anaconda3\envs\soccerdata\lib\site-packages\webdriver_manager\core\download_manager.py:28, in WDMDownloadManager.download_file(self, url)
     26 def download_file(self, url: str) -> File:
     27     log(f"About to download new driver from {url}")
---> 28     response = self._http_client.get(url)
     29     return File(response)

File ~\anaconda3\envs\soccerdata\lib\site-packages\webdriver_manager\core\http.py:33, in WDMHttpClient.get(self, url, **kwargs)
     31 def get(self, url, **kwargs) -> Response:
     32     resp = requests.get(url=url, verify=self._ssl_verify, stream=True, **kwargs)
---> 33     self.validate_response(resp)
     34     if wdm_progress_bar():
     35         show_download_progress(resp)

File ~\anaconda3\envs\soccerdata\lib\site-packages\webdriver_manager\core\http.py:16, in HttpClient.validate_response(resp)
     14 status_code = resp.status_code
     15 if status_code == 404:
---> 16     raise ValueError(f"There is no such driver by url {resp.url}")
     17 elif status_code == 401:
     18     raise ValueError(f"API Rate limit exceeded. You have to add GH_TOKEN!!!")

ValueError: There is no such driver by url https://chromedriver.storage.googleapis.com/116.0.5845/chromedriver_win32.zip

Btw

oseymour commented 10 months ago

I do use pip install. It shouldn't make a difference with a popular package like Selenium but you never know....

aegonwolf commented 10 months ago

After trying lots of crazy things, I think the issue may be that my Chrome installation is too new. Apologies if that's a super silly question. I have now tried on both desktop and laptop (Win 10/11) and had the same warnings. Would you be able to provide a requirements.txt or env file of your setup?

oseymour commented 10 months ago

Here's the yaml export file from an env that I just made. I used a default Anaconda environment and then pip installed ipykernel (to run Jupyter notebook) and Selenium.

I was able to run the code from this cell and open a ChromeDriver window without issues.

My Chrome version is 116.0.5845.141.
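
When comparing two setups like this, it can help to print the installed package versions from inside the environment itself. This is a minimal sketch using the standard library's importlib.metadata; the function name and the default package list are illustrative assumptions.

```python
from importlib import metadata


def installed_versions(packages=("selenium", "webdriver-manager")):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for package, version in installed_versions().items():
        print(f"{package}: {version or 'not installed'}")
```

Running this in the same Jupyter kernel that raises the error confirms which Selenium version is actually being imported, which may differ from what was installed in another environment.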

aegonwolf commented 9 months ago

Thanks a lot! Hmm, I have the same chrome version. I'll let you know if I can get it to run.

aegonwolf commented 9 months ago

Thanks a lot for the help. The environment didn't do the trick, but I eventually manually replaced the driver with the one from here: https://googlechromelabs.github.io/chrome-for-testing/, using the version in the error message, and this worked :-) So happy haha. Thanks for the patience and this fantastic package.
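
For anyone repeating this manual fix, the Chrome for Testing page also publishes its data as JSON. The sketch below selects a ChromeDriver download URL for a given version prefix and platform from such a payload. The endpoint URL, the payload layout ("versions", "downloads", "chromedriver", "platform", "url"), and the platform name "win32" are assumptions based on that site's published format, and the function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed JSON endpoint published alongside the Chrome for Testing dashboard.
KNOWN_GOOD_URL = (
    "https://googlechromelabs.github.io/chrome-for-testing/"
    "known-good-versions-with-downloads.json"
)


def fetch_payload(url: str = KNOWN_GOOD_URL) -> dict:
    """Download and decode the version listing (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def pick_chromedriver_url(payload: dict, version_prefix: str, platform: str = "win32"):
    """Return the last matching ChromeDriver URL in the payload, or None."""
    chosen = None
    for entry in payload.get("versions", []):
        version = entry.get("version", "")
        if version != version_prefix and not version.startswith(version_prefix + "."):
            continue
        for item in entry.get("downloads", {}).get("chromedriver", []):
            if item.get("platform") == platform:
                chosen = item.get("url")  # keep the last match in listing order
    return chosen
```

A call such as pick_chromedriver_url(fetch_payload(), "116.0.5845") would then give a URL to download and unzip by hand, as described above.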

oseymour commented 9 months ago

Glad you found a fix! I'm still frustrated that using the newest version of Selenium doesn't work for you though. It should just handle the ChromeDriver version "out-of-the-box".