oseymour / ScraperFC

Python package for scraping soccer data from a variety of sources
GNU General Public License v3.0

Capology function scrape_salaries returns error #24

Closed: andrewyimingchen closed this issue 3 months ago

andrewyimingchen commented 1 year ago

Hi again,

I'm able to run scrape_payrolls, but not scrape_salaries. My code and the error messages are below.

scraper = sfc.Capology()
try:
    # Scrape the table
    lg_table_2023 = scraper.scrape_salaries(year=2023, league='EPL', currency='eur')
except:
    # Catch and print any exceptions. This allows us to still close the
    # scraper below, even if an exception occurs.
    traceback.print_exc()
finally:
    # It's important to close the scraper when you're done with it. Otherwise,
    # you'll have a bunch of webdrivers open and running in the background.
    scraper.close()

Traceback (most recent call last):
  File "/var/folders/t0/_dnb2pb97vd4xlm44j304kv40000gn/T/ipykernel_98718/1318303924.py", line 5, in
    lg_table_2023 = scraper.scrape_salaries(year=2023, league='EPL', currency='eur')
  File "/Users/andrewchen/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/ScraperFC/Capology.py", line 110, in scrape_salaries
    tbody_html = self.driver.find_element(By.ID, 'table')\
  File "/Users/andrewchen/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 831, in find_element
    return self.execute(Command.FIND_ELEMENT, {"using": by, "value": value})["value"]
  File "/Users/andrewchen/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 440, in execute
    self.error_handler.check_response(response)
  File "/Users/andrewchen/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 245, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
  (Session info: headless chrome=112.0.5615.121)
Stacktrace:
0 chromedriver 0x0000000102621670 chromedriver + 4298352
1 chromedriver 0x0000000102619bbc chromedriver + 4266940
2 chromedriver 0x000000010224c5dc chromedriver + 280028
3 chromedriver 0x000000010223665c chromedriver + 190044
4 chromedriver 0x0000000102235374 chromedriver + 185204
5 chromedriver 0x00000001022357e0 chromedriver + 186336
6 chromedriver 0x0000000102243190 chromedriver + 242064
7 chromedriver 0x00000001022c01d4 chromedriver + 754132
8 chromedriver 0x000000010227a2d0 chromedriver + 467664
9 chromedriver 0x000000010227b354 chromedriver + 471892
10 chromedriver 0x00000001025e16c4 chromedriver + 4036292
11 chromedriver 0x00000001025e5c64 chromedriver + 4054116
12 chromedriver 0x00000001025ec2d8 chromedriver + 4080344
13 chromedriver 0x00000001025e6970 chromedriver + 4057456
14 chromedriver 0x00000001025bd8dc chromedriver + 3889372
15 chromedriver 0x000000010260525c chromedriver + 4182620
16 chromedriver 0x00000001026053b4 chromedriver + 4182964
17 chromedriver 0x00000001026140f4 chromedriver + 4243700
18 libsystem_pthread.dylib 0x00000001ac49ffa8 _pthread_start + 148
19 libsystem_pthread.dylib 0x00000001ac49ada0 thread_start + 8


InvalidSessionIdException                 Traceback (most recent call last)
Cell In[8], line 13
      9     traceback.print_exc()
     10 finally:
     11     # It's important to close the scraper when you're done with it. Otherwise,
     12     # you'll have a bunch of webdrivers open and running in the background.
---> 13     scraper.close()

File ~/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/ScraperFC/Capology.py:53, in Capology.close(self)
     50 def close(self):
     51     """ Closes and quits the Selenium WebDriver instance.
     52     """
---> 53     self.driver.close()
     54     self.driver.quit()

File ~/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py:551, in WebDriver.close(self)
    543 def close(self) -> None:
    544     """Closes the current window.
    545
    546     :Usage:
   (...)
    549         driver.close()
    550     """
--> 551     self.execute(Command.CLOSE)

File ~/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py:440, in WebDriver.execute(self, driver_command, params)
    438 response = self.command_executor.execute(driver_command, params)
    439 if response:
--> 440     self.error_handler.check_response(response)
    441     response["value"] = self._unwrap_value(response.get("value", None))
    442 return response

File ~/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py:245, in ErrorHandler.check_response(self, response)
    243     alert_text = value["alert"].get("text")
    244     raise exception_class(message, screen, stacktrace, alert_text)  # type: ignore[call-arg]  # mypy is not smart enough here
--> 245 raise exception_class(message, screen, stacktrace)

InvalidSessionIdException: Message: invalid session id
Stacktrace:
0 chromedriver 0x0000000102621670 chromedriver + 4298352
1 chromedriver 0x0000000102619bbc chromedriver + 4266940
2 chromedriver 0x000000010224c5dc chromedriver + 280028
3 chromedriver 0x0000000102279f3c chromedriver + 466748
4 chromedriver 0x000000010227b354 chromedriver + 471892
5 chromedriver 0x00000001025e16c4 chromedriver + 4036292
6 chromedriver 0x00000001025e5c64 chromedriver + 4054116
7 chromedriver 0x00000001025ec2d8 chromedriver + 4080344
8 chromedriver 0x00000001025e6970 chromedriver + 4057456
9 chromedriver 0x00000001025bd8dc chromedriver + 3889372
10 chromedriver 0x000000010260525c chromedriver + 4182620
11 chromedriver 0x00000001026053b4 chromedriver + 4182964
12 chromedriver 0x00000001026140f4 chromedriver + 4243700
13 libsystem_pthread.dylib 0x00000001ac49ffa8 _pthread_start + 148
14 libsystem_pthread.dylib 0x00000001ac49ada0 thread_start + 8
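
Note on the two errors: the second traceback appears to be a follow-on of the first. The page crash deleted the WebDriver session, so the later scraper.close() call in the finally block raised InvalidSessionIdException from self.driver.close(). Below is a minimal sketch of a more defensive cleanup, assuming only that the scraper object has a close() method. The names safe_close and FakeCrashedScraper are hypothetical and are not part of ScraperFC or Selenium.

```python
import traceback


def safe_close(scraper):
    """Close a scraper, reporting (not raising) any error from close().

    Hypothetical helper, not part of ScraperFC. Returns True if close()
    succeeded and False if it raised.
    """
    try:
        scraper.close()
        return True
    except Exception:
        # After a browser crash the WebDriver session may already be gone,
        # so close() itself can raise. Print that error instead of letting
        # it bury the original one.
        traceback.print_exc()
        return False


class FakeCrashedScraper:
    """Stand-in for a scraper whose browser session has already died."""

    def close(self):
        raise RuntimeError("invalid session id")


if __name__ == "__main__":
    closed = safe_close(FakeCrashedScraper())
    print("closed cleanly:", closed)
```

In the code above, safe_close(scraper) would take the place of scraper.close() in the finally block. This only keeps cleanup from raising a second exception; it does not address the page crash itself.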

oseymour commented 1 year ago

This is on the to-do list! I'll let you know when I get a fix for it. Have you been able to scrape from Capology in the past?

oseymour commented 10 months ago

@andrewyimingchen can you try running this with the latest version? It's working for me.
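
Before retesting, it may help to confirm which version is actually installed in the environment running the notebook. A small sketch using the standard library's importlib.metadata (Python 3.8+); installed_version is a hypothetical helper name, and "ScraperFC" is assumed to be the installed distribution name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "ScraperFC" is the distribution name assumed here.
    print("ScraperFC version:", installed_version("ScraperFC"))
```

If this prints None, the package is not installed in the active environment, which can happen when the notebook kernel and the virtual environment differ.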