Closed by andrewyimingchen 3 months ago
This is on the to-do list! I'll let you know when I get a fix for it. Have you been able to scrape from Capology in the past?
@andrewyimingchen can you try running this with the latest version? It's working for me.
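Before retrying, it can help to confirm which versions are actually installed in the environment the notebook is using. The following is a minimal sketch, not part of ScraperFC: `installed_version` is a hypothetical helper, and the distribution names passed to it are assumed to match the PyPI package names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names are assumptions; adjust them if your install differs.
print("ScraperFC:", installed_version("ScraperFC"))
print("selenium:", installed_version("selenium"))
```

Running this in the same kernel as the scraper avoids checking a different virtual environment by mistake.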
Hi again,
I'm able to run scrape_payrolls, but not scrape_salaries. The following is my code and error message.
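import traceback
import ScraperFC as sfc
scraper = sfc.Capology()
try:
    # Scrape the table
    lg_table_2023 = scraper.scrape_salaries(year=2023, league='EPL', currency='eur')
except:
    # Catch and print any exceptions. This allows us to still close the scraper.
    traceback.print_exc()
finally:
    # It's important to close the scraper when you're done with it. Otherwise,
    # you'll have a bunch of webdrivers open and running in the background.
    scraper.close()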
Traceback (most recent call last):
File "/var/folders/t0/_dnb2pb97vd4xlm44j304kv40000gn/T/ipykernel_98718/1318303924.py", line 5, in <module>
lg_table_2023 = scraper.scrape_salaries(year=2023, league='EPL', currency='eur')
File "/Users/andrewchen/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/ScraperFC/Capology.py", line 110, in scrape_salaries
tbody_html = self.driver.find_element(By.ID, 'table')\
File "/Users/andrewchen/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 831, in find_element
return self.execute(Command.FIND_ELEMENT, {"using": by, "value": value})["value"]
File "/Users/andrewchen/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 440, in execute
self.error_handler.check_response(response)
File "/Users/andrewchen/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 245, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
(Session info: headless chrome=112.0.5615.121)
Stacktrace:
0 chromedriver 0x0000000102621670 chromedriver + 4298352
1 chromedriver 0x0000000102619bbc chromedriver + 4266940
2 chromedriver 0x000000010224c5dc chromedriver + 280028
3 chromedriver 0x000000010223665c chromedriver + 190044
4 chromedriver 0x0000000102235374 chromedriver + 185204
5 chromedriver 0x00000001022357e0 chromedriver + 186336
6 chromedriver 0x0000000102243190 chromedriver + 242064
7 chromedriver 0x00000001022c01d4 chromedriver + 754132
8 chromedriver 0x000000010227a2d0 chromedriver + 467664
9 chromedriver 0x000000010227b354 chromedriver + 471892
10 chromedriver 0x00000001025e16c4 chromedriver + 4036292
11 chromedriver 0x00000001025e5c64 chromedriver + 4054116
12 chromedriver 0x00000001025ec2d8 chromedriver + 4080344
13 chromedriver 0x00000001025e6970 chromedriver + 4057456
14 chromedriver 0x00000001025bd8dc chromedriver + 3889372
15 chromedriver 0x000000010260525c chromedriver + 4182620
16 chromedriver 0x00000001026053b4 chromedriver + 4182964
17 chromedriver 0x00000001026140f4 chromedriver + 4243700
18 libsystem_pthread.dylib 0x00000001ac49ffa8 _pthread_start + 148
19 libsystem_pthread.dylib 0x00000001ac49ada0 thread_start + 8
InvalidSessionIdException Traceback (most recent call last)
Cell In[8], line 13
      9     traceback.print_exc()
     10 finally:
     11     # It's important to close the scraper when you're done with it. Otherwise,
     12     # you'll have a bunch of webdrivers open and running in the background.
---> 13     scraper.close()
File ~/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/ScraperFC/Capology.py:53, in Capology.close(self)
     50 def close(self):
     51     """ Closes and quits the Selenium WebDriver instance.
     52     """
---> 53     self.driver.close()
     54     self.driver.quit()
File ~/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py:551, in WebDriver.close(self)
    543 def close(self) -> None:
    544     """Closes the current window.
    545
    546     :Usage:
   (...)
    549         driver.close()
    550     """
--> 551     self.execute(Command.CLOSE)
File ~/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py:440, in WebDriver.execute(self, driver_command, params)
    438 response = self.command_executor.execute(driver_command, params)
    439 if response:
--> 440     self.error_handler.check_response(response)
    441     response["value"] = self._unwrap_value(response.get("value", None))
    442     return response
File ~/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py:245, in ErrorHandler.check_response(self, response)
    243         alert_text = value["alert"].get("text")
    244     raise exception_class(message, screen, stacktrace, alert_text)  # type: ignore[call-arg]  # mypy is not smart enough here
--> 245 raise exception_class(message, screen, stacktrace)
InvalidSessionIdException: Message: invalid session id
Stacktrace:
0 chromedriver 0x0000000102621670 chromedriver + 4298352
1 chromedriver 0x0000000102619bbc chromedriver + 4266940
2 chromedriver 0x000000010224c5dc chromedriver + 280028
3 chromedriver 0x0000000102279f3c chromedriver + 466748
4 chromedriver 0x000000010227b354 chromedriver + 471892
5 chromedriver 0x00000001025e16c4 chromedriver + 4036292
6 chromedriver 0x00000001025e5c64 chromedriver + 4054116
7 chromedriver 0x00000001025ec2d8 chromedriver + 4080344
8 chromedriver 0x00000001025e6970 chromedriver + 4057456
9 chromedriver 0x00000001025bd8dc chromedriver + 3889372
10 chromedriver 0x000000010260525c chromedriver + 4182620
11 chromedriver 0x00000001026053b4 chromedriver + 4182964
12 chromedriver 0x00000001026140f4 chromedriver + 4243700
13 libsystem_pthread.dylib 0x00000001ac49ffa8 _pthread_start + 148
14 libsystem_pthread.dylib 0x00000001ac49ada0 thread_start + 8
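Note that the second traceback is raised by `scraper.close()` itself: once the Chrome tab has crashed, the session no longer exists, so the cleanup call in `finally` fails with `InvalidSessionIdException` on top of the original error. A minimal sketch of a cleanup helper that tolerates this is shown here. It assumes only that the scraper object exposes a `close()` method, as `Capology.close` does in the traceback; `close_quietly` and `CrashedScraper` are hypothetical names for illustration and are not part of ScraperFC or Selenium. It does not address the page crash itself.

```python
import logging

logger = logging.getLogger("scraper_cleanup")


def close_quietly(scraper):
    """Call scraper.close(), logging instead of raising if cleanup fails.

    Returns True if close() succeeded and False if it raised.
    """
    try:
        scraper.close()
        return True
    except Exception as exc:  # e.g. Selenium's InvalidSessionIdException
        logger.warning("Ignoring error while closing scraper: %r", exc)
        return False


class CrashedScraper:
    """Hypothetical stand-in for a scraper whose browser session already died."""

    def close(self):
        raise RuntimeError("invalid session id")


if __name__ == "__main__":
    # The failed close is logged rather than raised.
    print(close_quietly(CrashedScraper()))
```

In the notebook cell, `close_quietly(scraper)` would replace the bare `scraper.close()` in the `finally` block, so that only the scraping error is reported.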