probberechts / soccerdata

⛏⚽ Scrape soccer data from Club Elo, ESPN, FBref, FiveThirtyEight, Football-Data.co.uk, FotMob, Sofascore, SoFIFA, Understat and WhoScored.
https://soccerdata.readthedocs.io/en/latest/

[WhoScored] Unable to dismiss cookies banner #65

Closed by giochi99 2 years ago

giochi99 commented 2 years ago

Which Python version are you using?

Python 3.10.5

Which version of soccerdata are you using?

1.0.2

What did you do?

import soccerdata as sd

ws = sd.WhoScored(leagues="ENG-Premier League", seasons='19-20', proxy='tor')

pl_1920_events = ws.read_events()
pl_1920_events.head()

What did you expect to see?

Downloaded event data

What did you see instead?

TimeoutException                          Traceback (most recent call last)
Input In [13], in <cell line: 1>()
----> 1 pl_1920_events = ws.read_events()
        2 pl_1920_events.head()

File ~/.local/lib/python3.10/site-packages/soccerdata/whoscored.py:552, in WhoScored.read_events(self, match_id, force_cache, live, output_fmt)
     549 urlmask = WHOSCORED_URL + "/Matches/{}/Live"
     550 filemask = "events/{}_{}/{}.json"
--> 552 df_schedule = self.read_schedule(force_cache).reset_index()
     553 if match_id is not None:
     554     iterator = df_schedule[
     555         df_schedule.game_id.isin([match_id] if isinstance(match_id, int) else match_id)
     556     ]

File ~/.local/lib/python3.10/site-packages/soccerdata/whoscored.py:287, in WhoScored.read_schedule(self, force_cache)
    285 time.sleep(random.random() * 5)
    286 self._driver.get(url)
--> 287 stages = self._parse_season_stages()
    288 if len(stages) > 0:
    289     for stage in stages:

File ~/.local/lib/python3.10/site-packages/soccerdata/whoscored.py:182, in WhoScored._parse_season_stages(self)
    178 match_selector = (
    179     "//div[contains(@id,'tournament-fixture')]//div[contains(@class,'divtable-row')]"
    180 )
    181 time.sleep(5 + random.random() * 5)
--> 182 WebDriverWait(self._driver, 30, poll_frequency=1).until(
    183     ec.presence_of_element_located((By.XPATH, match_selector))
    184 )
    185 stages = []
    186 node_stages_selector = "//select[contains(@id,'stages')]/option"

File ~/.local/lib/python3.10/site-packages/selenium/webdriver/support/wait.py:89, in WebDriverWait.until(self, method, message)
     87     if time.monotonic() > end_time:
     88         break
---> 89 raise TimeoutException(message, screen, stacktrace)

TimeoutException: Message: 
Stacktrace:
#0 0x5616b4e39b13 <unknown>
#1 0x5616b4c40688 <unknown>
#2 0x5616b4c77cc7 <unknown>
#3 0x5616b4c77e91 <unknown>
#4 0x5616b4caae34 <unknown>
#5 0x5616b4c958dd <unknown>
#6 0x5616b4ca8b94 <unknown>
#7 0x5616b4c957a3 <unknown>
#8 0x5616b4c6b0ea <unknown>
#9 0x5616b4c6c225 <unknown>
#10 0x5616b4e812dd <unknown>
#11 0x5616b4e852c7 <unknown>
#12 0x5616b4e6b22e <unknown>
#13 0x5616b4e860a8 <unknown>
#14 0x5616b4e5fbc0 <unknown>
#15 0x5616b4ea26c8 <unknown>
#16 0x5616b4ea2848 <unknown>
#17 0x5616b4ebcc0d <unknown>
#18 0x7f669b48c54d <unknown>

probberechts commented 2 years ago

This should be fixed in v1.0.3. Note that you might also have to run the scraper in non-headless mode (headless=False) to avoid bot detection:

ws = sd.WhoScored(leagues="ENG-Premier League", seasons='21-22', proxy='tor', headless=False)
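
If you would rather not keep a browser window open on every run, one option is to try headless mode first and fall back to a visible browser only when the read fails. The following is a minimal sketch of that idea, not part of soccerdata: `read_with_headless_fallback`, `make_scraper` and `read` are hypothetical names introduced here for illustration, and the commented usage assumes the `TimeoutException` shown in the traceback above is the error that signals bot detection.

```python
from typing import Callable, Optional, Tuple, Type, TypeVar

S = TypeVar("S")  # scraper type
R = TypeVar("R")  # result type


def read_with_headless_fallback(
    make_scraper: Callable[[bool], S],
    read: Callable[[S], R],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> R:
    """Try a headless scraper first; on failure, retry once non-headless.

    `make_scraper(headless)` builds a scraper, `read(scraper)` fetches the data.
    Only exceptions listed in `retry_on` trigger the fallback; anything else
    propagates immediately.
    """
    last_error: Optional[BaseException] = None
    for headless in (True, False):
        scraper = make_scraper(headless)
        try:
            return read(scraper)
        except retry_on as exc:
            # Remember the failure and move on to the next mode, if any.
            last_error = exc
    # Both modes failed: re-raise the most recent error.
    assert last_error is not None
    raise last_error


# Hypothetical usage with soccerdata (requires soccerdata and selenium):
#
# import soccerdata as sd
# from selenium.common.exceptions import TimeoutException
#
# events = read_with_headless_fallback(
#     lambda headless: sd.WhoScored(
#         leagues="ENG-Premier League", seasons="21-22",
#         proxy="tor", headless=headless,
#     ),
#     lambda ws: ws.read_events(),
#     retry_on=(TimeoutException,),
# )
```

Restricting `retry_on` to the timeout error keeps unrelated failures (bad league name, network down) from silently opening a second browser.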