Messe57 closed this issue 2 weeks ago.
Could you try with caching disabled everywhere?
schedule = ws.read_schedule(force_cache=False)
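As far as I understand soccerdata's docstrings, `force_cache=False` is already the default and only changes behaviour for the current season. For a past season such as `2122`, previously cached files can still be read unless `no_cache=True` is passed to the `WhoScored` constructor. If pages were cached while the browser was showing a localized site, those stale files could keep producing the same error. The sketch below lists, and optionally deletes, every file in a cache directory. It assumes soccerdata's default location `~/soccerdata/data/WhoScored`, which may differ if you set `SOCCERDATA_DIR` or use another version. `clear_whoscored_cache` is a hypothetical helper, not part of soccerdata.

```python
from pathlib import Path


def clear_whoscored_cache(cache_dir=None, dry_run=True):
    """List (and optionally delete) every cached file under cache_dir.

    Hypothetical helper: the default path assumes soccerdata's usual
    layout (~/soccerdata/data/WhoScored) and may differ on your setup.
    """
    if cache_dir is None:
        cache_dir = Path.home() / "soccerdata" / "data" / "WhoScored"
    cache_dir = Path(cache_dir)
    if not cache_dir.exists():
        return []
    # Collect regular files only, e.g. seasons/*.html and matches/*.json.
    files = sorted(p for p in cache_dir.rglob("*") if p.is_file())
    if not dry_run:
        for path in files:
            path.unlink()
    return files
```

Run it once with the default `dry_run=True` to review what would be removed, then call `clear_whoscored_cache(dry_run=False)` to delete the files before retrying.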
I am also intrigued why you added
ws._driver.get("https://www.whoscored.com/")
ws._driver.execute_script("location = 'https://whoscored.com/'")
Do you experience any problems without doing this?
I tried what you suggested, but unfortunately it is still not working.
I added
ws._driver.get("https://www.whoscored.com/")
ws._driver.execute_script("location = 'https://whoscored.com/'")
because the driver was opening the website in my native language, which caused problems. I found this workaround in earlier issues, and it has worked perfectly until now.
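To check whether the browser actually ended up on the English version before scraping, one option is to inspect the `lang` attribute of the page's `<html>` element. This sketch assumes WhoScored sets that attribute on its localized pages, which I have not verified. `page_lang` and `looks_english` are hypothetical helpers using only the standard library; you would pass them the page source, for example `ws._driver.page_source` (the same private driver attribute used above, with Selenium's `page_source` property).

```python
from html.parser import HTMLParser


class _LangFinder(HTMLParser):
    """Record the lang attribute of the first <html> tag encountered."""

    def __init__(self):
        super().__init__()
        self.seen_html = False
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            self.lang = dict(attrs).get("lang")


def page_lang(html):
    """Return the <html lang="..."> value in lowercase, or None if absent."""
    finder = _LangFinder()
    finder.feed(html)
    finder.close()
    return finder.lang.lower() if finder.lang else None


def looks_english(html):
    """True if the primary language subtag is "en" (e.g. "en", "en-gb")."""
    lang = page_lang(html)
    return lang is not None and lang.split("-")[0] == "en"
```

If `looks_english(ws._driver.page_source)` returns `False`, that would suggest the redirect workaround did not take effect for that session.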
Hi all,
Same kind of issue here.
Code:
import soccerdata as sd
ws = sd.WhoScored(leagues = ['ITA-Serie A'], seasons = ['2122'])
ws.read_schedule()
And the traceback:
Traceback (most recent call last)
Cell In[8], line 1
----> 1 ws.read_schedule()
File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\soccerdata\whoscored.py:344, in WhoScored.read_schedule(self, force_cache)
331 def read_schedule(self, force_cache: bool = False) -> pd.DataFrame:
332 """Retrieve the game schedule for the selected leagues and seasons.
333
334 Parameters
(...)
342 pd.DataFrame
343 """
--> 344 df_season_stages = self.read_season_stages(force_cache=force_cache)
345 filemask_schedule = "matches/{}_{}_{}_{}.json"
347 all_schedules = []
File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\soccerdata\whoscored.py:274, in WhoScored.read_season_stages(self, force_cache)
261 def read_season_stages(self, force_cache: bool = False) -> pd.DataFrame:
262 """Retrieve the season stages for the selected leagues.
263
264 Parameters
(...)
272 pd.DataFrame
273 """
--> 274 df_seasons = self.read_seasons()
275 filemask = "seasons/{}_{}.html"
277 season_stages = []
File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\soccerdata\whoscored.py:225, in WhoScored.read_seasons(self)
218 def read_seasons(self) -> pd.DataFrame:
219 """Retrieve the selected seasons for the selected leagues.
220
221 Returns
222 -------
223 pd.DataFrame
224 """
--> 225 df_leagues = self.read_leagues()
227 seasons = []
228 for lkey, league in df_leagues.iterrows():
File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\soccerdata\whoscored.py:210, in WhoScored.read_leagues(self)
199 for league in region["tournaments"]:
200 leagues.append(
201 {
202 "region_id": region["id"],
(...)
206 }
207 )
209 return (
--> 210 pd.DataFrame(leagues)
211 .assign(league=lambda x: x.region + " - " + x.league)
212 .pipe(self._translate_league)
213 .set_index("league")
214 .loc[self._selected_leagues.keys()]
215 .sort_index()
216 )
File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\indexing.py:1191, in _LocationIndexer.__getitem__(self, key)
1189 maybe_callable = com.apply_if_callable(key, self.obj)
1190 maybe_callable = self._check_deprecated_callable_usage(key, maybe_callable)
-> 1191 return self._getitem_axis(maybe_callable, axis=axis)
File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\indexing.py:1420, in _LocIndexer._getitem_axis(self, key, axis)
1417 if hasattr(key, "ndim") and key.ndim > 1:
1418 raise ValueError("Cannot index with multidimensional key")
-> 1420 return self._getitem_iterable(key, axis=axis)
1422 # nested tuple slicing
1423 if is_nested_tuple(key, labels):
File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\indexing.py:1360, in _LocIndexer._getitem_iterable(self, key, axis)
1357 self._validate_key(key, axis)
1359 # A collection of keys
-> 1360 keyarr, indexer = self._get_listlike_indexer(key, axis)
1361 return self.obj._reindex_with_indexers(
1362 {axis: [keyarr, indexer]}, copy=True, allow_dups=True
1363 )
File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\indexing.py:1558, in _LocIndexer._get_listlike_indexer(self, key, axis)
1555 ax = self.obj._get_axis(axis)
1556 axis_name = self.obj._get_axis_name(axis)
-> 1558 keyarr, indexer = ax._get_indexer_strict(key, axis_name)
1560 return keyarr, indexer
File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\indexes\base.py:6200, in Index._get_indexer_strict(self, key, axis_name)
6197 else:
6198 keyarr, indexer, new_indexer = self._reindex_non_unique(keyarr)
-> 6200 self._raise_if_missing(keyarr, indexer, axis_name)
6202 keyarr = self.take(indexer)
6203 if isinstance(key, Index):
6204 # GH 42790 - Preserve name from an Index
File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\pandas\core\indexes\base.py:6249, in Index._raise_if_missing(self, key, indexer, axis_name)
6247 if nmissing:
6248 if nmissing == len(indexer):
-> 6249 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
6251 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
6252 raise KeyError(f"{not_found} not in index")
KeyError: "None of [Index(['ITA-Serie A'], dtype='object', name='league')] are in the [index]"
I finally read the other issues (apologies for not doing so earlier); my issue above is likely related to being forced to load the Italian version of the website.
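The traceback ends in `read_leagues`, where scraped region and league names are joined into a `"<region> - <league>"` label, translated, set as the index, and then selected with `.loc[...]`. If the site is served in Italian, a label such as `Italia - Serie A` (an assumed example) would presumably not match the English name the translation step expects, so none of the requested keys exist in the index and pandas raises this kind of `KeyError`. The minimal sketch below reproduces that failure mode; `LEAGUE_MAP` and `select_leagues` are hypothetical stand-ins, not soccerdata's actual configuration or its `_translate_league` code.

```python
import pandas as pd

# Hypothetical translation table from scraped "<region> - <league>" labels to
# soccerdata-style keys; soccerdata's real mapping comes from its own config.
LEAGUE_MAP = {"Italy - Serie A": "ITA-Serie A"}


def select_leagues(scraped, requested):
    """Simplified stand-in for the read_leagues pipeline in the traceback."""
    return (
        pd.DataFrame(scraped)
        # Build the "<region> - <league>" label, as in the traceback.
        .assign(league=lambda x: x.region + " - " + x.league)
        # Translate known labels; unknown (e.g. localized) ones pass through.
        .assign(league=lambda x: x.league.replace(LEAGUE_MAP))
        .set_index("league")
        # Selecting a list of labels raises KeyError when none of them exist.
        .loc[list(requested)]
        .sort_index()
    )


# Assumed example rows, not real scraped data.
english = [{"region": "Italy", "league": "Serie A"}]
italian = [{"region": "Italia", "league": "Serie A"}]
```

With `english`, `select_leagues(english, ["ITA-Serie A"])` returns one row indexed by `ITA-Serie A`. With `italian`, the label stays `Italia - Serie A` and the same call raises `KeyError: "None of [Index(['ITA-Serie A'], ...)] are in the [index]"`, matching the shape of the error above.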
Update: I think the problem might be in the read_schedule function, because when I ask for the available leagues the code runs perfectly. Furthermore, ChromeDriver opens the website without any issues. Do you suggest any additional changes?
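One way to pinpoint the cause, sketched under assumptions: compare the league keys you requested with the labels that were actually scraped and look for near misses. `diagnose_leagues` is a hypothetical helper built on the standard library's `difflib.get_close_matches`; where the scraped labels come from (for instance the index of a DataFrame built from the site's league list) is left open, and the example labels in the usage note are made up.

```python
from difflib import get_close_matches


def diagnose_leagues(requested, available, cutoff=0.6):
    """Map each requested key that is missing from `available` to near matches.

    Keys that are present are omitted, so an empty dict means nothing is missing.
    """
    available = list(available)
    report = {}
    for key in requested:
        if key not in available:
            # Up to three labels whose similarity ratio is at least `cutoff`.
            report[key] = get_close_matches(key, available, n=3, cutoff=cutoff)
    return report
```

For example, `diagnose_leagues(["ITA-Serie A"], ["Italia - Serie A"])` flags `ITA-Serie A` as missing and suggests `Italia - Serie A`, which would point to a localized label rather than a problem in `read_schedule` itself.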
Same issue for me too, but only when scraping the Italian Serie A league.
The following works fine for me:
import soccerdata as sd
ws = sd.WhoScored(leagues = ['ITA-Serie A'], seasons = ['2122'], no_cache = True)
ws.read_schedule()
I am closing this since I don't have sufficient information to debug your issue. Feel free to reopen if you can pinpoint the cause.
Since I found out about this project a few months ago, scraping data has always been very easy, so I am really thankful to whoever is currently working on it. However, I'm now stuck with a problem I cannot solve, and I hope someone can help me with it. I admit I am a beginner in coding, so it might be easy to solve, just not with my knowledge. This is my code:
This is the KeyError I am receiving:
Thank you in advance.