probberechts / soccerdata

⛏⚽ Scrape soccer data from Club Elo, ESPN, FBref, FiveThirtyEight, Football-Data.co.uk, FotMob, Sofascore, SoFIFA, Understat and WhoScored.
https://soccerdata.readthedocs.io/en/latest/

In-depth tutorial on how to add new leagues? #104

Closed cj0121 closed 11 months ago

cj0121 commented 1 year ago

Hi,

This is an amazing package! I think the docs are mostly very clear. However, would it be possible to have a more in-depth tutorial on how to add new leagues to FBref? I'm trying to add the English Championship, which is available on FBref, but wasn't able to. I added a league_dict.json file (with the correct config, I assume) to the "SOCCERDATA_DIR/config/" file path, but the code does not seem to pick it up when I call fbref = sd.FBref(leagues="EFL Championship", seasons=2019). It gave me a ValueError noting "Invalid League". Thank you so much!

philbywalsh commented 1 year ago

Hi @cj0121 - try pasting in here the entry which you made in the league_dict.json file. Perhaps there's a syntax error?

probberechts commented 1 year ago

I think it is indeed a good idea to extend the documentation for adding additional leagues. Multiple people seem to be struggling with that.

Now, to resolve your problem, you should:

  1. Make sure to reload the soccerdata module after you modify the league_dict.json file. This file is parsed during the module's import.
  2. Check whether your league_dict.json file is at the correct location. If so, you should see this appear in the log messages.
$ python
>>> import soccerdata as sd
[11/25/22 11:49:12] INFO     Custom team name replacements loaded from <path>/teamname_replacements.json.                                                                                                _config.py:83
                    INFO     Custom league dict loaded from <path>/league_dict.json.                                                                                                                    _config.py:153
  3. Check whether it is added to available leagues by running the command below.
>>> sd.FBref.available_leagues()
['Big 5 European Leagues Combined', 'ENG-Premier League', 'ESP-La Liga', 'FRA-Ligue 1', 'GER-Bundesliga', 'INT-World Cup', 'ITA-Serie A']

If that doesn't work, you probably made a mistake in the syntax of your league_dict.json file. Paste it here and we'll try to help you.
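The location and syntax checks above can also be done outside of soccerdata. The following is a minimal sketch, not the library's own code. It assumes the configuration directory is `$SOCCERDATA_DIR/config`, falling back to `~/soccerdata/config` when the environment variable is unset; that fallback is an assumption here, and the path printed in the import log is the authoritative one. `expected_league_dict_path` and `check_league_dict` are hypothetical helper names.

```python
import json
import os
from pathlib import Path


def expected_league_dict_path() -> Path:
    """Return where league_dict.json is assumed to live.

    Assumption: soccerdata reads $SOCCERDATA_DIR/config/league_dict.json and
    falls back to ~/soccerdata when SOCCERDATA_DIR is not set.
    """
    base_dir = Path(os.environ.get("SOCCERDATA_DIR", Path.home() / "soccerdata"))
    return base_dir / "config" / "league_dict.json"


def check_league_dict(path: Path) -> list[str]:
    """Return the league keys in the file, raising if it is missing or invalid."""
    if not path.is_file():
        raise FileNotFoundError(f"No league_dict.json at {path}")
    with path.open(encoding="utf-8") as fh:
        leagues = json.load(fh)  # raises json.JSONDecodeError on bad syntax
    return sorted(leagues)


if __name__ == "__main__":
    path = expected_league_dict_path()
    try:
        print(path, check_league_dict(path))
    except (FileNotFoundError, json.JSONDecodeError) as exc:
        print(f"Problem with {path}: {exc}")
```

If the printed path differs from the one in the import log, trust the log.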

cj0121 commented 1 year ago

Thanks for getting back! Below is my league_dict.json file:

{
    "ENG-Premier League": {
        "ClubElo": "ENG_1",
        "MatchHistory": "E0",
        "FiveThirtyEight": "premier-league",
        "FBref": "Premier League",
        "ESPN": "eng.1",
        "SoFIFA": "English Premier League (1)",
        "WhoScored": "England - Premier League",
        "season_start": "Aug",
        "season_end": "May",
    },
    "ESP-La Liga": {
        "ClubElo": "ESP_1",
        "MatchHistory": "SP1",
        "FiveThirtyEight": "la-liga",
        "FBref": "La Liga",
        "ESPN": "esp.1",
        "SoFIFA": "Spain Primera Division (1)",
        "WhoScored": "Spain - LaLiga",
        "season_start": "Aug",
        "season_end": "May",
    },
    "ITA-Serie A": {
        "ClubElo": "ITA_1",
        "MatchHistory": "I1",
        "FiveThirtyEight": "serie-a",
        "FBref": "Serie A",
        "ESPN": "ita.1",
        "SoFIFA": " Italian Serie A (1)",
        "WhoScored": "Italy - Serie A",
        "season_start": "Aug",
        "season_end": "May",
    },
    "GER-Bundesliga": {
        "ClubElo": "GER_1",
        "MatchHistory": "D1",
        "FiveThirtyEight": "bundesliga",
        "FBref": "Fußball-Bundesliga",
        "ESPN": "ger.1",
        "SoFIFA": "German 1. Bundesliga (1)",
        "WhoScored": "Germany - Bundesliga",
        "season_start": "Aug",
        "season_end": "May",
    },
    "FRA-Ligue 1": {
        "ClubElo": "FRA_1",
        "MatchHistory": "F1",
        "FiveThirtyEight": "ligue-1",
        "FBref": "Ligue 1",
        "ESPN": "fra.1",
        "SoFIFA": "French Ligue 1 (1)",
        "WhoScored": "France - Ligue 1",
        "season_start": "Aug",
        "season_end": "May",
    },
    "EFL Championship": {
        "FBref": "EFL Championship",
        "season_start": "Aug",
        "season_end": "May",
    },
    "NED-Eredivisie": {
        "ClubElo": "NED_1",
        "MatchHistory": "N1",
        "SoFIFA": "Holland Eredivisie (1)",
        "FBref": "Dutch Eredivisie",
        "ESPN": "ned.1",
        "FiveThirtyEight": "eredivisie",
        "WhoScored": "Netherlands - Eredivisie",
        "season_start": "Aug",
        "season_end": "May",
    },
}

As you can see, I added EFL Championship and NED-Eredivisie. The NED-Eredivisie entry is a straight copy from the docs. An additional question: for each additional league, is it required to include all five data sources as properties? If yes, the values need to match whatever ID is used on the original sites, correct?

Currently I have the JSON file here on my Mac: User/soccerdata/config/league_dict.json. I think the file location might be the problem. I wasn't able to locate the SOCCERDATA_DIR suggested in the docs.

Much appreciated!

probberechts commented 1 year ago

No, you do not have to include all five data sources, nor the "season_start" and "season_end" fields.

There is one error in your json file: you should remove the comma at the end of the second to last line to have a valid json file.
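A quick way to find this kind of error is to run the file's text through Python's standard `json` module, which follows strict JSON and rejects a comma after the last item of any object, nested ones included. The sketch below uses a shortened stand-in for the file; `find_json_error` is a hypothetical helper name, and the exact error wording varies between Python versions.

```python
import json
from typing import Optional


def find_json_error(text: str) -> Optional[str]:
    """Return a readable description of the first syntax error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


# Shortened example with a trailing comma after the last property.
broken = """{
    "NED-Eredivisie": {
        "FBref": "Eredivisie",
    }
}"""
fixed = broken.replace('"Eredivisie",', '"Eredivisie"')

print(find_json_error(broken))  # reports a syntax error near the trailing comma
print(find_json_error(fixed))   # None
```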

You can see where it looks for the json file in the log messages that are printed when importing the library.

$ python
>>> import soccerdata as sd
[11/25/22 11:49:12] INFO     Custom team name replacements loaded from <path>/teamname_replacements.json.                                                                                                _config.py:83
                    INFO     Custom league dict loaded from <path>/league_dict.json.                                                                                                                    _config.py:153

cj0121 commented 1 year ago

I finally got it to work! Thanks so much for the help! It turned out to be much easier than I thought. I just needed to make sure that league_dict.json has the correct syntax and is in the right place.

andrzej-konczyk commented 1 year ago

Hi, I have a similar case, but only with Eredivisie. I've created a league_dict.json which includes:

{
    "NED-Eredivisie": {
        "ClubElo": "NED_1",
        "MatchHistory": "N1",
        "SoFIFA": "Holland Eredivisie (1)",
        "FBref": "Dutch Eredivisie",
        "ESPN": "ned.1",
        "FiveThirtyEight": "eredivisie",
        "WhoScored": "Netherlands - Eredivisie",
        "season_start": "Aug",
        "season_end": "May"
    }
}

After running _config.py I see a comment that the league is added, but when I run sd.FBref.available_leagues() the new league is not there. I do not know why.

probberechts commented 1 year ago

@andrzej-konczyk Your json seems correct. I do not really get what you mean by "after run _config.py I see comment that league is added" though. The file _config.py is not an executable.

One hint I can think of: make sure to reload all imported soccerdata modules after modifying the league_dict.json file. The most straightforward way to do this is to restart your notebook or python interpreter.
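If a restart is inconvenient, one alternative is to drop the cached modules so that the next import executes the package again. This is only a sketch, under the assumption that nothing else in the session still holds references to the old module objects; `forget_modules` is a hypothetical helper and not part of soccerdata. Restarting remains the more reliable option.

```python
import sys


def forget_modules(package: str) -> list[str]:
    """Remove a package and its submodules from sys.modules; return removed names."""
    removed = [
        name
        for name in list(sys.modules)
        if name == package or name.startswith(package + ".")
    ]
    for name in removed:
        del sys.modules[name]
    return removed


# Usage (assumes soccerdata is installed):
#   forget_modules("soccerdata")
#   import soccerdata as sd  # imported afresh, so league_dict.json is parsed again
```

Objects created before the purge, such as an existing reader instance, still refer to the old module and should be created again.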

andrzej-konczyk commented 1 year ago

Yeah, the restart helped. Thanks!

Lushin415 commented 1 year ago

And where can I see the correct names for the leagues? For example, where on the ESPN website could you find the name of the Dutch league, as in "ESPN": "ned.1"?

lorenzodb1 commented 1 year ago

It appears that the docs are wrong regarding how to add additional leagues for FBref. For instance, it suggests adding to league_dict.json

{
  "NED-Eredivisie": {
    "FBref": "Dutch Eredivisie"
  }
}

when one should actually add

{
  "NED-Eredivisie": {
    "FBref": "Eredivisie"
  }
}

for it to actually work. Not sure if the name used in FBref changed after the example was written, but I just thought of pointing it out as it's quite confusing.
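Since several problems in this thread came down to hand-written JSON, one option is to generate the file from a Python dict so the syntax is valid by construction. This is a minimal sketch; `write_league_dict` is a hypothetical helper, the target directory is left to the caller, and the "Eredivisie" value is the one reported as working above.

```python
import json
from pathlib import Path


def write_league_dict(config_dir: Path, leagues: dict) -> Path:
    """Serialize the league mapping to config_dir/league_dict.json and return its path."""
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / "league_dict.json"
    with path.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps names such as "Fußball-Bundesliga" readable
        json.dump(leagues, fh, indent=2, ensure_ascii=False)
    return path


leagues = {"NED-Eredivisie": {"FBref": "Eredivisie"}}
# Example call; the directory is whatever your import log reports:
#   write_league_dict(Path("<path>"), leagues)
```

Note that this overwrites any existing league_dict.json in that directory, so include every custom league in the dict you pass in.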

WillT23 commented 11 months ago

Hi, firstly thanks for creating this, I've found it so useful.

I'm having some trouble trying to customise the code to include the Women's World Cup. I've followed the same process I used successfully for adding other leagues, but I'm getting the following error. I've also pasted the relevant part of my league_dict.json.

KeyError: "None of [Index(['WWC-WWC'], dtype='object', name='league')] are in the [index]"

"WWC-WWC": {
    "WhoScored": "International - FIFA Women s World Cup",
    "season_start": "Jul",
    "season_end": "Sep"
}

Thanks,

Will

probberechts commented 11 months ago

@WillT23 I believe it should be "International - FIFA Women's World Cup". You forgot the apostrophe.

WillT23 commented 11 months ago

Thanks for the reply, although unfortunately I'm still having the same issue when the apostrophe is in the right place.

probberechts commented 11 months ago

@WillT23 It seems to work fine. See #299. Make sure to reload the soccerdata module after you modify the league_dict.json file and disable caching after adding a new league.

WillT23 commented 11 months ago

Perfect, works now. Thanks so much for your help!