maksimhorowitz / nflscrapR

R Package for Scraping and Aggregating NFL Data

More efficient roster scraping #157

Closed mrcaseb closed 4 years ago

mrcaseb commented 4 years ago

I have developed a function that scrapes roster data from http://www.nfl.com/feeds-rs and is much more efficient than the current package function (a few seconds for all teams in a season). Data is available back to the 2002 season, includes more information (see the column names below), and can be joined via gsisId. Probably the most useful column is headshot_url.

 [1] "team.season"              "team.teamId"              "team.abbr"
 [4] "team.cityState"           "team.fullName"            "team.nick"
 [7] "teamPlayers.nflId"        "teamPlayers.displayName"  "teamPlayers.firstName"
[10] "teamPlayers.lastName"     "teamPlayers.esbId"        "teamPlayers.gsisId"
[13] "teamPlayers.birthDate"    "teamPlayers.homeTown"     "teamPlayers.collegeId"
[16] "teamPlayers.collegeName"  "teamPlayers.position"     "teamPlayers.height"
[19] "teamPlayers.weight"       "teamPlayers.middleName"   "teamPlayers.suffix"
[22] "teamPlayers.headshot_url" "teamPlayers.profile_url"

You can check the data via a temporarily uploaded secret Gist. If you are interested in adding the function to the package or using the code in other ways, please contact me here or on Twitter. Otherwise, just close the issue and that's it.
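To illustrate what "joinable" could look like in practice, here is a minimal sketch in base R. It assumes a roster data frame with the teamPlayers.gsisId and teamPlayers.headshot_url columns listed above; the `plays` data frame, its `passer_player_id` column, and all values are made up for illustration and are not taken from the actual scraper.

```r
# Toy roster rows using column names from the list above (values are made up).
roster <- data.frame(
  teamPlayers.gsisId       = c("00-0000001", "00-0000002"),
  teamPlayers.displayName  = c("Player One", "Player Two"),
  teamPlayers.position     = c("QB", "WR"),
  teamPlayers.headshot_url = c("https://example.com/one.png",
                               "https://example.com/two.png"),
  stringsAsFactors = FALSE
)

# Hypothetical play-level rows; `passer_player_id` is an assumed column name.
plays <- data.frame(
  play_id          = 1:3,
  passer_player_id = c("00-0000001", "00-0000001", "00-0000003"),
  stringsAsFactors = FALSE
)

# Left join: keep every play and attach roster info where the IDs match.
joined <- merge(
  plays, roster,
  by.x  = "passer_player_id",
  by.y  = "teamPlayers.gsisId",
  all.x = TRUE
)
```

Plays whose ID has no roster match keep their row and get NA in the roster columns, which makes missing players easy to spot.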

mrcaseb commented 4 years ago

Correction. I've fixed a bug and can now scrape roster data back to 1920!

mrcaseb commented 4 years ago

Update: I've now optimized the function. At full speed (no waiting time between teams), it took 1.39 seconds to scrape all 32 rosters of the 2019 season (3,111 players; .txt attached below).


NFL_Roster_2019.txt
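For readers wondering how a per-team loop with an optional waiting time might be structured and timed, the following is only a sketch, not the function described above. `fetch_team_roster()` is a hypothetical stub that makes no network request, and `scrape_rosters()` and its `pause` argument are names assumed for illustration.

```r
# Hypothetical stand-in for the real per-team request; it makes no network
# call and simply returns two made-up rows per team.
fetch_team_roster <- function(team, season) {
  data.frame(
    team.abbr               = team,
    team.season             = season,
    teamPlayers.displayName = paste(team, "player", 1:2),
    stringsAsFactors = FALSE
  )
}

# Fetch every team in turn, optionally sleeping `pause` seconds before each
# request, then stack the per-team data frames into one.
scrape_rosters <- function(teams, season, pause = 0) {
  rosters <- lapply(teams, function(team) {
    if (pause > 0) Sys.sleep(pause)
    fetch_team_roster(team, season)
  })
  do.call(rbind, rosters)
}

teams <- c("ARI", "ATL", "BAL")

# system.time() reports the elapsed wall-clock time of the whole run.
elapsed <- system.time(
  all_rosters <- scrape_rosters(teams, 2019, pause = 0)
)["elapsed"]
```

Setting `pause` to a positive value trades speed for politeness toward the server; with `pause = 0` the loop runs as fast as the requests allow.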

ryurko commented 4 years ago

Thanks for letting us know, but please submit a pull request with the code for us to consider including it within the package.