maksimhorowitz / nflscrapR

R Package for Scraping and Aggregating NFL Data

scrape_game_ids error in strsplit(headers, "\r\n") : non-character argument #165

Open CraigM917 opened 4 years ago

CraigM917 commented 4 years ago

I am very new to R and code in general so I apologise if this is trivial.

I want to scrape the final scores of all the games from 2010 to 2014 and then all the games from 2015 to 2019 as I am curious to see if the extra point rule change had an effect on key numbers.

I have entered scrape_game_ids(2019) but get knocked back with the following error, "Error in strsplit(headers, "\r\n") : non-character argument". Can you give me some advice on what I am doing wrong and how I can get round this?

Thanks.

njconn commented 4 years ago

I'm running into the same issue - were you able to solve the problem?

royemanuel commented 4 years ago

I discovered the same issue about a week ago. The package had been running well before then. I don't have exact dates, but from memory I started getting this error around May 14th. The NFL released the 2020 schedule on May 7th, so I'd bet that release was the cause.

I believe NFL.com changed the URL for the data. I dug into the scrape functions and found the create url functions. Running create_game_json_url and create_game_html_url for the first game of the 2017 season (source is the help file for create_game_json_url) builds the following URLs:

> create_game_json_url(2017090700)
[1] "http://www.nfl.com/liveupdate/game-center/2017090700/2017090700_gtd.json"
> create_game_html_url(2017090700)
[1] "http://www.nfl.com/widget/gc/2011/tabs/cat-post-playbyplay?gameId=2017090700&enableNGS=false"

Both URLs lead to a "404 - Flag on the Play" error from NFL.com. I cannot confirm whether these URLs worked before the strsplit error started appearing, because I had no reason to check them at the time.
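
The error message itself can be reproduced in base R without touching NFL.com. This is a minimal sketch, assuming the failed request leaves the scraping code with a `headers` value that is not a character vector. Here NULL stands in for that value; where `headers` is actually set inside the package is not confirmed in this thread.

```r
# Assumption: NULL stands in for whatever non-character value the failed
# request produced; the real value inside the scraper is not known here.
headers <- NULL

# strsplit() only accepts character input, so this call raises an error.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  {
    strsplit(headers, "\r\n")
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

If `msg` matches the text in the original report, that is consistent with the request failing upstream and strsplit simply being the first call to notice.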

mrcaseb commented 4 years ago

The NFL changed its server backends completely and shut down the public feeds. The new APIs require credentials, so they are no longer public.

sventura commented 4 years ago

Reminder that all of the old data lives here: https://github.com/ryurko/nflscrapR-data

andre03051 commented 4 years ago

"The NFL changed its server backends completely and shut down the public feeds. The new APIs require credentials so they are not public anymore."

That is wicked frustrating... aren't they making enough money???

jscottp99 commented 3 years ago

I have found that replacing http://www.nfl.com with http://nflcdns.nfl.com in the base URLs of the following functions gets almost everything working: create_game_html_url, proper_jsonurl_formatting, create_game_json_url, extracting_gameids, and scrape_game_ids. I was also able to update the base URLs in the buildURL and build_url sub-functions; however, there are additional base URLs in the getGSISID, get_gsis_id and get_birthdate sub-functions that must also be updated.

I'm fairly new to R and programming altogether so forgive me if my approach is incorrect or could have been accomplished in a more efficient way.

As an example, I used ls(getNamespace("nflscrapR"),all.names = TRUE) to get a list of function names. I then used trace("FUNCTION", edit = TRUE) to edit the function.

For the hidden sub-functions, I used trace("buildURL", where = season_rosters, edit=TRUE) to edit the buildURL sub-function; however, this does not work for the getGSISID, get_gsis_id and get_birthdate sub-functions.

As I understand it, edits made this way are temporary: they last only for the current session and can only be made permanent by rebuilding the package. I do not want to do that until I figure out how to edit the final three sub-functions. Without those edits, season_rosters and get_season_rosters will not work correctly.
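
The base-URL swap described above can be sketched as a plain string substitution. This is an illustration only: `swap_nfl_host` is a hypothetical helper, not part of nflscrapR, and whether http://nflcdns.nfl.com keeps serving these feeds is an assumption based on this comment.

```r
# Hypothetical helper (not part of nflscrapR): rewrite the host in a URL
# built by the package so that it points at the nflcdns host instead.
swap_nfl_host <- function(url,
                          old_host = "http://www.nfl.com",
                          new_host = "http://nflcdns.nfl.com") {
  # fixed = TRUE treats old_host as a literal string, not a regex,
  # so the dots in the host name are not interpreted as wildcards.
  sub(old_host, new_host, url, fixed = TRUE)
}

# URL taken from the create_game_json_url(2017090700) output earlier
# in this thread.
old_url <- "http://www.nfl.com/liveupdate/game-center/2017090700/2017090700_gtd.json"
new_url <- swap_nfl_host(old_url)
print(new_url)
```

Because the helper is vectorised through sub(), it could be applied to a whole vector of game URLs at once. It does not change the package's own functions, so it only helps when you build or post-process the URLs yourself.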

sventura commented 3 years ago

Thanks for this. I think these folks (https://github.com/mrcaseb/nflfastR), who are maintaining a new version of this package, have this all figured out, but let them know if that turns out not to be the case.