seemethere / nba_py

Python client for NBA statistics located at stats.nba.com
BSD 3-Clause "New" or "Revised" License
1.05k stars 255 forks

All endpoints are not available since saturday #88

Open ssaurel opened 7 years ago

ssaurel commented 7 years ago

Hello,

It seems that none of the endpoints have been available since Saturday. I now get an "Access Denied" error, whereas everything worked great until Friday, 04/07/2017.

Do other users have the same problem?

Sylvain

jorgegil96 commented 7 years ago

Noticed this on a separate app of mine that uses the same endpoints.

Testing with http://stats.nba.com/stats/playoffpicture?LeagueID=00&SeasonID=22015 it seems like the endpoint is still public but they're detecting non-browser requests from programs like nba_py. Adding a fake browser user agent to the headers is not working anymore.

Would be nice to get some help from someone with more networking experience.

ssaurel commented 7 years ago

I tested the scoreboardV2 endpoint in my browser : http://stats.nba.com/stats/scoreboardV2?DayOffset=0&LeagueID=00&gameDate=04/09/2017

I get the Access Denied message in response.

jorgegil96 commented 7 years ago

Ah I hadn't noticed, that really sucks.

ssaurel commented 7 years ago

Yes, I also used the Referer header previously to make the calls work. But it seems they have changed their API to accept only same-domain requests.

bttmly commented 7 years ago

However, you can still make these requests from their website. For instance open up http://stats.nba.com/ and then pop open the console and enter:

fetch("http://stats.nba.com/stats/scoreboardV2?DayOffset=0&LeagueID=00&gameDate=04/09/2017")
  .then(resp => resp.json())
  .then(data => console.log(data))

It works fine (assuming you have a modern browser)! So the question is, how do their API servers tell these requests apart from the ones we send with a programmatic HTTP client? I don't know much about the minutiae of HTTP. It seems like it should be possible to spoof whatever they are doing. In fact, if you use Chrome's "copy as cURL" feature, that works too:

curl 'http://stats.nba.com/stats/scoreboardV2?DayOffset=0&LeagueID=00&gameDate=04/09/2017' -H 'DNT: 1' -H 'Accept-Encoding: gzip, deflate, sdch' -H 'Accept-Language: en' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36' -H 'Accept: */*' -H 'Referer: http://stats.nba.com/' -H 'Connection: keep-alive' --compressed

More to the point, I'm curious where these failed requests are originating from. There is another issue which suggests they may be blocking requests from AWS or other cloud providers. FWIW the test suite for this repo passes for me on my local machine, as do the tests for a similar Node.js client.

jakejones commented 7 years ago

For those of you who only need it to work on a single server: my solution was to visit stats.nba.com/scores using a browser on my server (provided you have a GUI), then replicate the request headers the browser sent in the curl request that my app uses. I don't know yet if this will work long term, but it seems to be working for now.
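Replicating the browser's headers by hand is error-prone, so here is a minimal sketch of pulling them out of a Chrome "copy as cURL" command in Python. The function name `headers_from_curl` and the shortened sample command are assumptions for illustration. Only `-H`/`--header` arguments are read.

```python
import shlex


def headers_from_curl(command):
    """Extract request headers from a "copy as cURL" command string.

    Returns a dict mapping header names to values. Only -H/--header
    arguments are read; every other option is ignored.
    """
    tokens = shlex.split(command)
    headers = {}
    # Pair each token with the one after it and keep the pairs led by -H.
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ("-H", "--header"):
            name, _, content = value.partition(":")
            headers[name.strip()] = content.strip()
    return headers


# Shortened version of the command posted earlier in this thread.
curl_command = (
    "curl 'http://stats.nba.com/stats/scoreboardV2?DayOffset=0&LeagueID=00"
    "&gameDate=04/09/2017' -H 'DNT: 1' -H 'Accept-Language: en' "
    "-H 'Referer: http://stats.nba.com/' --compressed"
)
print(headers_from_curl(curl_command))
```

The resulting dict can be passed as the `headers=` argument of `requests.get`, which should reproduce what the browser sent apart from anything curl or the browser adds implicitly.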

ccagrawal commented 7 years ago

I have a similar package in R, and I ran into the same issue.

I was able to fix it by adding the following Request Header: 'Accept-Language' = 'en-US,en;q=0.8,af;q=0.6'

imjcham commented 7 years ago

Looks like stats.nba.com is restricting based on User-Agent, from what I can tell. I found that Chrome/48 and higher work, while anything below Chrome/48 gets blocked. Wondering what kind of weird firewall or rule this is...
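To check a client's User-Agent against that observation before sending anything, a small sketch like the following could help. The cutoff of 48 comes only from the report above and is not a documented rule. Both function names are made up for this example.

```python
import re

# Assumption: 48 is the cutoff reported in this thread, not a documented rule.
MIN_CHROME_MAJOR = 48


def chrome_major_version(user_agent):
    """Return the Chrome major version found in a User-Agent string, or None."""
    match = re.search(r"Chrome/(\d+)", user_agent)
    return int(match.group(1)) if match else None


def likely_accepted(user_agent):
    """Guess whether the User-Agent clears the observed Chrome/48 cutoff."""
    major = chrome_major_version(user_agent)
    return major is not None and major >= MIN_CHROME_MAJOR


ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36")
print(chrome_major_version(ua), likely_accepted(ua))
```

A `False` result only means the string falls below the reported cutoff. Other reports in this thread suggest headers such as Origin and the source IP matter as well.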

ccagrawal commented 7 years ago

@USJake Here are all my headers:

add_headers(
  'Accept-Language' = 'en-US,en;q=0.8,af;q=0.6',
  'Referer' = 'http://stats.nba.com/player/',
  'User-Agent' = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
)

ssaurel commented 7 years ago

@ccagrawal Does it work for all the endpoints for you?

ssaurel commented 7 years ago

I tried the same solution as @bttmly and @UnfixedSNIPERJ in PHP with a cURL session, but it doesn't work. This is the code I tried:

<?php

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, "http://stats.nba.com/stats/scoreboardV2?DayOffset=0&LeagueID=00&gameDate=04/09/2017");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "GET");

curl_setopt($ch, CURLOPT_ENCODING, 'gzip, deflate');

$headers = array();
$headers[] = "Dnt: 1";
$headers[] = "Accept-Encoding: gzip, deflate, sdch";
$headers[] = "Accept-Language: en";
$headers[] = "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36";
$headers[] = "Accept: */*";
$headers[] = "Referer: http://stats.nba.com/";
$headers[] = "Connection: keep-alive";

curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

$result = curl_exec($ch);

if (curl_errno($ch)) {
    echo 'Error:' . curl_error($ch);
} else {
    echo $result;
}

curl_close ($ch);

?>

ssaurel commented 7 years ago

@UnfixedSNIPERJ I have curl version 7.21.0 installed. Can you tell me your version?

jakejones commented 7 years ago

@ssaurel That code worked on my local machine (curl version: 7.51.0), but interestingly it didn't work on my server (Digital Ocean, curl version: 7.47.0).

However, adding the following header made it work on my server as well: $headers[] = "origin: http://stats.nba.com";

ssaurel commented 7 years ago

@jakejones With this last change, it works now! My PHP scripts work great now :). If any of you are interested in a PHP version, tell me and I will create a repository on GitHub. For example: http://www.ssaurel.com/baskethoops/index.php?date=04/22/2017

samody commented 7 years ago

I have an application that pulls a lot of data once a day from multiple endpoints. I started threading it a couple of weeks ago to speed the process up, which admittedly wasn't very kind to stats.nba.com. Shortly after that, it started failing. Today I was using Wireshark, and I could see that the app would get through 100 requests or so, then the endpoint would only send me an ACK and no data.

Also today, I updated my headers to include the updated User-Agent, but it made no real difference. However, I did find that slowing my request rate down to 1/second let me get a little further, though ultimately with the same result. Going to add all the headers mentioned above and try again.

Same result when running the app from a different location/WAN IP address.

@bttmly I too noticed that they're not accepting requests from AWS.

How many requests are you guys making in total, and how quickly?

EDIT: Added these headers, combined with a request rate of 1/second, and every request is now being fulfilled. YAY!

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
    'Dnt': '1',
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'en',
    'origin': 'http://stats.nba.com',
}

johngriebel commented 7 years ago

Are people still having this issue? I have an application that hits stats.nba.com with some code I've written as well as nba_py. It's been a month or so since I last worked on my application, and I was planning on picking it up again soon, but this may cause significant problems obviously.

Playing around in the shell, nba_py seems to be working fine. Can anybody else confirm? Perhaps the terminal seems fine because I am not exceeding a rate limit, as @samody suggested?

gyerli commented 7 years ago

Well, folks, I think this is the end of public programmatic use of stats.nba.com.

My code was running on AWS and it was OK until around 4/6/2017. I migrated to my local machine, and it is still not working. I tried all the header variations, VPN connections, etc. No luck... Most of the HTTP requests are blocked.

Still trying, but I have little to no hope. I will let you guys know if I figure something out.

johngriebel commented 7 years ago

@gyerli Things seem to be working fine for me this morning. Perhaps we could compare configurations or something?

gyerli commented 7 years ago

@johngriebel What is working for you? Even test_nba_py.py is stuck on HTTP request. Please try this and let me know if it works.

import nba_py

def test():
    # Request the scoreboard for 21 February 2015
    a = nba_py.Scoreboard(month=2, day=21, year=2015)
    print(a)

test()

johngriebel commented 7 years ago

@gyerli Works fine for me. nba_py test

bttmly commented 7 years ago

@johngriebel where did you run that? It seems to work for people from their local machines but not from cloud instances (particularly AWS but also a few others)

johngriebel commented 7 years ago

@bttmly Well that makes sense, this was on a server I host myself. It's strange that @gyerli can't get things working even locally.

ssaurel commented 7 years ago

It still works for me on a host other than AWS.

TrevorMcCormick commented 7 years ago

@samody I can confirm the headers you added to the request work for me. I'm using my own IPython notebook on a local machine, making one request at a time, but I will look into the rate limits.

import requests

# Gets LeBron's common player info
url = 'http://stats.nba.com/stats/commonplayerinfo/?playerid=2544'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
    'Dnt': '1',
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'en',
    'origin': 'http://stats.nba.com',
}
response = requests.get(url, headers=headers)
data = response.json()  # parse the JSON body
keys = data['resultSets'][0]['headers']  # column names of the first result set
values = data['resultSets'][0]['rowSet'][0]  # first row of that result set
dict(zip(keys, values))  # last expression, so the notebook displays it
#Returns...
{u'BIRTHDATE': u'1984-12-30T00:00:00',
 u'COUNTRY': u'USA',
 u'DISPLAY_FIRST_LAST': u'LeBron James',
 u'DISPLAY_FI_LAST': u'L. James',
 u'DISPLAY_LAST_COMMA_FIRST': u'James, LeBron',
 u'DLEAGUE_FLAG': u'N',
...

ejbrennan99 commented 7 years ago

It's also possible that the exact same setup but from two different locations (IP address) might not work. It's not impossible that a particular IP address that is making lots of requests is getting blacklisted permanently - much like they seem to have done with the AWS ip ranges.

Does anyone know if the NBA is actually trying to prevent people from using this data at all, or just trying to limit the number of requests? It seems odd that they have a public API that requires no authentication, and yet seem to work very hard to prevent people from using it, especially when the correct solution (if they were in fact trying to deny access) would be to implement some security on the endpoints.

johngriebel commented 7 years ago

@ejbrennan99 I think they are trying to limit requests at the very least. I've been testing load limits, and it seems I can get about 15 requests in almost instantaneously. After that point, queries hang indefinitely. I haven't tried throttling yet to see what the limits might be, nor have I figured out how long the lockout period is.
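Since queries reportedly hang once the burst limit is hit, one option is to give each request a timeout and retry with growing pauses instead of blocking forever. The sketch below is only an assumption about how that could look. `call_with_retries` and its parameters are hypothetical names, and the backoff values are arbitrary because the real lockout period is unknown.

```python
import time


def call_with_retries(fetch, retry_on, retries=3, backoff=5, sleep=time.sleep):
    """Call fetch() until it succeeds or `retries` attempts have failed.

    fetch    -- zero-argument callable that performs one request and
                enforces its own timeout, so a stalled query raises.
    retry_on -- exception class (or tuple of classes) that signals a stall.
    Waits backoff, 2 * backoff, ... seconds between attempts and re-raises
    the last error once the attempts are used up.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except retry_on:
            if attempt == retries:
                raise
            sleep(backoff * attempt)


# Possible use with the requests library (assumed installed); the 10-second
# timeout turns an indefinite hang into requests.exceptions.Timeout:
#
#   import requests
#   response = call_with_retries(
#       lambda: requests.get(url, headers=headers, timeout=10),
#       retry_on=requests.exceptions.Timeout,
#   )
```

The `sleep` parameter exists only so the pauses can be replaced in tests. In normal use it can be left at its default.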

ngxiaoyi commented 7 years ago

@johngriebel I see the same rate limit in my tests.

samody commented 7 years ago

Best bet to obtain reliability is to slow your request rate. If you fire off another request as soon as the previous is fulfilled, the endpoint stops responding with data and only sends you an ACK. Delaying by 1 second in between requests yields success. It does take some time to get all the data though.

Threading the requests was fun while it lasted ( hope this isn't my fault :octopus: )
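The one-second delay described above can be sketched as a small generator that pauses between items, so the wait falls between the end of one request and the start of the next. The name `paced` is made up for this example, and the one-second default is just the value reported in this thread.

```python
import time


def paced(items, delay=1.0, sleep=time.sleep):
    """Yield items one at a time, pausing `delay` seconds between them.

    The pause happens when the caller asks for the next item, i.e. after
    the caller has finished handling the previous one.
    """
    for index, item in enumerate(items):
        if index:  # no pause before the very first item
            sleep(delay)
        yield item


# Possible use with the requests library (assumed installed):
#
#   import requests
#   for url in paced(urls, delay=1.0):
#       response = requests.get(url, headers=headers, timeout=10)
```

A generator keeps the pacing separate from the request code, so the same loop body works whether the delay is one second or something longer.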

ryomayes commented 7 years ago

After trying many different R and Python NBA endpoint frameworks, this one is the only one that seems to work as of today. @ccagrawal your R package is fantastic, but it looks like the headers are outdated. I forked your repo and copy/pasted the headers from nba_py and it works on my local machine now.

Noted on the rate limit - hopefully this is as strict as the NBA gets.

huangzhenyu commented 6 years ago

It has stopped working again!