swar / nba_api

An API Client package to access the APIs for NBA.com
MIT License

HTTPSConnectionPool(host='stats.nba.com', port=443): Read timed out. (read timeout=30) #176

Closed: angelespesobv closed this issue 3 years ago

angelespesobv commented 4 years ago

Noticed there are a few threads on this issue, yet the solutions provided haven't worked. Maybe I'm doing something wrong? Thanks in advance for the help!

from nba_api.stats.static.teams import find_teams_by_full_name
from nba_api.stats.endpoints.teamplayerdashboard import TeamPlayerDashboard

mia_id = find_teams_by_full_name("Miami Heat")[0]['id']
mia = TeamPlayerDashboard(
    measure_type_detailed_defense="Base",
    per_mode_detailed="Totals",
    team_id=mia_id,
    season="2019-20",
).players_season_totals.get_data_frame()

ERROR:

`---------------------------------------------------------------------------
timeout                                   Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    383             # otherwise it looks like a programming error was the cause.
--> 384             six.raise_from(e, None)
    385         except (SocketTimeout, BaseSSLError, SocketError) as e:

24 frames

timeout: The read operation timed out

During handling of the above exception, another exception occurred:

ReadTimeoutError                          Traceback (most recent call last)
ReadTimeoutError: HTTPSConnectionPool(host='stats.nba.com', port=443): Read timed out. (read timeout=30)

During handling of the above exception, another exception occurred:

ReadTimeout                               Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    527                 raise SSLError(e, request=request)
    528             elif isinstance(e, ReadTimeoutError):
--> 529                 raise ReadTimeout(e, request=request)
    530             else:
    531                 raise

ReadTimeout: HTTPSConnectionPool(host='stats.nba.com', port=443): Read timed out. (read timeout=30)`

ckirch8 commented 4 years ago

I tried it and didn't get a ReadTimeout. Is Miami's dashboard the only request in your program, or are there others before it?

OmegaP1 commented 3 years ago

I think it's because you tried a lot of times and the API blocked you for some time.

virendhanwani commented 3 years ago

I am using the teamgamelog, teamdashboardbyopponent and teamdashboardbygeneralsplits endpoints. All three are working correctly in the local environment. But as soon as I deployed the app, I started receiving the ReadTimeoutError.

I have since deployed the app using only the teamgamelog endpoint, but even that one endpoint is not working. I have also increased the timeout to 45 seconds, but that did not help either.

Any suggestions?

OmegaP1 commented 3 years ago

How did you increase the timeout?

You need to make as few requests as possible. To avoid timeouts I use local variables: for example, if I want to know the points, assists, and rebounds of a team, instead of calling the API three times I save that team's JSON in a local variable and then use that variable for everything I want to know. That spared me a lot of API timeouts.
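The "fetch once, reuse the result" idea could look roughly like the sketch below. `fetch_team_stats` is a hypothetical stand-in for a real endpoint call and returns made-up numbers, and the caching wrapper is just one way to do this, not something nba_api itself provides.

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the (simulated) API is hit


def fetch_team_stats(team_id: int) -> dict:
    """Hypothetical stand-in for one network call; the numbers are made up."""
    CALLS["count"] += 1
    return {"team_id": team_id, "PTS": 112.0, "AST": 25.9, "REB": 44.4}


@lru_cache(maxsize=None)
def cached_team_stats(team_id: int) -> tuple:
    """Fetch once per team and keep the response in memory."""
    # Stored as a tuple of items so callers cannot mutate the cached value.
    return tuple(fetch_team_stats(team_id).items())


def team_stat(team_id: int, name: str) -> float:
    """Read one stat from the cached response instead of calling the API again."""
    return dict(cached_team_stats(team_id))[name]


if __name__ == "__main__":
    team_id = 1  # placeholder id, not a real NBA team id
    print(team_stat(team_id, "PTS"), team_stat(team_id, "AST"), team_stat(team_id, "REB"))
    print("API calls made:", CALLS["count"])
```

Three lookups for the same team trigger a single simulated request; only a new team id causes another call.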

virendhanwani commented 3 years ago

@OmegaP1 I copied the teamgamelog file and some other files locally and increased the timeout variable. I did this because I read on a StackOverflow comment that the response time is 40 seconds. And I am using variables as you have mentioned. I just call the API once and use the get_data_frames function. Once I have the dataframe, I use it for all the processing and other stuff.

OmegaP1 commented 3 years ago

Then I don't understand why that happened, but you could try using time.sleep(.600) between the calls you make. That worked for me as well.

virendhanwani commented 3 years ago

I am calling the API only once, and that in the very first line. How will time.sleep() help?

OmegaP1 commented 3 years ago

I mean when you call the API's functions. Imagine you ask for a team that has 17 players and then want every player's info: you call the API 17 times, so you should sleep between players.

virendhanwani commented 3 years ago

I am currently just creating a dashboard for one team, so I only use the teamgamelog endpoint to get all the details. The other endpoints I use are for comparisons between different teams. Since I ran into the error I have commented out all the other endpoints and kept only teamgamelog, but that single endpoint is not working either.

chinmaykulkarni9 commented 3 years ago

Having the same timeout problem. I'm using the BoxScoreAdvancedV2 endpoint, and it seems to be very inconsistent with respect to success. I wonder if there is a maximum number of times I can pull info from the API before it starts to time out. This is especially frustrating when iterating in a loop, as the BoxScoreAdvancedV2 endpoint only provides stats for one game per call. Currently using v1.1.8.

JaumeClave commented 3 years ago

I am experiencing the same issue. On my local environment, I have a timeout after each call, and all my endpoint calls work as expected. However, on my web app, I receive an HTTPSConnectionPool(host='stats.nba.com', port=443): Read timed out. (read timeout=30) on my first call.

Any insight as to how this might be fixed is extremely appreciated.

On Issue 55, some have suggested that the NBA might be blocking calls from cloud providers such as AWS, as well as calls from Google Colab. My web app runs on Streamlit, and the NBA seems to be blocking calls from whatever stack Streamlit uses to deploy.

rsforbes commented 3 years ago

For those having issues with calls, there are a couple of known factors to consider.

  1. The NBA blocks all cloud hosting providers. This is one reason that integration tests are not run from the CI build. Cloud provider blocks can be a factor even when using services like Google Colab.
  2. The NBA will throttle traffic after a number of API calls. It's recommended to have your thread sleep between calls. I've found a setting of .600 (600 ms) to be an optimum value.
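One way to apply the sleep from point 2 is a small throttle that waits only for whatever part of the interval has not already passed. This is a sketch under assumptions: the `Throttle` class is hypothetical (not part of nba_api), and 0.6 seconds is the empirically found value mentioned above, not a documented limit.

```python
import time


class Throttle:
    """Ensure at least `min_interval` seconds pass between consecutive calls."""

    def __init__(self, min_interval=0.6, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call; None before the first one

    def wait(self) -> float:
        """Sleep for the remainder of the interval and return the time slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay:
                self._sleep(delay)
        self._last = self._clock()
        return delay


if __name__ == "__main__":
    throttle = Throttle(0.6)
    for team in ("MIA", "BOS", "LAL"):
        throttle.wait()
        print("would request data for", team)  # replace with a real endpoint call
```

Compared with a bare `time.sleep(0.6)`, this does not add delay when the work between two calls already took longer than the interval.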

The best option is to always try it locally first to see if all is well. If it is, then it's likely a block.

While I have not tried it, using a proxy is an option. You could route your cloud deployment's calls through one to confirm that the deploy itself works and that you are in fact being blocked.

Hope that helps. It's a common issue that is raised often.

chinmaykulkarni9 commented 3 years ago

I haven't found the perfect solution to this problem, but as of 2/2, using the BoxScoreAdvancedV2 endpoint, I've been able to call the endpoint successfully in a loop if I add time.sleep(1) to the loop. Since I'm trying to pull all advanced stats for each game in a given season, this works out to about 2100 iterations, and so roughly 2100 seconds of sleeping, I believe. Time is not a huge issue since I'm just scraping the data to a CSV, and adding a slight delay between loop iterations has given me consistent results so far.
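A loop of that shape might look like the sketch below. `fetch_box_score` is a hypothetical stand-in for the per-game endpoint call and returns a made-up row; the column names are placeholders as well.

```python
import csv
import io
import time


def fetch_box_score(game_id: str) -> dict:
    """Hypothetical stand-in for one per-game endpoint call (made-up values)."""
    return {"GAME_ID": game_id, "PACE": 100.0}


def scrape_to_csv(game_ids, out, delay=1.0, sleep=time.sleep):
    """Fetch each game in turn, pausing `delay` seconds between requests."""
    writer = csv.DictWriter(out, fieldnames=["GAME_ID", "PACE"])
    writer.writeheader()
    for i, game_id in enumerate(game_ids):
        if i:  # no need to wait before the very first request
            sleep(delay)
        writer.writerow(fetch_box_score(game_id))


def added_delay_minutes(n_games: int, delay: float = 1.0) -> float:
    """Total time spent sleeping, in minutes, for a run of `n_games` requests."""
    return max(n_games - 1, 0) * delay / 60


if __name__ == "__main__":
    buffer = io.StringIO()
    scrape_to_csv(["game-1", "game-2"], buffer, delay=1.0)
    print(buffer.getvalue())
    print("minutes of sleep for 2100 games:", added_delay_minutes(2100))
```

With a one-second delay, the sleeping alone for about 2100 requests comes to a little over half an hour, which is acceptable for a one-off scrape.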

leimao commented 2 years ago

This is incredibly useful. Although I don't know the source of the claim that "The NBA blocks all cloud hosting providers," it matches what I am experiencing.

rsforbes commented 2 years ago

@leimao - The NBA does not make its firewall rules public. That being said, I have spent some time in the networking space. Here are the basics of what is likely happening. I'm going to assume no prior knowledge. I should extend this and put this out on Medium! 👍

THE CLOUD ARCHITECTURE

Cloud providers like AWS operate millions of physical servers. On top of that, those servers are virtualized, creating millions more virtual machines. Those can in turn be split into millions of containers, for example running on top of Kubernetes (K8s). If that's not your preferred route, you can simply run a serverless application and use FaaS (Functions as a Service) such as Lambda.

THE PROBLEM

Any single cloud provider has enough computing power that one person could scale an application far enough to effectively take down any site in the world. This is referred to as DDOS (Distributed Denial of Service). It doesn't even have to be intentional; someone could simply have written an infinite loop.

DEFENSE IN DEPTH

Security is managed via the concept of Defense in Depth. This means that security is provided in layers. Should any layer be compromised, there is yet another layer that must be breached. Protecting a highly available service like the NBA's website, stats, and other services follows the same practice. Multiple tools can be used, and prices range from relatively inexpensive to very expensive. Fortinet has a good article on this titled Defense in Depth.

I will cover three primary DDOS defenses that can be put into place with relative ease, though the extent of implementation determines the price.

IP ALLOW LISTS AND BLOCK LISTS

Probably the easiest implementation starts with the question, who are my customers? I don't think it will take you long to guess that you and I, running programs on the cloud that get statistical data from the NBA for free, are not their target audience. To the NBA, our programs are nothing more than bots. There is zero revenue to be gained. With that in mind, the NBA can ask themselves why they would allow any cloud provider to connect to their APIs, given the potential for a DDOS attack. There is no good reason. In short, they want human traffic with which they can build their brand, interact with fans, and generate revenue.

In this case, it is relatively easy for the NBA to block all IPs from cloud providers. The majority of cloud providers make their IP addresses publicly known. AWS makes their IP address ranges available, and companies can subscribe to them.
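To make the block-list idea concrete, here is a minimal sketch of how published ranges could be turned into a membership check. It says nothing about how the NBA actually does it. The sample document imitates the shape of AWS's published ip-ranges.json (a `prefixes` list with `ip_prefix` entries), but the prefixes themselves are documentation-only placeholder networks, and the function names are hypothetical.

```python
import ipaddress
import json

# Shaped like a cloud provider's published range file, but the prefixes are
# reserved documentation networks (placeholders), not real cloud ranges.
SAMPLE_RANGES = """
{"prefixes": [
  {"ip_prefix": "192.0.2.0/24", "service": "EC2"},
  {"ip_prefix": "198.51.100.0/24", "service": "EC2"}
]}
"""


def load_block_list(raw_json: str):
    """Parse the published ranges into network objects."""
    data = json.loads(raw_json)
    return [ipaddress.ip_network(entry["ip_prefix"]) for entry in data["prefixes"]]


def is_blocked(client_ip: str, block_list) -> bool:
    """True when the client address falls inside any listed network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in block_list)


if __name__ == "__main__":
    block_list = load_block_list(SAMPLE_RANGES)
    for ip in ("192.0.2.17", "203.0.113.5"):
        print(ip, "blocked" if is_blocked(ip, block_list) else "allowed")
```

Because the check depends only on the source address, nothing in the request itself (headers, timeouts, retries) changes the outcome, which fits the reports above of code that works locally and fails once deployed.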

RATE LIMITING

Beyond the allow and block lists, the next defensive measure is to limit how many times an individual can make requests to a given service. This is called rate limiting. Cloudflare has a good article titled What is rate limiting? Rate limiting and bots. Rate limiting is also a form of DDOS protection. Through trial and error (reverse engineering), I determined that I could call the NBA's API about once every 600 ms.

Like allow and block lists, rate limiting is typically implemented within a Firewall. When implementing a rate limit, the firewall rule can be set with filtering keys. A key can be as simple as an IP address and may contain other characteristics. A quick article on AWS WAF (Web Application Firewall) titled Rate-based rule statement will give you an idea.
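As an illustration of a rate limit keyed by IP address, here is a small sliding-window limiter. It is a sketch of the general technique only: the `RateLimiter` class is hypothetical, real firewalls implement this very differently, and the "one request per 0.6 seconds" setting in the usage example simply mirrors the value observed above rather than any known NBA rule.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self._clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key: str) -> bool:
        """Record the request and return True, or return False if over the limit."""
        now = self._clock()
        hits = self._hits[key]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # a real firewall would drop or delay the request here
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = RateLimiter(limit=1, window=0.6)
    print(limiter.allow("198.51.100.7"))  # first request from this address
    print(limiter.allow("198.51.100.7"))  # immediate second request
    time.sleep(0.6)
    print(limiter.allow("198.51.100.7"))  # after waiting out the window
```

Each key keeps its own history, so one noisy client is throttled without affecting others; a key made of more characteristics than the IP would only change what is passed as `key`.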

DETECTION AND MITIGATION

This is where DDOS protection can get pricey. Is it worth it? Yes. There is simply too much risk today not to have DDOS protection. On top of that, several companies and products are available.

One of the leading companies in this space is Radware. You can learn more on their DDoS Attack Prevention Services: Multi Layered DDoS Protection and Security Solutions page. Check out their Live Threat Map for some real cool data!

The idea here is that even if I have an IP allow list, an IP block list, and have rate limiting configured, that does not stop someone from making repeated calls over and over without end. This is where detection and mitigation come in. Should a firewall become so overwhelmed that it is no longer able to respond to legitimate traffic, a product such as Radware will step in the middle and begin absorbing, filtering, and redirecting that traffic. Note, while I say step in, Radware is always there inspecting the traffic, it's just quietly analyzing it.

IN SUMMARY

While I do not have any details regarding how the NBA has configured their networking infrastructure, there are some general industry design patterns that can be inferred from observation. Even if you were lucky enough to find someone who works for the NBA, specifically on their network, they would not tell you either, simply because of the potential for a security breach. We all know how bad that can get.

I hope you enjoyed this, found it filled with a few things you perhaps didn't know, and weren't bored so much that you fell asleep reading it. 😂

ChristopherBanas commented 2 years ago

As someone who is hoping to use the NBA API on a site that will allow users to make calls in real time, getting past the cloud block is a massive step. This explanation was extremely helpful, thank you.

jeremy-neale commented 1 year ago

Did you find a solution? Will using a custom proxy for each request (and passing that to the nba_api library call) circumvent this? I'm running an app on DigitalOcean and hitting the cloud firewall.

leimao commented 1 year ago

I tried proxy a while ago and it did not work.

ChristopherBanas commented 1 year ago

What service? I use Smartproxy's rotating residential proxies and it works perfectly.

leimao commented 1 year ago

I think I was using the free proxies found online (don't remember exactly what it was). Probably they have all been blocked.

ThiagoPanini commented 1 year ago

Hi @ChristopherBanas! Would you mind sharing how you set up requests to endpoints using the Smartproxy service?

Is there anything similar to the snippet below?

# Setting up a proxy dict
proxy_dict = {
    "http": f"http://{hostname}:{port}",
    "https": f"http://{hostname}:{port}"
}

# Making the request to an endpoint (i.e. commonallplayers)
endpoint = commonallplayers.CommonAllPlayers(proxy=proxy_dict)
data = endpoint.common_all_players.get_data_frame()

ChristopherBanas commented 1 year ago

Looks similar to mine.

What I did was store mine as an environment variable (needed this to not hard code the proxy in prod) and send it into a forked API I made. It was just a string but the approach you're going with above will probably work.

The string proxy I would send to the package looked like this

http://<USER_NAME>:<ID>@us.smartproxy.com:<NUMBER>
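Keeping the proxy out of the code and in the environment could be sketched like this. The variable names (`PROXY_HOST`, `PROXY_PORT`, `PROXY_USER`, `PROXY_PASSWORD`) and the helper are hypothetical, and the resulting string would be handed to whatever accepts a proxy URL in your setup; whether a given nba_api version takes it in this exact form is an assumption to verify.

```python
import os
from urllib.parse import quote


def proxy_url_from_env(env=os.environ):
    """Build a proxy URL from environment variables, or return None if unset."""
    host = env.get("PROXY_HOST")
    port = env.get("PROXY_PORT")
    if not host or not port:
        return None  # no proxy configured, e.g. when running locally
    user = env.get("PROXY_USER")
    password = env.get("PROXY_PASSWORD")
    auth = ""
    if user and password:
        # Percent-encode credentials so characters like '@' or ':' stay valid in a URL.
        auth = f"{quote(user, safe='')}:{quote(password, safe='')}@"
    return f"http://{auth}{host}:{port}"


if __name__ == "__main__":
    proxy = proxy_url_from_env()
    if proxy is None:
        print("no proxy configured")
    else:
        print("proxy configured for host", os.environ["PROXY_HOST"])
        # The string could now be passed along, for example as a `proxy`
        # argument, assuming the library in use accepts a URL string there.
```

Returning `None` when nothing is set lets the same code run unchanged on a local machine, where no proxy is needed, and in production, where the variables are defined.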

ChristopherBanas commented 1 year ago

Similarly to how it is used here https://github.com/swar/nba_api/blob/master/docs/nba_api/stats/examples.md#endpoint-usage-example

ThiagoPanini commented 1 year ago

Thank you very much! I will try this proxy workaround in order to deploy a Lambda function on AWS. This whole thread helped me a lot to understand the ReadTimeout error.

gabesolomon10 commented 3 days ago

Very helpful for deploying on cloud in 2024 with smart proxy