guoguo12 / billboard-charts

Python API for downloading Billboard charts.
https://pypi.python.org/pypi/billboard.py
MIT License
389 stars 117 forks source link

Ensure Session adheres to correct transport adapter #65

Closed asharp02 closed 4 years ago

asharp02 commented 4 years ago

Issue: #64 Ensures program doesn't error out immediately in the case of a 429 response from billboard.

Edit: Prior to this PR, there was a bug within the _get_session_with_retries method. The session.mount statement did not specify a url prefix("" was just passed in) and because of this the Session object grabbed a default adapter with max_retries set to 0. This led to the Session returning a 429 instead of waiting and retrying.

Changes:

guoguo12 commented 4 years ago

Thank you!

Looking more closely at the urllib docs, I see that the Retry class documented here seems to support status code 429. And as of 5.4.0 (which I pushed on December 27), we should be using that class, since we're using requests.adapters.HTTPAdapter and that uses Retry. So I wonder if the existing code is already handling 429s correctly (or could be trivially modified to do so)?

guoguo12 commented 4 years ago

Actually, I think I'm wrong about this. If I run a simple server

import time

from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello_world():
    return "", 429, [("Retry-After", "3")]

and I query it with my _get_session_with_retries function

from billboard import _get_session_with_retries

sess = _get_session_with_retries(3)
req = sess.get("http://localhost:5000/", timeout=25)
req.raise_for_status()

then no retries are attempted, and the client raises:

$ python3 test_retries.py
Traceback (most recent call last):
  File "test_retries.py", line 5, in <module>
    req.raise_for_status()
  File "/usr/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 429 Client Error: TOO MANY REQUESTS for url: http://localhost:5000/

I'll dig into this more tomorrow.

asharp02 commented 4 years ago

Yea, I'm almost certain that the existing code is not handling 429s at all. I've been making consecutive requests and often get the error that you received.

I also wonder (and I'm pretty sure) if there is a more trivial way to do this. I have been looking into that Retry module documentation that you linked to and there does seem to be a method that does exactly what we want: https://urllib3.readthedocs.io/en/latest/reference/urllib3.util.html#urllib3.util.Retry.sleep . I'm still trying to figure out how to get it to work though.

guoguo12 commented 4 years ago

The retry works correctly when I use urllib3 directly:

import urllib3
http = urllib3.PoolManager()
response = http.request('GET', url, retries=3, timeout=25)

Maybe we should stop using the Requests library?

asharp02 commented 4 years ago

Ok so I did a bit of digging and found out what was causing this issue. The gist of it is now in the original description of this PR but a more detailed explanation is to come.

guoguo12 commented 4 years ago

The session.mount statement did not specify a url prefix("" was just passed in) and because of this the Session object grabbed a default adapter with max_retries set to 0.

Interesting. Where does this happen?

guoguo12 commented 4 years ago

Ah, I think I see. There are "https://" and "http://" adapters by default, and the adapter for a given URL is chosen by considering the mounted adapters in order of descending length.

guoguo12 commented 4 years ago

Okay, yep, this looks right to me. Thanks for figuring it out!