Open LinqLover opened 4 years ago
What is the original request? Have you set any limit on the number of tweets to be retrieved?
ts.query_tweets("museumbarberini", begindate=dt.date(2015, 1, 1))
Ok, I was experiencing a similar issue. In my case, I'm asking for all tweets from the past two weeks for a set of 10 different hashtags, and after a while the server responds with a 429 TOO_MANY_REQUESTS. For those responses, the body Twitter sends is a page basically telling you "oops, we are slowing you down!", which has a format unknown to the scraper, and this causes the decoding error. Since the scraper does not check HTTP response statuses, I could not immediately understand the root cause of the problem.
I don't know if this is the same issue; in any case, I had to debug the code to confirm it.
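To illustrate the failure mode described above, here is a minimal, hypothetical sketch (not the scraper's actual code) of checking the HTTP status before decoding, so a 429 surfaces as a clear rate-limit error instead of a confusing JSON decode error:

```python
import json

def decode_response(status_code, body):
    """Parse a JSON response body, or raise a clear error when rate-limited.

    The body of a 429 response is an HTML "slow down" page, not JSON,
    so attempting to parse it would fail with a decode error that hides
    the real cause; checking the status first makes the failure obvious.
    """
    if status_code == 429:
        raise RuntimeError("Rate limited (HTTP 429); back off before retrying")
    return json.loads(body)
```

The same check could be applied to the scraper's response objects before the existing JSON parsing step.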
Same here; the very first line of the response is ERROR: <Response [429]>.
To check this, I added logger.exception("{}".format(response))
just after the line logger.exception('Failed to parse JSON "{}" while requesting "{}"'.format(e, url))
in query.py.
A dirty workaround for me was to add

    if retry == 45:
        logger.info("RETRY: {}".format(retry))
        logger.info("SLEEPING")
        time.sleep(360)

to the beginning of the query_single_page function.
But this is a very ugly and dirty temporary not-a-fix.
Also, random.shuffle(proxies) seems to help with this error (or I am being delusional here) ;)
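A fixed six-minute sleep like the workaround above works, but a common alternative is exponential backoff with jitter. Here is a hedged sketch (the function names and constants are illustrative, not from the scraper):

```python
import random
import time

def backoff_delay(attempt, base=60, cap=600):
    """Maximum delay for the given retry attempt: 60s, 120s, 240s, ...
    doubling each time and capped at 10 minutes."""
    return min(cap, base * (2 ** attempt))

def sleep_with_jitter(attempt):
    """Sleep a random amount up to the backoff ceiling.

    Full jitter spreads retries out so several parallel scrapers
    don't all hammer the server again at the same moment.
    """
    time.sleep(random.uniform(0, backoff_delay(attempt)))
```

The attempt counter would be incremented each time a 429 is seen and reset on a successful response.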
Wow, it seems we are missing a response.raise_for_status()
right here. If I understand the code correctly (which appears to be "slightly" abusive in terms of recursion), the right way could be:

    response = requests.get(url, headers=HEADER, proxies={"http": proxy}, timeout=timeout)
    if response.status_code == 429:
        return query_single_page(query, lang, pos, retry - 1, from_user)
    response.raise_for_status()
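The retry-and-recurse flow proposed above can be sketched in a self-contained form. This is a simplified, hypothetical model, not the scraper's real query_single_page: the fetch callable stands in for requests.get, and a retry guard is added so the recursion eventually gives up instead of looping forever once retries are exhausted:

```python
def query_single_page(fetch, retry=3):
    """Sketch of the proposed flow: retry on 429, fail on other errors.

    `fetch` is any callable returning (status_code, body); in the real
    scraper this would be the requests.get call with headers and proxies.
    """
    status, body = fetch()
    if status == 429:
        if retry <= 0:
            # Out of retries: surface the rate limit instead of recursing forever.
            raise RuntimeError("Rate limited and out of retries")
        return query_single_page(fetch, retry - 1)
    if status >= 400:
        # Equivalent of response.raise_for_status() for other HTTP errors.
        raise RuntimeError("HTTP error {}".format(status))
    return body
```

Combining this with a backoff sleep before each retry would avoid burning through the retry budget immediately.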
Are there any plans to fix this? I'm running into it an awful lot recently.
Can someone share their code? I am new to Python; I normally do tool-based scraping, but I want Twitter data for research work.
I have a working fork. It sleeps for 5 minutes once the rate limit errors start and resumes afterwards. Worked perfectly for me.
https://github.com/EthanZeigler/twitterscraper
To install it with pip, look up how to pip install from a git repository; that should do it.
@EthanZeigler could you make a PR?
Yup. I just didn't want to change the default behavior without a CLI option to control it. Got sidetracked and forgot about it.
This occurs sporadically. It does not break the execution of twitterscraper, but it appears as an error in our log.