taspinar / twitterscraper

Scrape Twitter for Tweets
MIT License

JSONDecodeErrors #271

Open LinqLover opened 4 years ago

LinqLover commented 4 years ago

This occurs sporadically. It does not break the execution of twitterscraper, but it shows up as an error in our log.


ERROR: Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaCwLSBg42rtSEWgsC8udGv56QiEjUAFQAlAFUAFQAA&q=museumbarberini%20since%3A2019-12-01%20until%3A2020-03-05&l="
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/twitterscraper/query.py", line 99, in query_single_page
    json_resp = response.json()
  File "/usr/local/lib/python3.6/dist-packages/requests/models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
INFO: Got 358 tweets for museumbarberini%20since%3A2017-01-26%20until%3A2017-04-30.

gabri985 commented 4 years ago

What is the original request? Have you set any limit for tweets to be retrieved?

LinqLover commented 4 years ago

    ts.query_tweets("museumbarberini", begindate=dt.date(2015, 1, 1))

gabri985 commented 4 years ago

Ok, I was experiencing a similar issue. In my case, I'm requesting all tweets from the past two weeks for a set of 10 different hashtags, and after a while the server responds with 429 Too Many Requests. For those responses, the body Twitter sends is a page basically telling you "oops, we are slowing you down!". The scraper does not recognize that format, which causes the decoding error. Since the scraper does not check HTTP response statuses, I could not immediately identify the root cause of the problem.

I don't know whether this is the same issue; in any case, I had to debug the code to confirm it.
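
The failure mode described above can be sketched as a status check that runs before any JSON decoding. This is a minimal illustration, not the scraper's actual code: `parse_timeline_response`, `RateLimited`, and `FakeResponse` are hypothetical names, the `items_html` key is only sample data, and the stand-in response class exists so the sketch runs without network access.

```python
import json


class RateLimited(Exception):
    """Raised when the server answers with HTTP 429 (hypothetical helper)."""


def parse_timeline_response(response):
    """Return the decoded JSON body, or raise a descriptive error.

    `response` is any object with `status_code`, `text` and `json()`,
    for example a `requests.Response`.
    """
    if response.status_code == 429:
        # The body is an HTML "slow down" page, not JSON, so do not decode it.
        raise RateLimited("HTTP 429 Too Many Requests")
    try:
        return response.json()
    except ValueError as e:  # json.JSONDecodeError is a subclass of ValueError
        raise ValueError(
            "HTTP {} returned a non-JSON body: {!r}".format(
                response.status_code, response.text[:80])) from e


class FakeResponse:
    """Stand-in for requests.Response so the sketch runs offline."""

    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text

    def json(self):
        return json.loads(self.text)
```

With a check like this, a rate-limited request surfaces as `RateLimited` instead of the misleading "Expecting value: line 1 column 1 (char 0)".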

SpaceCadetSkywalker commented 4 years ago

Same here, the very first line of the response is ERROR: <Response [429]>

To check that, I added logger.exception("{}".format(response)) right after the line logger.exception('Failed to parse JSON "{}" while requesting "{}"'.format(e, url)) in query.py.

A dirty-dirty workaround for me was to add

    if retry == 45:
        logger.info("RETRY: {}".format(str(retry)))
        logger.info("SLEEPING")
        time.sleep(360)

to the beginning of the query_single_page function.

But this is a very-very ugly and dirty temporary not-a-fix.

Also, random.shuffle(proxies) seems to help with this error (or I am being delusional here) ;)
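
A possible alternative to one fixed six-minute sleep at a single retry count is to let the wait grow with each failed attempt, capped at the same 360 seconds. The sketch below is only an assumption about how that could look; `backoff_delay` and its parameters are hypothetical and not part of twitterscraper.

```python
import random


def backoff_delay(attempt, base=5.0, cap=360.0, jitter=0.0):
    """Return how many seconds to wait before retry number `attempt` (0-based).

    The delay doubles with every attempt and never exceeds `cap`.
    `jitter` adds up to that many random seconds so parallel workers
    do not all retry at the same moment.
    """
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0, jitter)
```

Inside a retry loop this would be used as `time.sleep(backoff_delay(attempt))`, where `attempt` counts the rate-limited responses seen so far.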

LinqLover commented 4 years ago

Wow, it seems we are missing a response.raise_for_status() right here. If I understand the code correctly (which appears to be "slightly" abusive in terms of recursion), the right way could be:

        response = requests.get(url, headers=HEADER, proxies={"http": proxy}, timeout=timeout)
        # Rate limited: try again with one fewer retry left.
        if response.status_code == 429:
            return query_single_page(query, lang, pos, retry - 1, from_user)
        response.raise_for_status()
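
The same idea can also be written as a loop instead of recursion. The following is a rough sketch under stated assumptions, not the library's implementation: `fetch_with_retry` and `StubResponse` are hypothetical names, and the request itself is passed in as a callable so the example runs without network access.

```python
import time


def fetch_with_retry(fetch, retries=3, wait=0.0, sleep=time.sleep):
    """Call `fetch()` until it returns a non-429 response or retries run out.

    Returns the response on success and None if every attempt was
    rate limited. Other HTTP errors are raised by raise_for_status().
    """
    for attempt in range(retries + 1):
        response = fetch()
        if response.status_code != 429:
            response.raise_for_status()
            return response
        if attempt < retries:
            # Rate limited: pause before the next attempt.
            sleep(wait)
    return None


class StubResponse:
    """Stand-in for requests.Response so the sketch runs offline."""

    def __init__(self, status_code):
        self.status_code = status_code

    def raise_for_status(self):
        if self.status_code >= 400:
            raise RuntimeError("HTTP {}".format(self.status_code))
```

With the real library, the callable could be something like `lambda: requests.get(url, headers=HEADER, timeout=timeout)`, reusing the names from the snippet above.
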
EthanZeigler commented 4 years ago

Are there any plans to fix this? I've been running into it an awful lot recently.

asif-faizan commented 4 years ago

Can someone share their code? I am new to Python. I have mostly done tool-based scraping, but I want Twitter data for research work.

EthanZeigler commented 4 years ago

I have a working fork. It sleeps for 5 minutes once the rate limit errors start and resumes afterwards. Worked perfectly for me.

https://github.com/EthanZeigler/twitterscraper

To install it using pip, search for something like "pip install from git". That should turn up a solution.

lapp0 commented 4 years ago

@EthanZeigler could you make a PR?

EthanZeigler commented 4 years ago

Yup. Just didn't want to change default behavior without a CLI option to change it. Got sidetracked and forgot about it.