joelgrus / data-science-from-scratch

code for Data Science From Scratch book
MIT License

HTTP ERROR: 403 when using MyStreamer (page 121) #113

Open SebastianChk opened 2 years ago

SebastianChk commented 2 years ago

I am trying to run the code on page 121 (with a modification that I had to make; see my pull request for details). I have already authenticated, saved all my tokens and keys, and assigned them to CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, and ACCESS_TOKEN_SECRET.

But when I run

from collections import Counter

from twython import TwythonStreamer

# Appending data to a global variable is pretty poor form, but it makes
# the example much simpler
tweets = []

class MyStreamer(TwythonStreamer):
    def on_success(self, data):
        """
        What do we do when Twitter sends us data?
        Here data will be a Python dict representing a tweet.
        """
        # We only want to collect English-language tweets
        if data.get('lang') == 'en':
            tweets.append(data)
            print(f"received tweet #{len(tweets)}")
        # Stop when we've collected enough
        if len(tweets) >= 100:
            self.disconnect()

    def on_error(self, status_code, data, headers=None):
        # Print whatever Twitter sent back, then stop streaming
        print(status_code, data, headers)
        self.disconnect()

stream = MyStreamer(CONSUMER_KEY, CONSUMER_SECRET,
                    ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

# starts consuming public statuses that contain the keyword 'data'
stream.statuses.filter(track='data')

# if instead we wanted to start consuming a sample of *all* public statuses
# stream.statuses.sample()

top_hashtags = Counter(hashtag['text'].lower()
                        for tweet in tweets
                        for hashtag in tweet["entities"]["hashtags"])
print(top_hashtags.most_common(5))

It seems that on_error is called, so the following is the output of print(status_code, '\n\n', data, '\n\n', headers):


 b'<html>\n<head>\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>\n<title>Error 403 \nPlease use V2 filtered and sample volume stream as alternatives\n</title>\n</head>\n<body>\n<h2>HTTP ERROR: 403</h2>\n<p>Problem accessing \'/1.1/statuses/filter.json\'. Reason:\n<pre>    \nPlease use V2 filtered and sample volume stream as alternatives\n</pre>\n</body>\n</html>\n'

 {'date': 'Fri, 26 Aug 2022 09:29:27 UTC', 'server': 'tsa_f', 'set-cookie': 'guest_id=v1%3A166150616754703495; Max-Age=34214400; Expires=Tue, 26 Sep 2023 09:29:27 GMT; Path=/; Domain=.twitter.com; Secure; SameSite=None', 'content-type': 'text/html', 'cache-control': 'must-revalidate,no-cache,no-store', 'x-xss-protection': '0', 'strict-transport-security': 'max-age=631138519', 'x-response-time': '382', 'x-connection-hash': '9343d4017f3bb45ab2633cdbc970b45cf4477092b563c2ef2789af7cbe9b99d3', 'transfer-encoding': 'chunked'}
[]

Also, print(tweets) returns [], so nothing was ever appended to tweets. I am new to this, but the error body names the V2 filtered and sample streams as alternatives, which suggests that I am either calling the wrong API (/1.1/statuses/filter.json) or lack the right permissions for it. My developer account does have "elevated" access, so I expected this to work.
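For anyone debugging the same thing, here is a minimal sketch of pulling the human-readable reason out of the HTML body that on_error receives, so the message is easier to spot than in the raw bytes dump. The helper name extract_reason is hypothetical (it is not part of twython), and it assumes the body arrives as bytes with the reason inside a <pre> element, as in the output above. The sample body is a shortened copy of that output.

```python
import re


def extract_reason(body: bytes) -> str:
    """Return the text inside the first <pre>...</pre> of an HTML error body.

    Hypothetical helper, not part of twython. Falls back to the whole
    decoded body if no <pre> element is found.
    """
    text = body.decode("utf-8", errors="replace")
    match = re.search(r"<pre>(.*?)</pre>", text, flags=re.DOTALL)
    if match is None:
        return text.strip()
    # Collapse the newlines and padding inside <pre> into single spaces
    return " ".join(match.group(1).split())


# Shortened copy of the body printed by on_error in this issue
sample_body = (
    b"<html>\n<head>\n<title>Error 403 \nPlease use V2 filtered and sample "
    b"volume stream as alternatives\n</title>\n</head>\n<body>\n"
    b"<h2>HTTP ERROR: 403</h2>\n<p>Problem accessing "
    b"'/1.1/statuses/filter.json'. Reason:\n<pre>    \nPlease use V2 filtered "
    b"and sample volume stream as alternatives\n</pre>\n</body>\n</html>\n"
)

reason = extract_reason(sample_body)
print(reason)
```

Inside on_error, this could be called as print(status_code, extract_reason(data)) before self.disconnect(), assuming data is the bytes body shown above.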