markowanga / stweet

Advanced Python library to scrape Twitter (tweets, users) from the unofficial API
MIT License
581 stars 67 forks

RefreshTokenException: Error during request for token #47

Closed yuenshingyan closed 3 years ago

yuenshingyan commented 3 years ago

---------------------------------------------------------------------------
RefreshTokenException                     Traceback (most recent call last)
<ipython-input-...> in <module>
      1 for user in usernames:
----> 2     mine_tweets('2016-01-01', 365*5, None, 50000000, user)

<ipython-input-...> in mine_tweets(start_date_str, no_days, keywords, tweets_limit, username)
     14         st.TweetSearchRunner(
     15             search_tweets_task=search_tweets_task,
---> 16             tweet_outputs=[tweets_collector, st.CsvTweetOutput(f'{username}_{keywords}_output_{date}.csv')]
     17         ).run()
     18 

F:\Anaconda\lib\site-packages\stweet\search_runner\search_runner.py in run(self)
     48     def run(self) -> SearchTweetsResult:
     49         """Main search_runner method."""
---> 50         self._prepare_token()
     51         while not self._is_end_of_scrapping():
     52             self._execute_next_tweets_request()

F:\Anaconda\lib\site-packages\stweet\search_runner\search_runner.py in _prepare_token(self)
     88     def _prepare_token(self):
     89         if self.search_run_context.guest_auth_token is None:
---> 90             self._refresh_token()
     91         return
     92 

F:\Anaconda\lib\site-packages\stweet\search_runner\search_runner.py in _refresh_token(self)
     83     def _refresh_token(self):
     84         token_provider = self.auth_token_provider_factory.create(self.web_client)
---> 85         self.search_run_context.guest_auth_token = token_provider.get_new_token()
     86         return
     87 

F:\Anaconda\lib\site-packages\stweet\auth\simple_auth_token_provider.py in get_new_token(self)
     38         """Method to get refreshed token. In case of error raise RefreshTokenException."""
     39         try:
---> 40             token_html = self._request_for_response_body()
     41             return json.loads(token_html)['guest_token']
     42         except JSONDecodeError:

F:\Anaconda\lib\site-packages\stweet\auth\simple_auth_token_provider.py in _request_for_response_body(self)
     33             return token_response.text
     34         else:
---> 35             raise RefreshTokenException('Error during request for token')
     36 
     37     def get_new_token(self) -> str:

RefreshTokenException: Error during request for token
markowanga commented 3 years ago

Hi, is it repeatable? Could you send me the code? Sometimes I have this problem on GitHub Actions; then I just repeat the task. Where do you run the script? On a local machine, or on Azure or AWS?

yuenshingyan commented 3 years ago

Hi, is it repeatable? Could you send me the code? Sometimes I have this problem on GitHub Actions; then I just repeat the task. Where do you run the script? On a local machine, or on Azure or AWS?

I run it in my Jupyter notebook. I was actually trying to run two notebooks simultaneously to speed things up.

def mine_tweets(start_date_str, no_days, keywords, tweets_limit, username):
    dates = pd.date_range(start_date_str, periods=no_days, freq='D')
    for date in dates.strftime('%Y-%m-%d'):
        search_tweets_task = st.SearchTweetsTask(
            from_username=username,
            any_word=keywords,
            tweets_limit=tweets_limit,
            since=arrow.get(f'{date}T00:00:00.000+01:00'),
            until=arrow.get(f'{date}T24:00:00.000+01:00'),
            language=st.Language.ENGLISH)

        tweets_collector = st.CollectorTweetOutput()
        PrintTweetOutput = st.PrintTweetOutput()
        st.TweetSearchRunner(
            search_tweets_task=search_tweets_task,
            tweet_outputs=[tweets_collector, st.CsvTweetOutput(f'{username}_{keywords}_output_{date}.csv')]
        ).run()

        tweets = tweets_collector.get_scrapped_tweets()

    df_list = []
    for date in dates.strftime('%Y-%m-%d'):
        df_list.append(pd.read_csv(f'{username}_{keywords}_output_{date}.csv'))

    mined = pd.concat(df_list)
    mined.to_csv(r'C:\Users\Hindy\Desktop\Jupyter\Tweets\{}_{}_output_{}.csv'.format(username, keywords, start_date_str))

for user in usernames:
    mine_tweets('2016-01-01', 365*5, None, 50000000, user)

markowanga commented 3 years ago

Please reformat the code, I can't analyze it.

yuenshingyan commented 3 years ago
import arrow
import pandas as pd

import stweet as st

usernames = ['cnnbrk', 'CNN', 'CNNnewsroom', 'cnni', 'cnnphilippines', 'CNNAfrica', 'BBCWorld', 'cnnasiapr', 'cnnphlife',
             'CNNnews18','FoxNews', 'FOXTV', 'FoxLifeIndia', 'ABC', 'SkyNews', 'SCMPNews', 'ABCPolitics', 'CBSNews', 
             'CBCNews', 'ABSCBNNews', 'itvnews', 'NYDailyNews', 'gmanews', 'SkyNewsBreak', 'NBCNews', 'ANI', 'OANN',
             'MTVNEWS', '10NewsFirst', '7NewsMelbourne', 'dallasnews', 'YahooNews', 'abcnews', 'VICENews', 'YonhapNews',
             'DDNewslive', 'ABCWorldNews', '9NewsAUS', 'elonmusk', 'TheEconomist', 
             ]

def mine_tweets(start_date_str, no_days, keywords, tweets_limit, username):
    dates = pd.date_range(start_date_str, periods=no_days, freq='D')
    for date in dates.strftime('%Y-%m-%d'):
        search_tweets_task = st.SearchTweetsTask(
            from_username=username,
            any_word=keywords, 
            tweets_limit=tweets_limit,
            since=arrow.get(f'{date}T00:00:00.000+01:00'),
            until=arrow.get(f'{date}T24:00:00.000+01:00'),
            language=st.Language.ENGLISH)

        tweets_collector = st.CollectorTweetOutput()
        PrintTweetOutput = st.PrintTweetOutput()
        st.TweetSearchRunner(
            search_tweets_task=search_tweets_task,
            tweet_outputs=[tweets_collector, st.CsvTweetOutput(f'{username}_{keywords}_output_{date}.csv')]
        ).run()

        tweets = tweets_collector.get_scrapped_tweets()

    df_list = []
    for date in dates.strftime('%Y-%m-%d'):
        df_list.append(pd.read_csv(f'{username}_{keywords}_output_{date}.csv'))

    mined = pd.concat(df_list)
    mined.to_csv(r'C:\Users\Hindy\Desktop\Jupyter\Tweets\{}_{}_output_{}.csv'.format(username, keywords, start_date_str))
yuenshingyan commented 3 years ago
for user in usernames:
    mine_tweets('2016-01-01', 365*5, None, 50000000, user)
yuenshingyan commented 3 years ago
---------------------------------------------------------------------------
RefreshTokenException                     Traceback (most recent call last)
<ipython-input-11-a36795e61fa1> in <module>
      1 for user in usernames:
----> 2     mine_tweets('2016-01-01', 365*5, 'oil', 50000000, user)

<ipython-input-9-0da9593c083e> in mine_tweets(start_date_str, no_days, keywords, tweets_limit, username)
     15         st.TweetSearchRunner(
     16             search_tweets_task=search_tweets_task,
---> 17             tweet_outputs=[tweets_collector, st.CsvTweetOutput(f'{username}_{keywords}_output_{date}.csv')]
     18         ).run()
     19 

F:\Anaconda\lib\site-packages\stweet\search_runner\search_runner.py in run(self)
     48     def run(self) -> SearchTweetsResult:
     49         """Main search_runner method."""
---> 50         self._prepare_token()
     51         while not self._is_end_of_scrapping():
     52             self._execute_next_tweets_request()

F:\Anaconda\lib\site-packages\stweet\search_runner\search_runner.py in _prepare_token(self)
     88     def _prepare_token(self):
     89         if self.search_run_context.guest_auth_token is None:
---> 90             self._refresh_token()
     91         return
     92 

F:\Anaconda\lib\site-packages\stweet\search_runner\search_runner.py in _refresh_token(self)
     83     def _refresh_token(self):
     84         token_provider = self.auth_token_provider_factory.create(self.web_client)
---> 85         self.search_run_context.guest_auth_token = token_provider.get_new_token()
     86         return
     87 

F:\Anaconda\lib\site-packages\stweet\auth\simple_auth_token_provider.py in get_new_token(self)
     38         """Method to get refreshed token. In case of error raise RefreshTokenException."""
     39         try:
---> 40             token_html = self._request_for_response_body()
     41             return json.loads(token_html)['guest_token']
     42         except JSONDecodeError:

F:\Anaconda\lib\site-packages\stweet\auth\simple_auth_token_provider.py in _request_for_response_body(self)
     33             return token_response.text
     34         else:
---> 35             raise RefreshTokenException('Error during request for token')
     36 
     37     def get_new_token(self) -> str:

RefreshTokenException: Error during request for token
markowanga commented 3 years ago

Your code works correctly in my environment. Maybe Twitter blocks your requests (just like on GitHub Actions). Please check these users' tweets in your browser. I will prepare curl requests now to check the exact responses to my requests.

yuenshingyan commented 3 years ago

Your code works correctly in my environment. Maybe Twitter blocks your requests (just like on GitHub Actions). Please check these users' tweets in your browser. I will prepare curl requests now to check the exact responses to my requests.

The tweets load fine in my browser.

Actually, stweet worked again 10 minutes after this post, but then it stopped again with the same error.

markowanga commented 3 years ago

When you hit this error again, please send me the response of this shell command:

curl --location --request POST 'https://api.twitter.com/1.1/guest/activate.json' \
--header 'Authorization: Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA'

I think something is wrong in your environment. Maybe other requests will still go through – if you do get this error, you can help me fix the bug 😊 (if it is possible, of course).

yuenshingyan commented 3 years ago

When you hit this error again, please send me the response of this shell command:

curl --location --request POST 'https://api.twitter.com/1.1/guest/activate.json' \
--header 'Authorization: Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA'

I think something is wrong in your environment. Maybe other requests will still go through – if you do get this error, you can help me fix the bug 😊 (if it is possible, of course).

The same error just happened.

Can you please explain a little more? I don't really have much experience with programming.

What is curl?

markowanga commented 3 years ago

curl is a simple HTTP client with a command-line interface. You can install curl on Windows. Alternatively, you can run the request in another HTTP client such as Postman.
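
If it is easier to stay inside the notebook than to install curl, roughly the same request can be sent with the requests library (a sketch equivalent of the curl command above, using the same URL and bearer token):

# Sketch: the same guest-token request as the curl command above, sent with
# the requests library instead of curl.
import requests

BEARER = 'AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA'

response = requests.post(
    'https://api.twitter.com/1.1/guest/activate.json',
    headers={'Authorization': f'Bearer {BEARER}'},
)
print(response.status_code)  # 200 means Twitter issued a guest token
print(response.text)         # e.g. {"guest_token": "..."} or an error body

A 200 response with a guest_token body means the endpoint works from this machine; any other status is what stweet turns into the RefreshTokenException shown in the traceback above.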

yuenshingyan commented 3 years ago

curl is a simple HTTP client with a command-line interface. You can install curl on Windows. Alternatively, you can run the request in another HTTP client such as Postman.

I just ran the shell command and it gave me a guest_token. What should I do next?

markowanga commented 3 years ago

OK, so the token is returned even right after the library crashes (with RefreshTokenException)? If that is true, it means I need to prepare the library with a WebClient wrapped in an interceptor – then I can inspect these error occurrences. See task #48 – I will prepare this as soon as possible (probably today).

markowanga commented 3 years ago

I have finished the PR; the new version is being released.

markowanga commented 3 years ago

OK, the new version, v1.3.0, is released. Please update your current version.

import arrow
import pandas as pd

import stweet as st
from stweet.http_request import RequestDetails
from stweet.http_request.interceptor.logging_requests_web_client_interceptor import \
    LoggingRequestsWebClientInterceptor

class AuthLoggingInterceptor(LoggingRequestsWebClientInterceptor):
    def logs_to_show(self, params: RequestDetails) -> bool:
        return params.url == 'https://api.twitter.com/1.1/guest/activate.json'

usernames = [
    'cnnbrk', 'CNN', 'CNNnewsroom', 'cnni', 'cnnphilippines', 'CNNAfrica', 'BBCWorld', 'cnnasiapr', 'cnnphlife',
    'CNNnews18', 'FoxNews', 'FOXTV', 'FoxLifeIndia', 'ABC', 'SkyNews', 'SCMPNews', 'ABCPolitics', 'CBSNews',
    'CBCNews', 'ABSCBNNews', 'itvnews', 'NYDailyNews', 'gmanews', 'SkyNewsBreak', 'NBCNews', 'ANI', 'OANN',
    'MTVNEWS', '10NewsFirst', '7NewsMelbourne', 'dallasnews', 'YahooNews', 'abcnews', 'VICENews', 'YonhapNews',
    'DDNewslive', 'ABCWorldNews', '9NewsAUS', 'elonmusk', 'TheEconomist',
]

def mine_tweets(start_date_str, no_days, keywords, tweets_limit, username):
    dates = pd.date_range(start_date_str, periods=no_days, freq='D')
    auth_logging_interceptor = AuthLoggingInterceptor()
    for date in dates.strftime('%Y-%m-%d'):
        search_tweets_task = st.SearchTweetsTask(
            from_username=username,
            any_word=keywords,
            tweets_limit=tweets_limit,
            since=arrow.get(f'{date}T00:00:00.000+01:00'),
            until=arrow.get(f'{date}T24:00:00.000+01:00'),
            language=st.Language.ENGLISH)

        tweets_collector = st.CollectorTweetOutput()
        st.TweetSearchRunner(
            search_tweets_task=search_tweets_task,
            tweet_outputs=[tweets_collector, st.CsvTweetOutput(f'{username}_{keywords}_output_{date}.csv')],
            web_client=st.RequestsWebClient(interceptors=[auth_logging_interceptor])
        ).run()

    df_list = []
    for date in dates.strftime('%Y-%m-%d'):
        df_list.append(pd.read_csv(f'{username}_{keywords}_output_{date}.csv'))

    mined = pd.concat(df_list)
    mined.to_csv(
        r'C:\Users\Hindy\Desktop\Jupyter\Tweets\{}_{}_output_{}.csv'.format(username, keywords, start_date_str))

for user in usernames:
    mine_tweets('2016-01-01', 365 * 5, None, 50000000, user)

Please run this code and wait for the error. Then post the resulting logs here.

markowanga commented 3 years ago

I have reproduced this error on GitHub Actions – in my case I get a 429 response code. It means there were too many requests. I will try to find a solution to this problem – asap, of course.

Please confirm you have a similar problem.

markowanga commented 3 years ago

I have prepared the complete change; it was difficult to do it well with high quality. The library architecture changed a little, so I need to update the documentation and finish a few other small tasks, and then the new version will be released 😉

markowanga commented 3 years ago

I also found the reason for your bug – every new run calls the token API for a new guest token, and there are limits on this call, so fewer requests have to be made – the modification needs to reuse the same WebClient. I will describe it in the docs update.

yuenshingyan commented 3 years ago

I also found the reason for your bug – every new run calls the token API for a new guest token, and there are limits on this call, so fewer requests have to be made – the modification needs to reuse the same WebClient. I will describe it in the docs update.

Maybe waiting a few seconds between loops could solve this?
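
For reference, a minimal sketch of that idea – pausing and retrying when the token request fails. The delay values are guesses and run_with_retry is only an illustration, not part of stweet:

# Sketch of the "wait between loops" workaround suggested above. The sleep
# lengths are arbitrary guesses; the retry matches the failure by class name
# so it does not depend on the exact import path of RefreshTokenException.
import time

def run_with_retry(runner, pause_seconds=15, retries=3):
    for attempt in range(retries):
        try:
            return runner.run()
        except Exception as exc:
            if type(exc).__name__ != 'RefreshTokenException' or attempt == retries - 1:
                raise
            time.sleep(pause_seconds * (attempt + 1))  # back off before retrying

Inside mine_tweets this would replace the direct ).run() call, with an extra time.sleep between dates to slow the token requests down further.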

markowanga commented 3 years ago

Not exactly. I think the request for a guest token is rate limited. With the current config, every new task requires a call for a new guest token at the start. Now I'm testing requests with a modified configuration. If it is OK, I will prepare code for the currently released version to run your task 😉

yuenshingyan commented 3 years ago

Nice :)

Marcin Wątroba (notifications@github.com) wrote on Saturday, 20 February 2021, at 5:31 PM:

Not exactly. I think the request for a guest token is rate limited. With the current config, every new task requires a call for a new guest token at the start. Now I'm testing requests with a modified configuration. If it is OK, I will prepare code for the currently released version to run your task 😉


markowanga commented 3 years ago

You need to read the auth token and set it in the context, and you need to change the paths:

from datetime import datetime

import arrow
import pandas as pd

import stweet as st
from stweet.http_request import RequestDetails
from stweet.http_request.interceptor.logging_requests_web_client_interceptor import \
    LoggingRequestsWebClientInterceptor

class AuthLoggingInterceptor(LoggingRequestsWebClientInterceptor):
    def logs_to_show(self, params: RequestDetails) -> bool:
        return params.url == 'https://api.twitter.com/1.1/guest/activate.json'

usernames = [
    'cnnbrk', 'CNN', 'CNNnewsroom', 'cnni', 'cnnphilippines', 'CNNAfrica', 'BBCWorld', 'cnnasiapr', 'cnnphlife',
    'CNNnews18', 'FoxNews', 'FOXTV', 'FoxLifeIndia', 'ABC', 'SkyNews', 'SCMPNews', 'ABCPolitics', 'CBSNews',
    'CBCNews', 'ABSCBNNews', 'itvnews', 'NYDailyNews', 'gmanews', 'SkyNewsBreak', 'NBCNews', 'ANI', 'OANN',
    'MTVNEWS', '10NewsFirst', '7NewsMelbourne', 'dallasnews', 'YahooNews', 'abcnews', 'VICENews', 'YonhapNews',
    'DDNewslive', 'ABCWorldNews', '9NewsAUS', 'elonmusk', 'TheEconomist',
]

def mine_tweets(start_date_str, no_days, keywords, tweets_limit, username):
    dates = pd.date_range(start_date_str, periods=no_days, freq='D')
    auth_token = None
    for date in dates.strftime('%Y-%m-%d'):
        search_tweets_task = st.SearchTweetsTask(
            from_username=username,
            any_word=keywords,
            tweets_limit=tweets_limit,
            since=arrow.get(f'{date}T00:00:00.000+01:00'),
            until=arrow.get(f'{date}T24:00:00.000+01:00'),
            language=st.Language.ENGLISH)

        tweets_collector = st.CollectorTweetOutput()
        runner = st.TweetSearchRunner(
            search_tweets_task=search_tweets_task,
            tweet_outputs=[tweets_collector, st.CsvTweetOutput(f'dir/{username}_{keywords}_output_{date}.csv')],
            search_run_context=st.search_runner.SearchRunContext(guest_auth_token=auth_token),
            web_client=st.RequestsWebClient(interceptors=[AuthLoggingInterceptor()])
        )
        runner.run()
        auth_token = runner.search_run_context.guest_auth_token

    df_list = []
    for date in dates.strftime('%Y-%m-%d'):
        df_list.append(pd.read_csv(f'dir/{username}_{keywords}_output_{date}.csv'))

    mined = pd.concat(df_list)
    mined.to_csv(
        r'/Users/marcinwatroba/Desktop/WUST/intent-generator/dir/{}_{}_output_{}.csv'.format(
            username, keywords, start_date_str))

if __name__ == '__main__':
    for user in usernames:
        mine_tweets('2016-01-01', 365 * 5, None, 50000000, user)
yuenshingyan commented 3 years ago

I just upgraded the lib and tried the code above, and this happened:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-54dd3a218489> in <module>
      6 import stweet as st
      7 from stweet.http_request import RequestDetails
----> 8 from stweet.http_request.interceptor.logging_requests_web_client_interceptor import LoggingRequestsWebClientInterceptor
      9 
     10 

F:\Anaconda\lib\site-packages\stweet\http_request\interceptor\logging_requests_web_client_interceptor.py in <module>
      7 
      8 
----> 9 class LoggingRequestsWebClientInterceptor(WebClient.WebClientInterceptor):
     10     """Class of LoggingRequestsWebClientInterceptor."""
     11 

AttributeError: type object 'WebClient' has no attribute 'WebClientInterceptor' 
markowanga commented 3 years ago

Do you have stweet version 1.3.0? If not, run pip install -U stweet

yuenshingyan commented 3 years ago

Do you have stweet version 1.3.0? If not, run pip install -U stweet

Yeah, it's version 1.3.0, but it still raised the error above.

markowanga commented 3 years ago

test_stweet.zip – if you are using Docker, please check this; for me everything works perfectly 😊 (if Docker is installed, just run ./run.sh in the directory where the package is extracted)

I think there is something wrong with your environment.

markowanga commented 3 years ago

Does that help you?

yuenshingyan commented 3 years ago

I know nothing about Docker. I ran the code in a Jupyter notebook.

Marcin Wątroba (notifications@github.com) wrote on Monday, 22 February 2021, at 4:45 AM:

Does that help you?


markowanga commented 3 years ago

Docker can build a completely separate environment and run the project. I've prepared an image which installs all the important dependencies and runs the program. It uses volumes, so the result files will be stored on your machine. It works correctly.

IMO you have not actually installed stweet version 1.3.0 – the class that raises the error was only added in 1.3.0. Please check your virtual environments.
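
A quick way to check, from inside the same notebook kernel, which interpreter and which stweet version are actually in use (standard library only; importlib.metadata needs Python 3.8+, otherwise pip show stweet gives the same information):

# Sketch: confirm which environment the notebook kernel runs in and which
# stweet version is installed there.
import sys
from importlib.metadata import version  # Python 3.8+

print(sys.executable)      # interpreter the kernel is using
print(version('stweet'))   # should print 1.3.0 after the upgrade

If sys.executable does not point at the environment where pip install -U stweet was run, the upgrade landed in a different environment than the one the notebook uses.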

Eagle-9119 commented 3 years ago

Hi @markowanga, how can I use this with AWS? I want to run it on EC2.

I tried curl and got this error: "errors":[{"code":88,"message":"Rate limit exceeded."}]. The usage limit has not been reached.

markowanga commented 3 years ago

Hi, please move this to a new issue 😉