Closed yuenshingyan closed 3 years ago
Hi, is it repeatable? Could you send me the code? Sometimes I have this problem on GitHub Actions; then I just repeat the task. Where do you run the script? On a local machine, or on Azure or AWS?
I run it in my Jupyter notebook. I was actually trying to run two notebooks simultaneously to speed things up.
def mine_tweets(start_date_str, no_days, keywords, tweets_limit, username):
    dates = pd.date_range(start_date_str, periods=no_days, freq='D')
    for date in dates.strftime('%Y-%m-%d'):
        search_tweets_task = st.SearchTweetsTask(
            from_username=username,
            any_word=keywords,
            tweets_limit=tweets_limit,
            since=arrow.get(f'{date}T00:00:00.000+01:00'),
            until=arrow.get(f'{date}T24:00:00.000+01:00'),
            language=st.Language.ENGLISH)
        tweets_collector = st.CollectorTweetOutput()
        PrintTweetOutput = st.PrintTweetOutput()
        st.TweetSearchRunner(
            search_tweets_task=search_tweets_task,
            tweet_outputs=[tweets_collector, st.CsvTweetOutput(f'{username}_{keywords}_output_{date}.csv')]
        ).run()
        tweets = tweets_collector.get_scrapped_tweets()
    df_list = []
    for date in dates.strftime('%Y-%m-%d'):
        df_list.append(pd.read_csv(f'{username}_{keywords}_output_{date}.csv'))
    mined = pd.concat(df_list)
    mined.to_csv(r'C:\Users\Hindy\Desktop\Jupyter\Tweets\{}_{}_output_{}.csv'.format(username, keywords, start_date_str))

for user in usernames:
    mine_tweets('2016-01-01', 365*5, None, 50000000, user)
Please reformat the code, I can't analyze it.
usernames = ['cnnbrk', 'CNN', 'CNNnewsroom', 'cnni', 'cnnphilippines', 'CNNAfrica', 'BBCWorld', 'cnnasiapr', 'cnnphlife',
'CNNnews18','FoxNews', 'FOXTV', 'FoxLifeIndia', 'ABC', 'SkyNews', 'SCMPNews', 'ABCPolitics', 'CBSNews',
'CBCNews', 'ABSCBNNews', 'itvnews', 'NYDailyNews', 'gmanews', 'SkyNewsBreak', 'NBCNews', 'ANI', 'OANN',
'MTVNEWS', '10NewsFirst', '7NewsMelbourne', 'dallasnews', 'YahooNews', 'abcnews', 'VICENews', 'YonhapNews',
'DDNewslive', 'ABCWorldNews', '9NewsAUS', 'elonmusk', 'TheEconomist',
]
def mine_tweets(start_date_str, no_days, keywords, tweets_limit, username):
    dates = pd.date_range(start_date_str, periods=no_days, freq='D')
    for date in dates.strftime('%Y-%m-%d'):
        search_tweets_task = st.SearchTweetsTask(
            from_username=username,
            any_word=keywords,
            tweets_limit=tweets_limit,
            since=arrow.get(f'{date}T00:00:00.000+01:00'),
            until=arrow.get(f'{date}T24:00:00.000+01:00'),
            language=st.Language.ENGLISH)
        tweets_collector = st.CollectorTweetOutput()
        PrintTweetOutput = st.PrintTweetOutput()
        st.TweetSearchRunner(
            search_tweets_task=search_tweets_task,
            tweet_outputs=[tweets_collector, st.CsvTweetOutput(f'{username}_{keywords}_output_{date}.csv')]
        ).run()
        tweets = tweets_collector.get_scrapped_tweets()
    df_list = []
    for date in dates.strftime('%Y-%m-%d'):
        df_list.append(pd.read_csv(f'{username}_{keywords}_output_{date}.csv'))
    mined = pd.concat(df_list)
    mined.to_csv(r'C:\Users\Hindy\Desktop\Jupyter\Tweets\{}_{}_output_{}.csv'.format(username, keywords, start_date_str))

for user in usernames:
    mine_tweets('2016-01-01', 365*5, None, 50000000, user)
---------------------------------------------------------------------------
RefreshTokenException Traceback (most recent call last)
<ipython-input-11-a36795e61fa1> in <module>
1 for user in usernames:
----> 2 mine_tweets('2016-01-01', 365*5, 'oil', 50000000, user)
<ipython-input-9-0da9593c083e> in mine_tweets(start_date_str, no_days, keywords, tweets_limit, username)
15 st.TweetSearchRunner(
16 search_tweets_task=search_tweets_task,
---> 17 tweet_outputs=[tweets_collector, st.CsvTweetOutput(f'{username}_{keywords}_output_{date}.csv')]
18 ).run()
19
F:\Anaconda\lib\site-packages\stweet\search_runner\search_runner.py in run(self)
48 def run(self) -> SearchTweetsResult:
49 """Main search_runner method."""
---> 50 self._prepare_token()
51 while not self._is_end_of_scrapping():
52 self._execute_next_tweets_request()
F:\Anaconda\lib\site-packages\stweet\search_runner\search_runner.py in _prepare_token(self)
88 def _prepare_token(self):
89 if self.search_run_context.guest_auth_token is None:
---> 90 self._refresh_token()
91 return
92
F:\Anaconda\lib\site-packages\stweet\search_runner\search_runner.py in _refresh_token(self)
83 def _refresh_token(self):
84 token_provider = self.auth_token_provider_factory.create(self.web_client)
---> 85 self.search_run_context.guest_auth_token = token_provider.get_new_token()
86 return
87
F:\Anaconda\lib\site-packages\stweet\auth\simple_auth_token_provider.py in get_new_token(self)
38 """Method to get refreshed token. In case of error raise RefreshTokenException."""
39 try:
---> 40 token_html = self._request_for_response_body()
41 return json.loads(token_html)['guest_token']
42 except JSONDecodeError:
F:\Anaconda\lib\site-packages\stweet\auth\simple_auth_token_provider.py in _request_for_response_body(self)
33 return token_response.text
34 else:
---> 35 raise RefreshTokenException('Error during request for token')
36
37 def get_new_token(self) -> str:
RefreshTokenException: Error during request for token
Your code works correctly in my environment. Maybe Twitter blocks your requests (just like on GitHub Actions). Please check these users' tweets in your browser. I will prepare curl requests now to check the exact responses to my requests.
My tweets work fine.
Actually, stweet worked again 10 minutes after this post, but then it stopped again with the same error.
When you get this error, please send me the response of this shell command:
curl --location --request POST 'https://api.twitter.com/1.1/guest/activate.json' \
--header 'Authorization: Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA'
I think there is something wrong with your environment. Maybe other requests will still go through – if you get this error, you can help me fix the bug 😊 (if it is possible, of course)
The same error just happened.
Can you please explain a little more? I don't really have much experience with programming.
What is curl?
curl is a simple HTTP client with a command-line interface. You can install curl on Windows. Alternatively, you can run the request in another HTTP client, like Postman.
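For anyone without curl, the same guest-token request can be sketched in plain Python with the standard library (the URL and the public bearer token are taken from the curl command above; everything else is an illustrative assumption, not part of stweet's API):

```python
import json
import urllib.request

ACTIVATE_URL = 'https://api.twitter.com/1.1/guest/activate.json'
# Public bearer token copied from the curl command above.
BEARER = 'Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejRCOuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu4FA33AGWWjCpTnA'


def request_guest_token():
    """POST to the activate endpoint and return the parsed guest_token."""
    req = urllib.request.Request(
        ACTIVATE_URL, method='POST',
        headers={'Authorization': BEARER})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())['guest_token']


if __name__ == '__main__':
    # A healthy response is HTTP 200 with a JSON body containing guest_token;
    # an HTTP 429 here means the token endpoint itself is rate-limited.
    print(request_guest_token())
```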
I just ran the shell command and it gave me a guest_token. What should I do next?
Ok, so the token is returned correctly right after the library crashes (with RefreshTokenException)?
If that is true, it means I need to prepare the library with a WebClient wrapped in an interceptor – then I can check when this error occurs. See task #48 – I will prepare this as soon as possible (I suspect today).
I have finished the PR; the new version is being released.
Ok, new version v1.3.0 is released. Please update your current version.
import arrow
import pandas as pd
import stweet as st
from stweet.http_request import RequestDetails
from stweet.http_request.interceptor.logging_requests_web_client_interceptor import \
    LoggingRequestsWebClientInterceptor


class AuthLoggingInterceptor(LoggingRequestsWebClientInterceptor):
    def logs_to_show(self, params: RequestDetails) -> bool:
        return params.url == 'https://api.twitter.com/1.1/guest/activate.json'


usernames = [
    'cnnbrk', 'CNN', 'CNNnewsroom', 'cnni', 'cnnphilippines', 'CNNAfrica', 'BBCWorld', 'cnnasiapr', 'cnnphlife',
    'CNNnews18', 'FoxNews', 'FOXTV', 'FoxLifeIndia', 'ABC', 'SkyNews', 'SCMPNews', 'ABCPolitics', 'CBSNews',
    'CBCNews', 'ABSCBNNews', 'itvnews', 'NYDailyNews', 'gmanews', 'SkyNewsBreak', 'NBCNews', 'ANI', 'OANN',
    'MTVNEWS', '10NewsFirst', '7NewsMelbourne', 'dallasnews', 'YahooNews', 'abcnews', 'VICENews', 'YonhapNews',
    'DDNewslive', 'ABCWorldNews', '9NewsAUS', 'elonmusk', 'TheEconomist',
]


def mine_tweets(start_date_str, no_days, keywords, tweets_limit, username):
    dates = pd.date_range(start_date_str, periods=no_days, freq='D')
    auth_logging_interceptor = AuthLoggingInterceptor()
    for date in dates.strftime('%Y-%m-%d'):
        search_tweets_task = st.SearchTweetsTask(
            from_username=username,
            any_word=keywords,
            tweets_limit=tweets_limit,
            since=arrow.get(f'{date}T00:00:00.000+01:00'),
            until=arrow.get(f'{date}T24:00:00.000+01:00'),
            language=st.Language.ENGLISH)
        tweets_collector = st.CollectorTweetOutput()
        st.TweetSearchRunner(
            search_tweets_task=search_tweets_task,
            tweet_outputs=[tweets_collector, st.CsvTweetOutput(f'{username}_{keywords}_output_{date}.csv')],
            web_client=st.RequestsWebClient(interceptors=[auth_logging_interceptor])
        ).run()
    df_list = []
    for date in dates.strftime('%Y-%m-%d'):
        df_list.append(pd.read_csv(f'{username}_{keywords}_output_{date}.csv'))
    mined = pd.concat(df_list)
    mined.to_csv(
        r'C:\Users\Hindy\Desktop\Jupyter\Tweets\{}_{}_output_{}.csv'.format(username, keywords, start_date_str))


for user in usernames:
    mine_tweets('2016-01-01', 365 * 5, None, 50000000, user)
Please run this code and wait for the error, then post the resulting logs here.
I have reproduced this error on GitHub Actions – in my case I got a 429 response code, which means there were too many requests. I will try to find a solution for this problem – as soon as possible, of course.
Please confirm you have similar problem.
I have prepared the complete change; it was difficult to do it well with high quality. The library architecture changed a little, and I need to update the documentation and finish a few other small tasks; then the new version will be released 😉
I also found the reason for your bug – every new run calls the token API for a new guest token, and there are limits on that call, so you need to make fewer requests – the fix is to reuse the same WebClient. I will cover this in the docs update.
Maybe waiting a few seconds between loops could solve this?
Not exactly. I think the request for a guest token is rate-limited. With the current config, every new task requires a call for a new guest token at the start. I'm now testing a request with a modified configuration. If it works, I will prepare code for the currently released version that runs your task 😉
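For what it's worth, the "wait between loops" idea generalizes to retrying a rate-limited call with an exponentially growing delay. This is a generic sketch (a hypothetical helper, not part of stweet's API); `fn` would be something like the runner's `.run()` call that can raise `RefreshTokenException`:

```python
import time


def run_with_backoff(fn, retries=5, base_delay=2.0):
    """Call fn(); on failure sleep base_delay * 2**attempt, then retry.
    Re-raises the last exception once all retries are exhausted."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Backoff only spreads the requests out; reusing the same guest token across tasks is the actual fix being discussed here.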
Nice:)
You need to read the auth token and set it in the context, and you need to change the paths.
from datetime import datetime

import arrow
import pandas as pd
import stweet as st
from stweet.http_request import RequestDetails
from stweet.http_request.interceptor.logging_requests_web_client_interceptor import \
    LoggingRequestsWebClientInterceptor


class AuthLoggingInterceptor(LoggingRequestsWebClientInterceptor):
    def logs_to_show(self, params: RequestDetails) -> bool:
        return params.url == 'https://api.twitter.com/1.1/guest/activate.json'


usernames = [
    'cnnbrk', 'CNN', 'CNNnewsroom', 'cnni', 'cnnphilippines', 'CNNAfrica', 'BBCWorld', 'cnnasiapr', 'cnnphlife',
    'CNNnews18', 'FoxNews', 'FOXTV', 'FoxLifeIndia', 'ABC', 'SkyNews', 'SCMPNews', 'ABCPolitics', 'CBSNews',
    'CBCNews', 'ABSCBNNews', 'itvnews', 'NYDailyNews', 'gmanews', 'SkyNewsBreak', 'NBCNews', 'ANI', 'OANN',
    'MTVNEWS', '10NewsFirst', '7NewsMelbourne', 'dallasnews', 'YahooNews', 'abcnews', 'VICENews', 'YonhapNews',
    'DDNewslive', 'ABCWorldNews', '9NewsAUS', 'elonmusk', 'TheEconomist',
]


def mine_tweets(start_date_str, no_days, keywords, tweets_limit, username):
    dates = pd.date_range(start_date_str, periods=no_days, freq='D')
    auth_token = None
    for date in dates.strftime('%Y-%m-%d'):
        search_tweets_task = st.SearchTweetsTask(
            from_username=username,
            any_word=keywords,
            tweets_limit=tweets_limit,
            since=arrow.get(f'{date}T00:00:00.000+01:00'),
            until=arrow.get(f'{date}T24:00:00.000+01:00'),
            language=st.Language.ENGLISH)
        tweets_collector = st.CollectorTweetOutput()
        runner = st.TweetSearchRunner(
            search_tweets_task=search_tweets_task,
            tweet_outputs=[tweets_collector, st.CsvTweetOutput(f'dir/{username}_{keywords}_output_{date}.csv')],
            search_run_context=st.search_runner.SearchRunContext(guest_auth_token=auth_token),
            web_client=st.RequestsWebClient(interceptors=[AuthLoggingInterceptor()])
        )
        runner.run()
        auth_token = runner.search_run_context.guest_auth_token
    df_list = []
    for date in dates.strftime('%Y-%m-%d'):
        df_list.append(pd.read_csv(f'dir/{username}_{keywords}_output_{date}.csv'))
    mined = pd.concat(df_list)
    mined.to_csv(
        r'/Users/marcinwatroba/Desktop/WUST/intent-generator/dir/{}_{}_output_{}.csv'.format(
            username, keywords, start_date_str))


if __name__ == '__main__':
    for user in usernames:
        mine_tweets('2016-01-01', 365 * 5, None, 50000000, user)
I just upgraded the lib and tried the code above, and this happened:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-6-54dd3a218489> in <module>
6 import stweet as st
7 from stweet.http_request import RequestDetails
----> 8 from stweet.http_request.interceptor.logging_requests_web_client_interceptor import LoggingRequestsWebClientInterceptor
9
10
F:\Anaconda\lib\site-packages\stweet\http_request\interceptor\logging_requests_web_client_interceptor.py in <module>
7
8
----> 9 class LoggingRequestsWebClientInterceptor(WebClient.WebClientInterceptor):
10 """Class of LoggingRequestsWebClientInterceptor."""
11
AttributeError: type object 'WebClient' has no attribute 'WebClientInterceptor'
Do you have stweet in version 1.3.0? If not, run pip install -U stweet
Yeah, it's version 1.3.0, but it raised the error above.
test_stweet.zip
If you are using Docker, please check this – for me everything works perfectly 😊 (if Docker is installed, just run ./run.sh in the directory where the package was extracted)
I think there is something wrong with your environment
Does that help you?
I know nothing about Docker. I ran the code in a Jupyter notebook.
Docker can build a completely separate environment and run the project. I've prepared an image which installs all the important dependencies and runs the program. It uses volumes, so the result files are stored on your machine. It works correctly.
IMO you do not actually have stweet 1.3.0 installed – the class that raises the error was only added in 1.3.0. Please check your virtual environments.
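A quick way to confirm which version of a package a notebook actually imports is to ask the interpreter itself. A minimal sketch using only the standard library (Python 3.8+):

```python
import importlib.metadata


def installed_version(package):
    """Return the installed version string for a package, or None if absent."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == '__main__':
    # After a successful `pip install -U stweet` this should show 1.3.0
    # in the same environment the notebook kernel runs in.
    print(installed_version('stweet'))
```

Running this inside the Jupyter kernel (rather than in a separate shell) also rules out the kernel using a different virtual environment than the one that was upgraded.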
Hi @markowanga, how can I use this with AWS? I want to use it on EC2.
I tried curl and got this error:
"errors":[{"code":88,"message":"Rate limit exceeded."}]
The usage limit has not been reached.
Hi, please move this to a new issue 😉