mratanusarkar / twitter-sentiment-analysis

A demo PoC for sentiment analysis of tweets

script failing to scrape when running main #14

Open mratanusarkar opened 1 month ago

mratanusarkar commented 1 month ago

I suspect that snscrape's Twitter module (snscrape.modules.twitter, imported as sntwitter) has stopped working because Twitter is blocking it!

I ran the script after completing the dev setup and the pip installs, but got the following errors:

GitHub Codespace

On GitHub Codespaces, with the mcr.microsoft.com/vscode/devcontainers/python:3.9 environment:

@mratanusarkar ➜ /workspaces/twitter-sentiment-analysis/Runner (feature/devcontainer) $ python main.py 
scraping tweets ...
  0%|                                                                                                                                                                                              | 0/10 [00:00<?, ?it/s]Error retrieving https://twitter.com/search?f=live&lang=en&q=%40isro&src=spelling_expansion_revert_click: SSLError(MaxRetryError("HTTPSConnectionPool(host='twitter.com', port=443): Max retries exceeded with url: /search?f=live&lang=en&q=%40isro&src=spelling_expansion_revert_click (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1133)')))"))
4 requests to https://twitter.com/search?f=live&lang=en&q=%40isro&src=spelling_expansion_revert_click failed, giving up.
Errors: SSLError(MaxRetryError("HTTPSConnectionPool(host='twitter.com', port=443): Max retries exceeded with url: /search?f=live&lang=en&q=%40isro&src=spelling_expansion_revert_click (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1133)')))")), SSLError(MaxRetryError("HTTPSConnectionPool(host='twitter.com', port=443): Max retries exceeded with url: /search?f=live&lang=en&q=%40isro&src=spelling_expansion_revert_click (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1133)')))")), SSLError(MaxRetryError("HTTPSConnectionPool(host='twitter.com', port=443): Max retries exceeded with url: /search?f=live&lang=en&q=%40isro&src=spelling_expansion_revert_click (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1133)')))")), SSLError(MaxRetryError("HTTPSConnectionPool(host='twitter.com', port=443): Max retries exceeded with url: /search?f=live&lang=en&q=%40isro&src=spelling_expansion_revert_click (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1133)')))"))
  0%|                                                                                                                                                                                              | 0/10 [00:08<?, ?it/s]
Traceback (most recent call last):
  File "/workspaces/twitter-sentiment-analysis/Runner/module/scraper.py", line 39, in get_tweets
    for tweet in tqdm(twitter_search, total=limit):
  File "/usr/local/lib/python3.9/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.9/site-packages/snscrape/modules/twitter.py", line 1763, in get_items
    for obj in self._iter_api_data('https://twitter.com/i/api/graphql/7jT5GT59P8IFjgxwqnEdQw/SearchTimeline', _TwitterAPIType.GRAPHQL, params, paginationParams, cursor = self._cursor, instructionsPath = ['data', 'search_by_raw_query', 'search_timeline', 'timeline', 'instructions']):
  File "/usr/local/lib/python3.9/site-packages/snscrape/modules/twitter.py", line 915, in _iter_api_data
    obj = self._get_api_data(endpoint, apiType, reqParams, instructionsPath = instructionsPath)
  File "/usr/local/lib/python3.9/site-packages/snscrape/modules/twitter.py", line 883, in _get_api_data
    self._ensure_guest_token()
  File "/usr/local/lib/python3.9/site-packages/snscrape/modules/twitter.py", line 825, in _ensure_guest_token
    r = self._get(self._baseUrl if url is None else url, responseOkCallback = self._check_guest_token_response)
  File "/usr/local/lib/python3.9/site-packages/snscrape/base.py", line 275, in _get
    return self._request('GET', *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/snscrape/base.py", line 271, in _request
    raise ScraperException(msg)
snscrape.base.ScraperException: 4 requests to https://twitter.com/search?f=live&lang=en&q=%40isro&src=spelling_expansion_revert_click failed, giving up.
None
processing data & counting words ...
Traceback (most recent call last):
  File "/workspaces/twitter-sentiment-analysis/Runner/main.py", line 16, in <module>
    tweet_wc.generate_word_cloud_v2(rawData, topic_title, exclude_words, 1080, 720)
  File "/workspaces/twitter-sentiment-analysis/Runner/module/generator.py", line 135, in generate_word_cloud_v2
    for tweet_content in tqdm(rawData.content, total=len(rawData.index)):
  File "/usr/local/lib/python3.9/site-packages/pandas/core/generic.py", line 6299, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'content'
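The CERTIFICATE_VERIFY_FAILED ("unable to get local issuer certificate") error in the Codespace log suggests that Python could not build a trust chain for twitter.com, which is a different failure from the "blocked (404)" seen elsewhere. A minimal diagnostic sketch, using only the standard library, is below. The function name describe_ca_configuration is hypothetical, and this only reports what the interpreter and environment are configured to trust; it does not fix anything. Note that requests (which snscrape uses) normally reads certifi's bundle unless REQUESTS_CA_BUNDLE is set, so the OpenSSL defaults shown here are only part of the picture.

```python
import os
import ssl


def describe_ca_configuration():
    """Collect the CA locations Python's ssl module and the environment point at.

    Hypothetical helper for debugging only; it is not part of snscrape or this repo.
    """
    paths = ssl.get_default_verify_paths()
    return {
        # Locations resolved by the ssl module (None if nothing was found).
        "cafile": paths.cafile,
        "capath": paths.capath,
        # Compile-time OpenSSL defaults, useful when the resolved values are None.
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        # Environment overrides: the first is read by OpenSSL, the second by requests.
        "SSL_CERT_FILE": os.environ.get("SSL_CERT_FILE"),
        "REQUESTS_CA_BUNDLE": os.environ.get("REQUESTS_CA_BUNDLE"),
        # True only if a CA file was resolved and actually exists on disk.
        "cafile_exists": bool(paths.cafile) and os.path.exists(paths.cafile),
    }


if __name__ == "__main__":
    for key, value in describe_ca_configuration().items():
        print(f"{key}: {value}")
```

If cafile_exists is False and no override is set, the container may be missing a CA bundle or sitting behind a TLS-intercepting proxy; that is an assumption to verify, not something the log confirms.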

Google Colab

On Google Colab, I tried running Tweet_Analysis_and_Inference.ipynb in the Runner. Specifically, at plt_fig = generate_word_cloud_v2(rawData, topic_title, exclude_words), I get the following:

ERROR:snscrape.base:Error retrieving https://twitter.com/i/api/graphql/7jT5GT59P8IFjgxwqnEdQw/SearchTimeline?variables=%7B%22rawQuery%22%3A%22ISRO%20OR%20%23SSLVD2%20OR%20%23ISRO%22%2C%22count%22%3A20%2C%22product%22%3A%22Latest%22%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%7D&features=%7B%22rweb_lists_timeline_redesign_enabled%22%3Afalse%2C%22blue_business_profile_image_shape_enabled%22%3Afalse%2C%22responsive_web_graphql_exclude_directive_enabled%22%3Atrue%2C%22verified_phone_label_enabled%22%3Afalse%2C%22creator_subscriptions_tweet_preview_api_enabled%22%3Afalse%2C%22responsive_web_graphql_timeline_navigation_enabled%22%3Atrue%2C%22responsive_web_graphql_skip_user_profile_image_extensions_enabled%22%3Afalse%2C%22tweetypie_unmention_optimization_enabled%22%3Atrue%2C%22vibe_api_enabled%22%3Atrue%2C%22responsive_web_edit_tweet_api_enabled%22%3Atrue%2C%22graphql_is_translatable_rweb_tweet_is_translatable_enabled%22%3Atrue%2C%22view_counts_everywhere_api_enabled%22%3Atrue%2C%22longform_notetweets_consumption_enabled%22%3Atrue%2C%22tweet_awards_web_tipping_enabled%22%3Afalse%2C%22freedom_of_speech_not_reach_fetch_enabled%22%3Afalse%2C%22standardized_nudges_misinfo%22%3Atrue%2C%22tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled%22%3Afalse%2C%22interactive_text_enabled%22%3Atrue%2C%22responsive_web_text_conversations_enabled%22%3Afalse%2C%22longform_notetweets_rich_text_read_enabled%22%3Afalse%2C%22longform_notetweets_inline_media_enabled%22%3Afalse%2C%22responsive_web_enhance_cards_enabled%22%3Afalse%2C%22responsive_web_twitter_blue_verified_badge_is_enabled%22%3Atrue%7D: blocked (404)
CRITICAL:snscrape.base:4 requests to https://twitter.com/i/api/graphql/7jT5GT59P8IFjgxwqnEdQw/SearchTimeline?variables=%7B%22rawQuery%22%3A%22ISRO%20OR%20%23SSLVD2%20OR%20%23ISRO%22%2C%22count%22%3A20%2C%22product%22%3A%22Latest%22%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%7D&features=%7B%22rweb_lists_timeline_redesign_enabled%22%3Afalse%2C%22blue_business_profile_image_shape_enabled%22%3Afalse%2C%22responsive_web_graphql_exclude_directive_enabled%22%3Atrue%2C%22verified_phone_label_enabled%22%3Afalse%2C%22creator_subscriptions_tweet_preview_api_enabled%22%3Afalse%2C%22responsive_web_graphql_timeline_navigation_enabled%22%3Atrue%2C%22responsive_web_graphql_skip_user_profile_image_extensions_enabled%22%3Afalse%2C%22tweetypie_unmention_optimization_enabled%22%3Atrue%2C%22vibe_api_enabled%22%3Atrue%2C%22responsive_web_edit_tweet_api_enabled%22%3Atrue%2C%22graphql_is_translatable_rweb_tweet_is_translatable_enabled%22%3Atrue%2C%22view_counts_everywhere_api_enabled%22%3Atrue%2C%22longform_notetweets_consumption_enabled%22%3Atrue%2C%22tweet_awards_web_tipping_enabled%22%3Afalse%2C%22freedom_of_speech_not_reach_fetch_enabled%22%3Afalse%2C%22standardized_nudges_misinfo%22%3Atrue%2C%22tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled%22%3Afalse%2C%22interactive_text_enabled%22%3Atrue%2C%22responsive_web_text_conversations_enabled%22%3Afalse%2C%22longform_notetweets_rich_text_read_enabled%22%3Afalse%2C%22longform_notetweets_inline_media_enabled%22%3Afalse%2C%22responsive_web_enhance_cards_enabled%22%3Afalse%2C%22responsive_web_twitter_blue_verified_badge_is_enabled%22%3Atrue%7D failed, giving up.
CRITICAL:snscrape.base:Errors: blocked (404), blocked (404), blocked (404), blocked (404)
Traceback (most recent call last):
  File "<ipython-input-3-bd6f3035b9b7>", line 26, in get_tweets
    for tweet in tqdm(twitter_search, total=limit):
  File "/usr/local/lib/python3.10/dist-packages/tqdm/notebook.py", line 250, in __iter__
    for obj in it:
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.10/dist-packages/snscrape/modules/twitter.py", line 1763, in get_items
    for obj in self._iter_api_data('https://twitter.com/i/api/graphql/7jT5GT59P8IFjgxwqnEdQw/SearchTimeline', _TwitterAPIType.GRAPHQL, params, paginationParams, cursor = self._cursor, instructionsPath = ['data', 'search_by_raw_query', 'search_timeline', 'timeline', 'instructions']):
  File "/usr/local/lib/python3.10/dist-packages/snscrape/modules/twitter.py", line 915, in _iter_api_data
    obj = self._get_api_data(endpoint, apiType, reqParams, instructionsPath = instructionsPath)
  File "/usr/local/lib/python3.10/dist-packages/snscrape/modules/twitter.py", line 886, in _get_api_data
    r = self._get(endpoint, params = params, headers = self._apiHeaders, responseOkCallback = functools.partial(self._check_api_response, apiType = apiType, instructionsPath = instructionsPath))
  File "/usr/local/lib/python3.10/dist-packages/snscrape/base.py", line 275, in _get
    return self._request('GET', *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/snscrape/base.py", line 271, in _request
    raise ScraperException(msg)
snscrape.base.ScraperException: 4 requests to https://twitter.com/i/api/graphql/7jT5GT59P8IFjgxwqnEdQw/SearchTimeline?variables=%7B%22rawQuery%22%3A%22ISRO%20OR%20%23SSLVD2%20OR%20%23ISRO%22%2C%22count%22%3A20%2C%22product%22%3A%22Latest%22%2C%22withDownvotePerspective%22%3Afalse%2C%22withReactionsMetadata%22%3Afalse%2C%22withReactionsPerspective%22%3Afalse%7D&features=%7B%22rweb_lists_timeline_redesign_enabled%22%3Afalse%2C%22blue_business_profile_image_shape_enabled%22%3Afalse%2C%22responsive_web_graphql_exclude_directive_enabled%22%3Atrue%2C%22verified_phone_label_enabled%22%3Afalse%2C%22creator_subscriptions_tweet_preview_api_enabled%22%3Afalse%2C%22responsive_web_graphql_timeline_navigation_enabled%22%3Atrue%2C%22responsive_web_graphql_skip_user_profile_image_extensions_enabled%22%3Afalse%2C%22tweetypie_unmention_optimization_enabled%22%3Atrue%2C%22vibe_api_enabled%22%3Atrue%2C%22responsive_web_edit_tweet_api_enabled%22%3Atrue%2C%22graphql_is_translatable_rweb_tweet_is_translatable_enabled%22%3Atrue%2C%22view_counts_everywhere_api_enabled%22%3Atrue%2C%22longform_notetweets_consumption_enabled%22%3Atrue%2C%22tweet_awards_web_tipping_enabled%22%3Afalse%2C%22freedom_of_speech_not_reach_fetch_enabled%22%3Afalse%2C%22standardized_nudges_misinfo%22%3Atrue%2C%22tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled%22%3Afalse%2C%22interactive_text_enabled%22%3Atrue%2C%22responsive_web_text_conversations_enabled%22%3Afalse%2C%22longform_notetweets_rich_text_read_enabled%22%3Afalse%2C%22longform_notetweets_inline_media_enabled%22%3Afalse%2C%22responsive_web_enhance_cards_enabled%22%3Afalse%2C%22responsive_web_twitter_blue_verified_badge_is_enabled%22%3Atrue%7D failed, giving up.
None
processing data & counting words ...
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-855f5bb1cd20> in <cell line: 9>()
      7 # scrape tweets and generate wordcloud (using v2)
      8 rawData = get_tweets(query, limit)
----> 9 plt_fig = generate_word_cloud_v2(rawData, topic_title, exclude_words)

1 frames
<ipython-input-8-588ec9f19747> in generate_word_cloud_v2(rawData, topic_title, force_exclude_words, width, height, dpi)
     18     print("processing data & counting words ...")
     19     counter = Counter({})
---> 20     for tweet_content in tqdm(rawData.content, total=len(rawData.index)):
     21         refined_tweet = refine_text(tweet_content)
     22         counter = word_counter(refined_tweet, counter)

/usr/local/lib/python3.10/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   5987         ):
   5988             return self[name]
-> 5989         return object.__getattribute__(self, name)
   5990 
   5991     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'content'
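Both runs end with the same AttributeError: the scrape fails, so rawData never gets a content column, and generate_word_cloud_v2 then crashes on rawData.content. The AttributeError is therefore a symptom of the failed scrape rather than a separate bug. A minimal sketch of a guard that could run before the word cloud step is below. The helper name has_tweet_content is hypothetical and not part of this repository; it only assumes the object exposes columns and supports len(), which a pandas DataFrame does.

```python
def has_tweet_content(raw_data, column="content"):
    """Return True only if raw_data is a non-empty table that has the given column.

    Hypothetical helper: duck-typed so it also handles None, which is what a
    failed scrape might hand back.
    """
    if raw_data is None:
        return False
    columns = getattr(raw_data, "columns", None)
    if columns is None or column not in columns:
        return False
    # len() of a DataFrame is its row count; zero rows means nothing to plot.
    return len(raw_data) > 0


# Possible use in main.py (sketch, names taken from the traceback above):
#
#     rawData = get_tweets(query, limit)
#     if not has_tweet_content(rawData):
#         raise SystemExit("scraping returned no tweets; skipping word cloud")
#     tweet_wc.generate_word_cloud_v2(rawData, topic_title, exclude_words, 1080, 720)
```

With a check like this, the run would stop with a single clear message about the scrape instead of a second, misleading traceback from pandas.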