mratanusarkar / twitter-sentiment-analysis

twitter-sentiment-analysis

This is a demo proof of concept (PoC) for sentiment analysis of tweets. The code is available as notebooks in the Notebooks folder, and in script/module form in the Runner folder for running background jobs.

Features

This repo is still a work-in-progress.
Some of the features currently implemented are as follows:


Tweet Scraper

This is a scraper function used to collect tweets, powered by snscrape.
Compared to the Twitter API v2, it lets us collect tweets without the API's quota restrictions and without needing API tokens and secrets.

usage:

Here is a sample usage:

from module.scraper import TweetScraper

# create helper objects
tweet_scraper = TweetScraper()

# set parameters
query = '@isro'
limit = 1000

# scrape tweets
rawData = tweet_scraper.get_tweets(query, limit)

This will return a pandas dataframe containing the latest 1000 tweets that match the query @isro, that is, tweets mentioning @isro.
See the parameter table below for more details on the function parameters.
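To illustrate how a limit like the one above typically works, here is a minimal sketch. It assumes the scraper reads from a lazy stream of tweets and stops after `limit` items. The `fake_tweet_stream` generator and the `collect_tweets` helper are hypothetical stand-ins, not the actual implementation of `TweetScraper.get_tweets`.

```python
from itertools import islice
from typing import Dict, Iterator, List


def fake_tweet_stream() -> Iterator[Dict[str, object]]:
    """Hypothetical stand-in for a scraper: yields tweet records endlessly."""
    tweet_id = 0
    while True:
        tweet_id += 1
        yield {
            "id": tweet_id,
            "username": "user%d" % tweet_id,
            "content": "tweet number %d" % tweet_id,
        }


def collect_tweets(stream: Iterator[Dict[str, object]], limit: int) -> List[Dict[str, object]]:
    """Take at most `limit` tweets from a lazy stream, then stop reading."""
    return list(islice(stream, limit))


rows = collect_tweets(fake_tweet_stream(), limit=5)
print(len(rows))           # 5
print(rows[0]["content"])  # tweet number 1
```

A list of dictionaries like `rows` can be turned into a dataframe with `pandas.DataFrame(rows)`.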

parameters

| Parameter | Data Type | Description | More Details |
| --- | --- | --- | --- |
| query | string | Twitter search query, as used in https://twitter.com/search?q= | Can be a user mention like @user, a hashtag like #tag, a word like text, or a complex query joined by AND, OR, or statements enclosed in (). Explore twitter.com/search-advanced to know more. |
| limit | int | number of tweets you want to scrape | Execution time grows with the number of tweets. For example, 100 tweets are collected in about 1 s, whereas 10,000 might take 5 min and 100,000 may take 1 h. |
| return | pandas dataframe | a pandas dataframe with the tweets | As of now, the following data fields are collected: id, date, username, content, view_count, like_count, reply_count, retweet_count, quote_Count, url |
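The complex queries mentioned for the query parameter can be composed programmatically. The sketch below is illustrative only: `any_of` and `all_of` are hypothetical helpers that are not part of this repo, and the sketch assumes that Twitter search treats space-separated terms as an implicit AND.

```python
from typing import Iterable


def any_of(terms: Iterable[str]) -> str:
    """Join terms with OR and wrap them in parentheses, e.g. (#a OR #b)."""
    return "(" + " OR ".join(terms) + ")"


def all_of(parts: Iterable[str]) -> str:
    """Join parts with spaces (assumed to act as an implicit AND in Twitter search)."""
    return " ".join(parts)


# Tweets containing ISRO and at least one of the two hashtags
query = all_of(["ISRO", any_of(["#SSLVD2", "#ISRO"])])
print(query)  # ISRO (#SSLVD2 OR #ISRO)
```

The resulting string can be passed as the query argument of get_tweets.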

Twitter Word Cloud

This is a visualization tool powered by word_cloud. Combined with the scraper function above, it lets you see what is going on in Twitter at a glance. In short, it takes all the scraped tweets and counts the most frequently occurring words. It pre-processes and cleans the data, discarding common English words and non-English characters. In the end, you get a word cloud that gives insight into your search query.
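The counting step described above can be sketched with the standard library alone. This is a minimal illustration, not the repo's actual pre-processing: the `clean_tweet` and `word_frequencies` helpers and the tiny `STOP_WORDS` set are assumptions made for the example.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# A tiny illustrative stop-word list; a real pipeline would use a much larger one.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "for", "on", "at"}


def clean_tweet(text: str) -> List[str]:
    """Lowercase a tweet, drop URLs and non-English characters, and split into words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"[^a-z\s]", " ", text)      # keep only ASCII letters and whitespace
    return text.split()


def word_frequencies(tweets: Iterable[str], exclude_words: Iterable[str] = ()) -> Dict[str, int]:
    """Count words across tweets, skipping stop words and user-excluded words."""
    excluded = STOP_WORDS | {w.lower() for w in exclude_words}
    counts = Counter()
    for tweet in tweets:
        counts.update(w for w in clean_tweet(tweet) if w not in excluded)
    return dict(counts)


tweets = [
    "ISRO launch is live! https://example.com",
    "Congrats to the ISRO team on the launch 🚀 &amp; more",
]
freq = word_frequencies(tweets, exclude_words=["amp"])
print(freq["isro"], freq["launch"])  # 2 2
```

A frequency dictionary like `freq` is the kind of input that word_cloud's `WordCloud.generate_from_frequencies` accepts, so the most frequent words end up drawn largest.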

A sample use case is shared below.

sample use case:

Here is a sample word cloud generated with limit: 10,000, query: ISRO (#SSLVD2 OR #ISRO), and resolution (width, height): 1080, 720, during the SSLV-D2 launch on 10th Feb, 2023. In a single snapshot, you can see what Twitter was talking about during the launch.

[Image: word cloud for the ISRO SSLV-D2 launch]

usage:

Here is a sample usage:

from module.scraper import TweetScraper
from module.generator import TwitterWordCloud

# create helper objects
tweet_scraper = TweetScraper()
tweet_wc = TwitterWordCloud()

# set parameters
topic_title = 'ISRO During SSLV-D2 Launch'
query = 'ISRO (#SSLVD2 OR #ISRO)'
limit = 1000
exclude_words = ['amp', 'eval']

# scrape tweets
rawData = tweet_scraper.get_tweets(query, limit)

# generate the word cloud at 1080 x 720 pixels (width, height)
tweet_wc.generate_word_cloud_v2(rawData, topic_title, exclude_words, 1080, 720)

This will generate a word cloud from the latest 1000 tweets matching the query (in this example, tweets about ISRO and the SSLV-D2 launch). See the parameter tables below for more details on the function parameters.

parameters

function: generate_word_cloud():

A simple generator function with only one required parameter (the dataframe), for quick and easy word cloud generation.
The output image is (1000px, 500px) in a (15, 8) inch canvas.

| Parameter | Data Type | Description | More Details |
| --- | --- | --- | --- |
| rawData | pandas dataframe | pandas dataframe from the scraper function | |
| force_exclude_words | list of strings | words you wish to exclude from the word cloud | After seeing an output, if there are words in the image that you wish to exclude, you can do so using this option. |
| return | None | generates and displays the word cloud, and saves it as a PNG | |

generate_word_cloud_v2():

A more customizable and generic function with the following parameters:

| Parameter | Required | Data Type | Description | More Details |
| --- | --- | --- | --- | --- |
| rawData | Yes | pandas dataframe | pandas dataframe from the scraper function | |
| topic_title | Yes | string | a short string describing the topic of the tweets in the dataframe | The output files will have the same name as the topic. |
| force_exclude_words | No, default [] | list of strings | words you wish to exclude from the word cloud | After seeing an output, if there are words in the image that you wish to exclude, you can do so using this option. |
| width | No, default 1000 | int | width of the output image in pixels | |
| height | No, default 500 | int | height of the output image in pixels | |
| dpi | No, default 100 | int | pixel density per inch | |
| return | NA | None | generates and displays the word cloud, and saves it as a PNG | |
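The width, height, and dpi parameters are related by simple arithmetic: a size in inches is the size in pixels divided by the dots per inch. The sketch below shows that conversion. `figure_size_inches` is a hypothetical helper, and the source does not confirm whether generate_word_cloud_v2 computes its canvas this way.

```python
from typing import Tuple


def figure_size_inches(width_px: int, height_px: int, dpi: int = 100) -> Tuple[float, float]:
    """Convert a pixel resolution to a canvas size in inches: inches = pixels / dpi."""
    if dpi <= 0:
        raise ValueError("dpi must be a positive integer")
    return width_px / dpi, height_px / dpi


print(figure_size_inches(1080, 720))       # (10.8, 7.2)
print(figure_size_inches(1000, 500, 100))  # (10.0, 5.0)
```

In matplotlib, a figure's `figsize` is given in inches and `dpi` in dots per inch, so a tuple like this is what a figure would need in order to render at the requested pixel resolution.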