This project analyzes Twitter data, covering data preprocessing, sentiment analysis, user categorization, and machine learning modeling. The data is first cleaned and prepared, and each tweet is then classified as positive, neutral, or negative. Users are grouped into categories based on their tweet content, and a word frequency analysis identifies the most common words in each user category. The project also compares how often specific keywords appear in positive versus negative tweets. Finally, three machine learning models (Bernoulli Naive Bayes, Linear Support Vector Classifier, and Logistic Regression) are used to classify sentiment, and their performance is evaluated. Together, these steps describe user behavior, sentiment trends, and word usage on Twitter.
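To make the cleaning and keyword comparison steps concrete, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the sample tweets, the stop-word list, and the function names `clean_tweet`, `word_counts_by_sentiment`, and `compare_keyword` are all hypothetical and chosen for illustration, and it assumes sentiment labels are already attached to each tweet.

```python
import re
from collections import Counter

# Hypothetical sample data: (tweet text, sentiment label). Not from the project.
TWEETS = [
    ("I love the new phone! http://example.com", "positive"),
    ("@bob the phone battery is terrible", "negative"),
    ("Love this camera, love the phone #happy", "positive"),
    ("Terrible service, the phone arrived broken", "negative"),
]

# Small illustrative stop-word list (an assumption; a real run would use a fuller one).
STOP_WORDS = {"i", "the", "is", "this", "a", "new"}

URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
NON_LETTERS = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase, strip URLs, mentions, and non-letters, then drop stop words."""
    text = URL_OR_MENTION.sub(" ", text.lower())
    text = NON_LETTERS.sub(" ", text)
    return [word for word in text.split() if word not in STOP_WORDS]


def word_counts_by_sentiment(tweets):
    """Return a dict mapping each sentiment label to a Counter of cleaned words."""
    counts = {}
    for text, label in tweets:
        counts.setdefault(label, Counter()).update(clean_tweet(text))
    return counts


def compare_keyword(counts, keyword):
    """Return (count in positive tweets, count in negative tweets) for a keyword."""
    positive = counts.get("positive", Counter())
    negative = counts.get("negative", Counter())
    return positive[keyword], negative[keyword]


counts = word_counts_by_sentiment(TWEETS)
for keyword in ("love", "phone", "terrible"):
    print(keyword, compare_keyword(counts, keyword))
```

The same per-label `Counter` structure could be keyed by user category instead of sentiment to list the most common words per category, for example with `Counter.most_common`.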