marmarbar003 / TxtMin-Project-SA


Text Mining Project - Sentiment Analysis

Group Members: Maria, Giami, and Hannah

Romeo vs. Juliet: Investigating Gender-Based Linguistic Variations in Online Reviews: A Sentiment Analysis Approach

Abstract

This project investigates how language and sentiment vary between online reviews written by men and women. Understanding the role of gender in review language is important for businesses seeking to tailor their marketing strategies effectively and efficiently. We plan to use a dataset of online reviews sourced from various platforms and compare how each gender uses language. Using natural language processing techniques, we will extract sentiment and linguistic features to examine differences between male- and female-authored reviews. Because reviewers rarely state their gender, we plan to infer it from each reviewer's name or username using available gender-identification libraries. Analyzing these patterns should reveal gender-specific linguistic tendencies and shed light on the preferences that shape how reviews are written. Ultimately, this research aims to contribute to a deeper understanding of gender dynamics in online discourse and to offer actionable implications for businesses seeking to engage diverse consumer segments.
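A minimal sketch of name-based gender inference, assuming each reviewer's display name begins with a first name. The lookup table `NAME_GENDER` and the function `infer_gender` are hypothetical stand-ins for whichever gender-identification library the project ends up using; unisex, missing, or non-name usernames fall back to "unknown".

```python
import re

# Hypothetical lookup table standing in for a real name-gender library or
# dataset; the entries are illustrative only.
NAME_GENDER = {
    "mary": "female",
    "emma": "female",
    "olivia": "female",
    "james": "male",
    "john": "male",
    "david": "male",
}


def infer_gender(display_name: str) -> str:
    """Guess 'female' or 'male' from the leading name in a reviewer's display name.

    Returns 'unknown' when the name is empty, starts with a non-letter, or is
    not in the lookup table (e.g. unisex names such as 'Alex').
    """
    # Take the leading run of ASCII letters, so "james_88" -> "james".
    match = re.match(r"[A-Za-z]+", display_name.strip())
    if match is None:
        return "unknown"
    return NAME_GENDER.get(match.group(0).lower(), "unknown")


if __name__ == "__main__":
    for name in ["Mary", "james_88", "Emma K.", "Alex", "5star"]:
        print(f"{name!r} -> {infer_gender(name)}")
```

Reviews labelled "unknown" would most likely be left out of the gender comparison rather than guessed, so the size of that group is worth reporting.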

Research Questions

Datasets

Datasets that contain reviewer names/profiles:

Corpora:

Process

Although, as one can see, the datasets cover different phone models, we will treat them as a single product for this analysis because all of the phones are made by the same brand.

First, we will filter out irrelevant characters such as numbers and uncommon punctuation like : and ;. Removing this noise should leave the more meaningful punctuation, such as !, to carry weight in the sentiment analysis. Converting every review to the same case is a common normalization step, but a fully capitalized word like AMAZING! conveys a stronger positive sentiment than Amazing!, so we will preserve fully capitalized words where they occur.

Some reviewers express their opinions and emotions with emojis rather than words, so we also need a way to detect each emoji and decide whether it reads as positive or negative. One possible approach is to identify emojis by their Unicode code points. Since all the data we have been able to collect is in English, no translation or other multilingual processing is needed.

To enrich the data, we can also investigate which aspects of the phone reviewers tend to focus on, such as camera quality or screen size, and check whether these differ between genders. Finally, because one of our research questions concerns review length by gender, we will record the length of each review before preprocessing it.
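The preprocessing steps described above could be chained roughly as follows. This is a minimal sketch, assuming plain-string reviews; the approximate emoji code-point ranges, the `EMOJI_POLARITY` and `ASPECTS` tables, and the names `preprocess` and `ProcessedReview` are hypothetical choices for illustration, not the project's actual lexicons.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Approximate emoji blocks (assumption, not an exhaustive emoji list):
# Misc Symbols + Dingbats, and the pictograph/emoticon range U+1F300..U+1FAFF.
EMOJI_RANGES = [(0x2600, 0x27BF), (0x1F300, 0x1FAFF)]

# Hypothetical polarity lexicon: heart eyes, thumbs up, heart = +1;
# angry face, thumbs down = -1. Unlisted emojis score 0.
EMOJI_POLARITY = {"\U0001F60D": 1, "\U0001F44D": 1, "\u2764": 1,
                  "\U0001F621": -1, "\U0001F44E": -1}

# Hypothetical aspect keywords for phone reviews.
ASPECTS = {
    "camera": {"camera", "photo", "photos", "picture", "pictures"},
    "screen": {"screen", "display"},
    "battery": {"battery", "charge", "charging"},
}


def is_emoji(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


@dataclass
class ProcessedReview:
    raw_length: int       # characters, measured before any cleaning
    raw_word_count: int   # whitespace-separated tokens before cleaning
    emojis: list
    emoji_score: int
    shouted_words: list   # fully capitalized words such as "AMAZING"
    aspects: Counter
    clean_text: str


def preprocess(review: str) -> ProcessedReview:
    # 1. Length is recorded first, on the untouched text.
    raw_length = len(review)
    raw_word_count = len(review.split())

    # 2. Detect emojis by code point and score them with the lexicon.
    emojis = [ch for ch in review if is_emoji(ch)]
    emoji_score = sum(EMOJI_POLARITY.get(e, 0) for e in emojis)

    # 3. Replace digits, ':' ';' and other symbols (emojis included) with
    #    spaces, keeping letters and ! ? . , ' as sentiment cues.
    clean = re.sub(r"[^A-Za-z\s!?.,']", " ", review)
    clean = re.sub(r"\s+", " ", clean).strip()

    # 4. Case is NOT lowered; all-caps words (2+ letters) are kept as cues.
    words = re.findall(r"[A-Za-z']+", clean)
    shouted = [w for w in words if len(w) > 1 and w.isupper()]

    # 5. Count aspect mentions on a lowercased copy of the words only.
    lowered = [w.lower() for w in words]
    aspects = Counter(name for w in lowered
                      for name, kws in ASPECTS.items() if w in kws)

    return ProcessedReview(raw_length, raw_word_count, emojis, emoji_score,
                           shouted, aspects, clean)


if __name__ == "__main__":
    print(preprocess("AMAZING camera!!! Battery lasts 2 days; love it \U0001F60D\U0001F44D"))
```

Measuring length before step 3 keeps the gender comparison of review length independent of how aggressive the cleaning is. A keep-list regex like this also discards text emoticons such as :), so those would need to be captured before step 3 if they turn out to matter.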

Planned Milestones

Questions of Each Update

Update 0:

Documentation