saidhanush4422 / Helium-3


Module 3: Create a Mood Prediction system #5

Closed saidhanush4422 closed 1 year ago

saidhanush4422 commented 1 year ago

In this task we should create a sentiment analysis system based on text and emojis, so that we can predict the user's final emotion and recommend a solution for their emotional stability based on their mood.

Content 1

from textblob import TextBlob
import ipywidgets as widgets
from tkinter import *
from tkinter import filedialog
import numpy as np
import pandas as pd

# Define function to analyze emotion
def analyze_emotion(text):
    blob = TextBlob(text)
    sentiment_score = blob.sentiment.polarity
    if sentiment_score > 0.2:
        return 'Happy'
    elif sentiment_score >= 0 and sentiment_score <= 0.2:
        return 'Neutral'
    elif sentiment_score < 0 and sentiment_score >= -0.2:
        return 'Depressed'
    elif sentiment_score < -0.2 and sentiment_score >= -0.5:
        return 'Sad'
    else:
        return 'Angry'

This code uses the TextBlob library to perform sentiment analysis on the given text. The analyze_emotion function returns the emotion of the text as either Happy, Neutral, Depressed, Sad, or Angry based on the sentiment score of the text.

The threshold values used to classify the sentiment score into different emotions are arbitrary and can be changed according to your needs.

Note: This code only works based on the sentiment score of the text and does not take into account any emojis or other factors that may influence the user's emotion.
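
If emoji support is needed, one common workaround is to convert emojis to their text names with the emoji package before scoring. The sketch below is only illustrative (the wrapper name analyze_emotion_with_emojis is not part of the project code) and reuses the analyze_emotion function and thresholds defined above:

import emoji

def analyze_emotion_with_emojis(text):
    # Turn emojis into words, e.g. "😀" -> "grinning face", so the lexicon
    # can score them like ordinary text where it has matching entries
    plain_text = emoji.demojize(text).replace(':', ' ').replace('_', ' ')
    return analyze_emotion(plain_text)

print(analyze_emotion_with_emojis("Had a great day 😀"))

How much the converted names actually shift the polarity depends on which words appear in TextBlob's lexicon, so this is a lightweight stopgap rather than true emoji sentiment analysis.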

Failed Test Case

# Import necessary libraries
from textblob import TextBlob
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
import pandas as pd
import emoji

# Load dataset
mydata = pd.read_csv(r'D:\Major project\Sentiment analysis\data\tes.csv')

# demojize() works on strings, so convert emojis in the text column cell by cell
mydata['text'] = mydata['text'].astype(str).apply(emoji.demojize)

# Split dataset into train and test sets
train_data, test_data = train_test_split(mydata, test_size=0.2, random_state=42)

# Additional training file with emojis, demojized cell by cell (not used further below)
train = pd.read_csv(r'D:\Major project\Sentiment analysis\data\train_emoji.csv')
train = train.applymap(lambda value: emoji.demojize(value) if isinstance(value, str) else value)
# Define function to analyze emotion
def analyze_emotion(text):
    blob = TextBlob(text)
    sentiment_score = blob.sentiment.polarity
    if sentiment_score > 0.2:
        return 'Happy'
    elif sentiment_score >= 0 and sentiment_score <= 0.2:
        return 'Neutral'
    elif sentiment_score < 0 and sentiment_score >= -0.2:
        return 'Depressed'
    elif sentiment_score < -0.2 and sentiment_score >= -0.5:
        return 'Sad'
    else:
        return 'Angry'

# Apply analyze_emotion function to train data
train_data['predicted_emotion'] = train_data['text'].apply(analyze_emotion)

# Calculate accuracy score and confusion matrix for train data
train_accuracy = accuracy_score(train_data['emotion'], train_data['predicted_emotion'])
train_confusion_matrix = confusion_matrix(train_data['emotion'], train_data['predicted_emotion'])

print('Train accuracy:', train_accuracy)
print('Train confusion matrix:', train_confusion_matrix)

# Apply analyze_emotion function to test data
test_data['predicted_emotion'] = test_data['text'].apply(analyze_emotion)

# Calculate accuracy score and confusion matrix for test data
test_accuracy = accuracy_score(test_data['emotion'], test_data['predicted_emotion'])
test_confusion_matrix = confusion_matrix(test_data['emotion'], test_data['predicted_emotion'])

print('Test accuracy:', test_accuracy)
print('Test confusion matrix:', test_confusion_matrix)

The sentiment analysis algorithm used in the code is based on the TextBlob library, which uses a lexicon-based approach.

The TextBlob library provides a pre-trained sentiment lexicon, which is a collection of words and their associated sentiment scores. The sentiment score of a text is computed by averaging the scores of the words and phrases in the text that have entries in the lexicon. The resulting polarity ranges from -1 (most negative) to 1 (most positive), with 0 being neutral.
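
A few quick examples of these polarity values (illustrative only; exact numbers depend on the TextBlob version and its lexicon):

from textblob import TextBlob

for sentence in ["I love this app", "The sky is blue", "This is terrible"]:
    # .sentiment.polarity is a float in the range [-1.0, 1.0]
    print(sentence, "->", TextBlob(sentence).sentiment.polarity)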

The analyze_emotion function in the code uses the sentiment score of the text to classify it into one of the five emotions: Happy, Neutral, Depressed, Sad, or Angry. The threshold values used to classify the sentiment score into different emotions are arbitrary and can be changed according to your needs.

Note that this lexicon-based approach has limitations and may not always accurately capture the nuances of human emotions. More sophisticated models, such as machine learning algorithms and neural networks, may be needed to improve the performance of sentiment analysis in certain cases.
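
As a rough sketch of what such a supervised model could look like (not part of the project code; the CSV path is a placeholder, and the 'text' and 'emotion' columns are assumed to match the labelled data used in the test case above):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
import pandas as pd

# Placeholder path; expects columns 'text' and 'emotion'
df = pd.read_csv('labelled_emotions.csv')
X_train, X_test, y_train, y_test = train_test_split(
    df['text'].astype(str), df['emotion'], test_size=0.2, random_state=42)

# Bag-of-words (TF-IDF) features fed into a logistic regression classifier
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))

Unlike the lexicon approach, this model learns its weights from labelled examples, so its quality depends mainly on how much labelled data is available.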

How do I extract the output of the function for each and every line of an input file?

def process_line(line):
    # Process the line here (placeholder: strip trailing whitespace)
    processed_line = line.strip()
    return processed_line

with open('input_file.txt', 'r') as file:
    for line in file:
        processed_line = process_line(line)
        print(processed_line)

In this example, the process_line() function is called on each line of the input file, and the processed output is printed to the console using the print() function. You can replace the print() function with any other function that you want to use to process the output.

To store the processed output for each line in a list, you can initialize an empty list before the loop, and then append the processed output to the list inside the loop:

def process_line(line):
    # Process the line here (placeholder: strip trailing whitespace)
    processed_line = line.strip()
    return processed_line

output_list = []

with open('input_file.txt', 'r') as file:
    for line in file:
        processed_line = process_line(line)
        output_list.append(processed_line)

print(output_list)

In this example, the processed_line variable is appended to the output_list for each line in the input file. The final output_list is printed to the console using the print() function.
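
For this project, the same pattern can collect one predicted emotion per line by plugging in the analyze_emotion function defined earlier (a sketch; 'input_file.txt' is a placeholder as above):

emotions = []

with open('input_file.txt', 'r', encoding='utf-8') as file:
    for line in file:
        line = line.strip()
        if line:                              # skip blank lines
            emotions.append(analyze_emotion(line))

print(emotions)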

How do I find the element that is repeated the maximum number of times in a list?

To find the element that is repeated the maximum number of times in a list, you can use a dictionary to count the number of occurrences of each element, and then find the maximum count using the max() function.

my_list = [1, 2, 3, 4, 2, 3, 2, 2, 4, 4, 4, 4]

count_dict = {}
for element in my_list:
    if element in count_dict:
        count_dict[element] += 1
    else:
        count_dict[element] = 1

max_count = max(count_dict.values())

for element, count in count_dict.items():
    if count == max_count:
        print(f"{element} occurs {count} times (the maximum number of times)")

In this example, the my_list variable contains the list of elements. A dictionary count_dict is initialized to store the count of each element in the list. The for loop iterates over each element in the list, checks if the element is already in the dictionary, and increments the count if it is, or adds a new key-value pair with a count of 1 if it isn't. The max_count variable is set to the maximum value of the counts in the count_dict dictionary using the max() function. Finally, another for loop iterates over the items in the count_dict dictionary and prints the element and count if the count is equal to the max_count.

Note that if several elements are tied for the maximum number of occurrences, the final loop will print all of them, because every element whose count equals max_count is reported.

What is the searching technique used in the above code?

The technique used in the above code is a single linear pass over the list combined with dictionary lookups. The for loop iterates over each element in the list and checks whether the element is already a key in the count_dict dictionary; if it is, the count for that element is incremented, and if it is not, a new key-value pair is added with a count of 1.

This approach is simple and easy to implement. Because Python dictionaries are implemented as hash tables, the membership check and the count update each take O(1) time on average, so counting all the elements takes O(n) time, where n is the number of elements in the list; the second pass that reports the elements with the maximum count is also O(n). Worst-case behaviour only degrades under pathological hash collisions.

If you want the same result with less code, the standard library's collections.Counter performs the same linear-time counting, as shown in the sketch below.
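
A minimal sketch of the same counting with collections.Counter, using the example list from above:

from collections import Counter

my_list = [1, 2, 3, 4, 2, 3, 2, 2, 4, 4, 4, 4]

counts = Counter(my_list)
element, count = counts.most_common(1)[0]    # the single most common element
print(f"{element} occurs {count} times (the maximum number of times)")

Note that most_common(1) reports only one element when several are tied for the maximum; to report all of them, compare each count against max(counts.values()) as in the dictionary version.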

saidhanush4422 commented 1 year ago

Final Code

# Import necessary libraries
from textblob import TextBlob
import ipywidgets as widgets
from tkinter import *
from tkinter import filedialog
import numpy as np
import pandas as pd

# Define function to analyze emotion
def analyze_emotion(text):
    blob = TextBlob(text)
    sentiment_score = blob.sentiment.polarity
    if sentiment_score > 0.2:
        return 'Happy'
    elif sentiment_score >= 0 and sentiment_score <= 0.2:
        return 'Neutral'
    elif sentiment_score < 0 and sentiment_score >= -0.2:
        return 'Depressed'
    elif sentiment_score < -0.2 and sentiment_score >= -0.5:
        return 'Sad'
    else:
        return 'Angry'

# Example usage
user_input = input("Enter '1' to type input, '2' to upload a text file: ")

if user_input == '1':
    # Prompt the user to type input in the command prompt
    print("Enter the text and type 'quit' to exit the prompt")
    output_list = []
    while True:
        user_text = input("Enter text: \n")
        # Stop before analysing the exit keyword itself
        if user_text.lower() == "quit":
            print("You have exited the prompt")
            break
        emotion = analyze_emotion(user_text)
        print(emotion)
        output_list.append(emotion)
    print(output_list)

    # Find the most frequently occurring emotion using a simple linear scan

    count_dict = {}
    for element in output_list:
        if element in count_dict:
            count_dict[element] += 1
        else:
            count_dict[element] = 1

    if output_list:
        max_count = max(count_dict.values())

        for element, count in count_dict.items():
            if count == max_count:
                print(f"{element} occurs {count} times (the maximum number of times)")
                final_emotion = element
                print(final_emotion)

elif user_input == '2':
    # Prompt the user to select a text file and strip leading tokens from each line
    ele = int(input("Enter the number of leading words to remove from each line: "))

    # Function to process each line of data
    def process_line(line, n):
        # Delete the first n elements from the line
        modified_line = line.split()[n:]
        # Convert the modified line back to a string and return it
        return ' '.join(modified_line)

    def delete_n_elements(n):
        filename = filedialog.askopenfilename(initialdir="/", title="Select a File", filetypes=(("Text files", "*.txt"), ("All Files", "*.*")))
        with open(filename, "r", encoding='utf-8') as f:
            modified_lines = [process_line(line.strip(), n) for line in f]
        return modified_lines

    # Example usage:
    output_list = []

    modified_data = delete_n_elements(ele)
    for line in modified_data:
        print(line)
        emotion = analyze_emotion(line)
        print(emotion)
        output_list.append(emotion)

    # Print the list of all the per-line outputs
    # print(output_list)

    # Find the most frequently occurring emotion
    count_dict = {}
    for element in output_list:
        if element in count_dict:
            count_dict[element] += 1
        else:
            count_dict[element] = 1

    max_count = max(count_dict.values())

    for element, count in count_dict.items():
        if count == max_count:
            print(f"{element} occurs {count} times (the maximum number of times)")
            final_emotion = element
            print(final_emotion)

else:
    print("Invalid input. Please enter '1' or '2'.")

# Take final_emotion as input and use it as the recommendation parameter in the next step
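
One way to use final_emotion as the recommendation parameter is a simple lookup table, assuming final_emotion was set by one of the branches above; the suggestion strings below are placeholders and would be replaced by the project's actual recommendation logic:

# Illustrative emotion-to-recommendation mapping (placeholder suggestions)
recommendations = {
    'Happy': "Keep doing what you are doing and share the positivity.",
    'Neutral': "Try a short walk or some music to lift your mood.",
    'Depressed': "Consider reaching out to a friend or taking a proper break.",
    'Sad': "A breathing exercise or a journaling session may help.",
    'Angry': "Step away for a few minutes and try a calming activity.",
}

print(recommendations.get(final_emotion, "No recommendation available."))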