gpsyrou / Twitter_Sentiment_Analysis

Exploration of the Twitter API and sentiment & topic analysis on tweets relevant to COVID-19

remove_punct_and_stopwords make .lower() to be called once #31

Closed: gpsyrou closed this issue 3 years ago

lamsi7 commented 3 years ago

Hi there again! I looked into this bug. So, what we want to achieve here is to change this part of the function remove_punct_and_stopwords: text = ' '.join([char.lower() for char in txt_tokenized if char.lower() not in string.punctuation and char.lower() not in stopwordlist and char not in num_list]). If so, I think I have an idea of how to implement it. May I work on this one?

Regards, lamsi7

gpsyrou commented 3 years ago

Hello! Yes, the problem is that I wrote it badly: we are calling .lower() multiple times on each token. So, before calling this function, we need to ensure that punctuation and stopwordlist are already lowercase (which I think they already are).
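
For illustration, here is a minimal sketch of what lowercasing each token only once could look like. The function signature is an assumption (the thread only shows the one-line expression, not the full function), and it presumes that stopwordlist already holds lowercase entries, as discussed above. It is not necessarily the fix that was eventually merged.

```python
import string


def remove_punct_and_stopwords(txt_tokenized, stopwordlist, num_list):
    """Sketch only: the parameter list is assumed, not taken from the repository."""
    # Drop number tokens first (checked on the original token, as in the
    # one-liner from the issue), then lowercase each surviving token once.
    lowered_tokens = (tok.lower() for tok in txt_tokenized if tok not in num_list)

    # Reuse the already-lowercased token for both membership checks.
    return ' '.join(
        tok for tok in lowered_tokens
        if tok not in string.punctuation and tok not in stopwordlist
    )


tokens = ['The', 'Virus', ',', 'spreads', '2020', '!']
print(remove_punct_and_stopwords(tokens, stopwordlist=['the'], num_list=['2020']))
# prints: virus spreads
```

The filtering logic is the same as in the original expression; only the number of .lower() calls per token changes, from up to three down to one.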

Btw, I have started working on a new project (https://github.com/gpsyrou/Text_Analysis_of_Consumer_Reviews), where I am much more active these days. I am web-scraping Trustpilot reviews of delivery companies, and I am going to explore many NLP algorithms to try to find something interesting. Feel free to check it out if anything there interests you as well.

lamsi7 commented 3 years ago

@gpsyrou Great, I will quickly fix this issue and then take a look at the new project!