Closed by gpsyrou 3 years ago
Hi there again! I looked into this bug. So, what we want to achieve here is changing this part:
text = ' '.join([char.lower() for char in txt_tokenized if char.lower() not in string.punctuation and char.lower() not in stopwordlist and char not in num_list])
from the function _remove_punct_andstopwords. If so, I think I have an idea of how to implement it. May I work on this one? Regards, lamsi7
Hello! Yes, the problem is that I wrote it badly: we call .lower() several times on each token. So before calling this function we need to ensure that punctuation and stopwordlist are already lowercase (which I think they are).
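A minimal sketch of the change discussed above, assuming the tokens and the entries of num_list are strings and that stopwordlist is already lowercase. The function name and signature here are hypothetical stand-ins, not the repository's actual code. Each token is lowercased once and the result is reused in every check:

```python
import string


def remove_punct_and_stopwords(txt_tokenized, stopwordlist, num_list):
    """Join the tokens that are not punctuation, stopwords, or listed numbers.

    Hypothetical helper: assumes stopwordlist is already lowercase,
    as suggested in the comment above.
    """
    # Sets give constant-time membership checks; contents are unchanged.
    stopwords = set(stopwordlist)
    numbers = set(num_list)

    kept = []
    for token in txt_tokenized:
        lowered = token.lower()  # the only .lower() call per token
        if lowered in string.punctuation:
            continue
        if lowered in stopwords or token in numbers:
            continue
        kept.append(lowered)
    return ' '.join(kept)


tokens = ['The', 'Quick', ',', 'brown', 'fox', '42', '!']
print(remove_punct_and_stopwords(tokens, ['the'], ['42']))
```

The filtering conditions are the same as in the original one-liner, so the output should not change; only the repeated .lower() calls are removed.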
Btw, I have started working on a new project: https://github.com/gpsyrou/Text_Analysis_of_Consumer_Reviews - where I am much more active these days. I am web-scraping reviews of delivery companies from Trustpilot, and I am going to explore many NLP algorithms to try to find something interesting. Feel free to check it out in case you find anything interesting there as well.
@gpsyrou Great, I will quickly fix this issue and then take a look at the new project!