joweich / chat-miner

Parsers and visualizations for chats
MIT License

If you add custom stopwords all the stopwords get deleted (but still work, somehow) #68

Closed. KianKhadempour closed this issue 1 year ago

KianKhadempour commented 1 year ago

I was trying to add default stopwords and decided to print the value of the variable before and after the update. Before updating, it was a perfectly fine set of strings, but after updating it was None. Here is the test code:

from wordcloud import STOPWORDS

print("\n", STOPWORDS, "\n")

stopwords = ["sent-share", "sent-photo", "sent-video", "sent-audio"]
stopwords = STOPWORDS.update(stopwords)

print(stopwords)

The result:

{'therefore', 'was', "where's", 'through', "he'll", 'up', 'him', "shan't", 'just', 'if', 'her', "let's", 'why', 'with', 'would', 'http', 'are', 'until', 'doing', 'about', "it's", 'most', 'own', 'under', "we'll", "hadn't", 'is', "isn't", 'r', 'however', 'before', "aren't", "i'm", 'more', "couldn't", 'on', 'it', 'too', 'you', "you'll", 'having', "who's", "she'd", "he'd", "shouldn't", 'a', 'com', "can't", 'yours', 'ours', 'to', "weren't", 'k', 'as', 'no', "they'll", 'same', "he's", 'such', 'after', 'themselves', "you'd", "you've", 'theirs', 'were', 'then', 'his', 'my', "you're", "wasn't", 'further', "she'll", 'yourselves', 'ourselves', 'she', 'am', 'by', 'for', 'how', 'during', 'like', 'there', "how's", 'myself', 'any', "won't", "she's", 'or', 'so', 'at', 'nor', 'have', 'than', 'herself', 'cannot', "haven't", 'that', "they're", 'all', 'also', 'between', "don't", "we're", 'being', 'get', 'while', "there's", 'should', 'because', 'else', 'of', 'has', 'below', 'not', 'been', 'do', 'what', "i'll", 'again', 'can', 'ever', 'other', 'them', 'does', 'could', 'each', "they've", 'which', 'off', 'both', 'but', "doesn't", "i've", "they'd", 'those', 'few', 'who', 'hers', 'did', "mustn't", 'this', 'www', 'from', 'be', 'ought', "we'd", 'here', 'he', 'itself', 'when', 'an', 'and', 'its', 'yourself', 'they', 'into', 'shall', 'out', 'hence', 'since', 'only', 'whom', "here's", "that's", "we've", "i'd", 'their', "what's", 'the', 'himself', "why's", 'very', "when's", 'down', "wouldn't", 'these', "didn't", 'above', 'me', 'otherwise', 'some', 'i', 'our', 'over', 'had', 'we', 'your', "hasn't", 'against', 'where', 'once', 'in'}

None

I think the problem is that stopwords is being assigned the return value of the .update method, which is None. Somehow, though, when I test the code everything works fine. The only problem is adding default stopwords such as in #65. Here is a suggested change:

From

if stopwords:
    stopwords = STOPWORDS.update(stopwords)

To

if stopwords:
    STOPWORDS.update(stopwords)

STOPWORDS.update(["default", "stopwords", "here"])
stopwords = STOPWORDS

I can add this change to #65 if you want.
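One caveat with the suggested change is that it still mutates the shared STOPWORDS set, so custom words would persist across calls within the same process. A minimal sketch of a non-mutating alternative follows. It assumes a small stand-in set in place of wordcloud.STOPWORDS so it runs without the library, and merge_stopwords is a hypothetical helper name, not something that exists in chat-miner:

```python
# Stand-in for wordcloud.STOPWORDS (assumption: the real one is a set of strings).
STOPWORDS = frozenset({"the", "and", "of"})


def merge_stopwords(extra=None, defaults=()):
    """Hypothetical helper: return a new set, leaving STOPWORDS untouched."""
    merged = set(STOPWORDS)      # copy, so the shared set is never mutated
    merged.update(defaults)      # default stopwords, if any
    if extra:
        merged.update(extra)     # user-supplied stopwords, if any
    return merged


stopwords = merge_stopwords(
    extra=["sent-share", "sent-photo"],
    defaults=["default", "stopwords", "here"],
)
print("sent-photo" in stopwords)   # True
print("sent-photo" in STOPWORDS)   # False
```

Copying first means each call starts from the same base list, at the cost of building a new set every time.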

joweich commented 1 year ago

Oh yes, thank you for raising this!

> I think the problem is that stopwords is being assigned the return value of the .update method, which is None.

This is probably exactly the reason, since update updates the values in place.
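The in-place behaviour can be checked in isolation. The sketch below uses a small stand-in set rather than importing wordcloud, and effective_stopwords is a hypothetical consumer that falls back to the shared set when given None. That fallback is an assumption about how the value is used downstream, but it would explain why everything still appeared to work:

```python
# Stand-in for wordcloud.STOPWORDS so the sketch runs without the library.
STOPWORDS = {"the", "and", "of"}

custom = ["sent-photo", "sent-video"]

# set.update mutates the set in place and returns None.
result = STOPWORDS.update(custom)

print(result)                     # None
print("sent-photo" in STOPWORDS)  # True


def effective_stopwords(stopwords=None):
    """Hypothetical consumer: use the shared set when no stopwords are given."""
    return stopwords if stopwords is not None else STOPWORDS


# None triggers the fallback, and the shared set already holds the custom words.
print("sent-video" in effective_stopwords(result))  # True
```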

> I can add this change to https://github.com/joweich/chat-miner/pull/65 if you want.

I think we should add default stopwords in a separate PR, because it needs some thought on how to implement it cleanly: different parsers would require different default stopwords. Also, I would love to introduce an option for automatically pulling a standard stopword list (like this) at some point.
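One possible shape for per-parser defaults is a lookup table keyed by parser name. This is only a sketch under stated assumptions: the parser keys, the word lists, and the function name stopwords_for are all made up for illustration, and BASE_STOPWORDS stands in for wordcloud.STOPWORDS:

```python
# Stand-in for wordcloud.STOPWORDS (assumption).
BASE_STOPWORDS = frozenset({"the", "and", "of"})

# Hypothetical parser names and default words, purely illustrative.
DEFAULT_STOPWORDS_BY_PARSER = {
    "parser_a": frozenset({"sent-photo", "sent-video"}),
    "parser_b": frozenset({"media-omitted"}),
}


def stopwords_for(parser_name, extra=None):
    """Combine base, parser-specific, and user-supplied stopwords into a new set."""
    merged = set(BASE_STOPWORDS)
    # Unknown parsers simply get no parser-specific defaults.
    merged |= DEFAULT_STOPWORDS_BY_PARSER.get(parser_name, frozenset())
    if extra:
        merged |= set(extra)
    return merged


print(sorted(stopwords_for("parser_a", extra=["lol"])))
```

Keeping the defaults in data rather than in each parser class would let a pulled standard stopword list replace BASE_STOPWORDS later without touching the parsers.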

KianKhadempour commented 1 year ago

This issue is fixed in #65.