paulbrodersen / matplotlib_venn_wordcloud

Venn diagrams with word clouds
MIT License

composite words not recognized #2

Status: Closed (closed by rugantio 6 years ago)

rugantio commented 6 years ago

Hello, I am trying to use your library to do differential analysis for a data science project. I had to make some changes to your code because sets are sometimes empty, and venn3_unweighted comes in handy for the visualization (you can check out the fork¹ if you like).

One problem I couldn't solve is that composite words (which wordcloud.WordCloud handles well) are not properly recognized, which messes up the visualization. As you can see here, the article "di" is repeated in different sets: https://raw.githubusercontent.com/rugantio/Venn/master/images/AndreaSantiagoAntonietta_words.png

I thought the problem was at line 402 of _main.py, which reads:

if not word_to_frequency:
    text = " ".join(words)
    wc.generate(text)

I tried to fix it but couldn't. Could you take a look? Thank you for your work!

¹ The code I'm using https://github.com/rugantio/matplotlib_venn_wordcloud/blob/master/matplotlib_venn_wordcloud/_main.py

paulbrodersen commented 6 years ago

Can you produce a self-contained, minimal, working example, including the script that is calling my functions and the data that you are passing in? That would make it a lot easier to debug.

rugantio commented 6 years ago

Sounds right, here is the minimal script:

from matplotlib import pyplot as plt
from matplotlib_venn_wordcloud import venn2_wordcloud

# Two sets containing both single words and composite (multi-word) entries
x = {'sincerely', 'department', 'usa', 'usa nation'}
y = {'sincerely', 'security', 'usa democracy'}

s = (x, y)

v = venn2_wordcloud(s)

plt.show()

[Image: wordcloud]

As you can see, the words "usa nation" and "usa democracy" are split, and "usa nation" seems to overwrite the simpler "usa".
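To illustrate the suspected mechanism, here is a minimal sketch using only the standard library. It assumes a simple `\w+` tokenizer as a stand-in for whatever tokenization happens inside `wc.generate(text)`; the real tokenizer in wordcloud is more elaborate, so this only shows why joining entries with spaces can lose the boundaries of composite words. The variable names are hypothetical and not taken from the library.

```python
import re
from collections import Counter

# Entries of set x from the minimal script, sorted for a stable order.
words = sorted({'sincerely', 'department', 'usa', 'usa nation'})

# Joining the entries into a single string discards the information
# about where one entry ends and the next one begins.
text = " ".join(words)

# Stand-in tokenizer (an assumption, not wordcloud's actual one):
# split the text back into runs of word characters.
tokens = re.findall(r"\w+", text)
counts_from_text = Counter(tokens)

# "usa nation" no longer exists as an entry; "usa" is counted twice
# and "nation" appears as a separate word.
print(counts_from_text)

# Keeping a mapping keyed by the original entries preserves composites.
frequencies = {word: 1 for word in words}
print(frequencies)
```

wordcloud.WordCloud also provides `generate_from_frequencies`, which accepts a word-to-frequency mapping directly, so a mapping like `frequencies` would not need to pass through a joined string. Whether the library takes that route is not stated in this thread.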

paulbrodersen commented 6 years ago

This should be fixed now. The problem was where you said it might be. Thanks for raising the issue.

[Image: figure_4]