paulbrodersen / matplotlib_venn_wordcloud

Venn diagrams with word clouds
MIT License
49 stars 12 forks source link

Handling of border cases #4

Closed vpodpecan closed 5 years ago

vpodpecan commented 5 years ago

Currently, if the input data is degenerated, i.e., two sets are empty or all intersections are empty etc., the diagram gets messed up. It would really be good to have handling of such cases because they appear quite often in real life. Should this be fixed in matplotlib_venn_wordcloud or in the parent matplotlib-venn?

paulbrodersen commented 5 years ago

That is a really good point, thanks for raising the issue. I have the philosophy that you should try to catch errors (or sources of errors) as early as possible. So independent of what Konstantin does with matplotlib-venn, I would want to catch the degenerate case in my code. I have a bunch of meetings today, and will be AFK this weekend, so I will probably not get to it until next week. However, if you want to have a go at it, then by all means, do so. I would love a pull-request. ;-)

paulbrodersen commented 5 years ago

Sorry that took longer than expected. There were two problems that both took effect in the exact same line such that for a long time all my fixes did not have the expected effect.

Problem 1): matplotlib.patches.Circle objects don't have a proper path, instead they all have the unit circle path, and you have to apply the transform that comes with the Circle patch to get the correct path. Until now, I didn't have to worry about that because matplotlib-venn never returned Circle patches, as each patch always corresponded to less than a full circle, and the partial circle patches returned by matplotlib-venn had all the correct path.

The solution is simple and is outlined here.

Problem 2: matplotlib-venn, however, does not return a normal Circle object (even though isinstance(patch, matplotlib.patches.Circle) evalutes to True. Hence applying the solution above still returns the wrong coordinates.

Solution: using the center and radius of the matplotlib-venn circle, create a dummy circle when computing the image masks for wordcloud.

Both parts of the solution are implemented in c9617fe.

paulbrodersen commented 5 years ago

Note: in this patch, I am only handling the degenerate case, where all sets are non-empty but some or all of the subsets are empty. If any of the sets don't exist / are empty, I throw an error, as that IMO violates the (implicit) contract of my functions, and should be caught before my functions are ever called.