Closed Franck-Dernoncourt closed 7 years ago
Due to a particular skew in your data (the set AB~C is very small relative to the other subsets), the current positioning algorithm favors the other intersections to such an extent that the AB~C region is not present on the diagram at all.
Under the current logic, the labels for the intersections are positioned within the corresponding regions, so when a region does not exist there is nowhere to put its label, and hence you do not see it.
This is obviously a "feature" that is a problem, yet I am not sure how to immediately address it (suggestions welcome).
In your current situation, assuming you just need to generate this particular diagram, you can hack the system by making it believe that the AB~C set is larger than it is in the data. This will position the circles as needed, and you can then change the label of the corresponding region back to its true value, e.g.:
from matplotlib_venn import venn3

# Note I am adding 500 additional unique entries both to A and B but not C
pad = set(range(100000, 100500))
v = venn3([set(sids_3a) | pad, set(sids_3b) | pad, set(sids_3c)], ('Diabetes (ICD9)', 'Hemo (ICD9+proc)', 'Diabetes or Hemo (notes)'))
lbl = v.get_label_by_id('110')
# Put the true count back on the label
lbl.set_text(str(int(lbl.get_text()) - 500))
In general, I will first of all at least add a warning to the situations where the non-zero regions are hidden. Later I may consider also adding a "safe mode" flag, which will automatically resolve such situations in some ad-hoc way. Thus keeping this issue open.
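The warning mentioned above could be sketched roughly as follows. This is only an illustration, not the package's code: the helper name hidden_regions, the REGION_IDS tuple and the example sizes are all made up here, and the helper simply assumes it is given some lookup that returns None for a region that was not drawn.

```python
import warnings

# Region ids for a 3-set diagram, in the order used throughout this sketch.
REGION_IDS = ('100', '010', '110', '001', '101', '011', '111')

def hidden_regions(sizes, get_label):
    """Return the ids of regions that have a non-zero size but no label.

    sizes     -- 7 subset sizes, in REGION_IDS order
    get_label -- callable: region id -> label object, or None if the
                 region is absent from the diagram (assumption)
    """
    return [rid for rid, n in zip(REGION_IDS, sizes)
            if n > 0 and get_label(rid) is None]

# Stand-in for a drawn diagram in which region '110' was squeezed out.
# The sizes are hypothetical, chosen only to have one small region.
sizes = (900, 1000, 20, 300, 150, 850, 200)
drawn = {rid: 'label' for rid in REGION_IDS if rid != '110'}

missing = hidden_regions(sizes, drawn.get)
if missing:
    warnings.warn('Non-zero regions not shown on the diagram: %s' % ', '.join(missing))
```

With a real diagram object v, the lookup passed in would presumably be v.get_label_by_id, on the assumption that it returns None for regions that are not drawn.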
Sounds good, thanks!
Perhaps one solution would be to add an option to base the size of circle on log of the cardinality? (or more generally accept as parameter a user-specified function that take cares of re-scaling circles' size, log being the most common one)
Currently you can achieve this using venn3_unweighted as follows:
from math import log
# list(...) so that this also works on Python 3, where map returns an iterator
venn3_unweighted(cvs, subset_areas=list(map(log, cvs)))
where cvs denotes the sizes of all 7 subsets. If you start with the sets themselves you can compute those as:
from matplotlib_venn._venn3 import compute_venn3_subsets
cvs = compute_venn3_subsets(a, b, c)
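For reference, the seven sizes can also be worked out by hand with plain set operations. The sketch below is not the library's implementation: the name subset_sizes and the toy sets are made up, the region order (100, 010, 110, 001, 101, 011, 111) is my understanding of the order the package uses, and log(x + 1) is swapped in for log only so that empty regions do not raise an error.

```python
from math import log

def subset_sizes(a, b, c):
    """Sizes of the 7 regions of a 3-set diagram, ordered by region id."""
    return (
        len(a - b - c),     # 100: only in A
        len(b - a - c),     # 010: only in B
        len((a & b) - c),   # 110: in A and B, not in C
        len(c - a - b),     # 001: only in C
        len((a & c) - b),   # 101: in A and C, not in B
        len((b & c) - a),   # 011: in B and C, not in A
        len(a & b & c),     # 111: in all three
    )

# Toy data, for illustration only.
a = {1, 2, 3, 4}
b = {3, 4, 5}
c = {4, 5, 6, 7}

cvs = subset_sizes(a, b, c)

# Rescaled areas; log(x + 1) maps an empty region to area 0 instead of failing.
areas = [log(x + 1) for x in cvs]
```

The regions partition the union of the three sets, so the seven sizes always add up to len(a | b | c), which makes for an easy sanity check.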
I am not immediately convinced that a version of venn3_unweighted with a "size-mapping" given as a function is necessarily useful. Looking at what happens to your diagram when you put a log there, I must say this is certainly not a good way of representing your data, because log kills most of the important differences between the actual set sizes; you could draw a completely unweighted diagram just as well.
Moreover, I cannot imagine any good reason for rescaling your set sizes on the diagram (and thus lying about your data), apart from the need to work around ugly features of the current system like the one here.
So I think I'd rather fix the root of the problem than add easy ways to lie about your data. I will include the example with map(log, cvs) in the README, though.
Thanks for the thorough answer!
As a side note (you're probably aware of it, but just in case): Chen, Hanbo, and Paul C. Boutros. "VennDiagram: a package for the generation of highly-customizable Venn and Euler diagrams in R." BMC bioinformatics 12.1 (2011): 35:
During development of the VennDiagram package, it was discovered that it was impossible to draw accurate, scaled Venn diagrams with three sets using circles.
Yes, and this aspect is also noted in the README to the matplotlib_venn package.
The reason is simple: there are 7 subsets that need to be depicted on the diagram. If you assume their area sums to 1 this makes 6 different free parameters that need visualizing. However, the diagram only has 5 degrees of freedom: if you fix the centers of two circles at (0, 0) and (0, 1) you are left with x, y for the third circle and r1, r2, r3 for the three radii.
Duplicate of #30
I have 3 sets. I did one venn2 and one venn3:
Venn 2:
Venn 3:
If you look at the circle representing the set Hemo (ICD9+proc), in venn2 it contains 1032+1059=2091 items, while in venn3 it contains 1013+862+197=2072 items. Did I miss something or is the count wrong in venn3? (The set Hemo (ICD9+proc) contains 2091 items.)
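The two totals can be compared directly. The quick check below uses only the numbers quoted above (the variable names are illustrative); assuming both plots were built from the same set, the difference is the number of items that belong to the Hemo circle but carry no label in the venn3 plot.

```python
# Region counts read off the Hemo (ICD9+proc) circle in each plot.
venn2_parts = [1032, 1059]
venn3_parts = [1013, 862, 197]

true_size = sum(venn2_parts)   # size of the set according to venn2
shown = sum(venn3_parts)       # what the visible venn3 labels add up to
unaccounted = true_size - shown

print(true_size, shown, unaccounted)
```

A venn3 circle is made up of four regions, so seeing only three labels on it already suggests that one region is missing from the plot.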