konstantint / matplotlib-venn

Area-weighted venn-diagrams for Python/matplotlib
MIT License

Text labels are hard to associate with overlap regions in venn3 - add arrows to regions #50

Closed chetan201 closed 4 years ago

chetan201 commented 4 years ago

First of all, this is an amazing project. I love the simplicity of use and how shockingly effective a tool Venn diagrams are for data visualization.

One feature request I have is to solve a specific problem I am trying to address with three-set Venn diagrams, as shown in the example below. The labels are hard to associate with the overlap regions unless the context is well understood. I think this happens because the high commonality between the sets produces some tiny slivers of intersection and exclusivity.

I think it can be addressed by drawing arrow lines from the areas to the labels. I hope to tackle this issue myself as time permits, but if there is an existing easy solution, I would love to hear about it.

Thanks!

image

konstantint commented 4 years ago

There are two aspects to your problem.

Firstly, when the data is skewed in such a way that it becomes hard to understand which region is where, the clarity and thus the benefit of the diagram quickly diminishes. You could make the visualization clearer by "regularizing" it - increasing all the regions a bit by a fixed "fake" amount. Try this:

import numpy as np
from matplotlib_venn import venn3_unweighted

# Actual sizes
sizes = (1808, 1181, 1858, 4715, 3031, 26482, 65012)

# Regularize by adding a fixed area to every region
reg_sizes = np.asarray(sizes) + sum(sizes)/20
venn3_unweighted(sizes, subset_areas=reg_sizes)

Yes, the areas on the diagram would be less "accurate", but a Venn diagram is not meant to be 100% accurate anyway; it is meant to be clear. Adding a possibility for such regularization is on the roadmap for the project as part of #35 (I'd envision syntax of the form venn3(... layout=RegularizedStandard(1/20))). However, as you might note, I haven't yet found free time to work on this.
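To see what the padding does to the numbers, here is a small sketch in plain Python. The helper names `regularize` and `shares` are made up for illustration and are not part of matplotlib-venn; the 1/20 fraction is the one from the snippet above.

```python
def regularize(sizes, fraction=1/20):
    """Hypothetical helper: add the same padding to every region size.

    The padding is a fixed fraction of the total of all region sizes.
    """
    pad = sum(sizes) * fraction
    return [s + pad for s in sizes]


def shares(sizes):
    """Return each region's share of the total area."""
    total = sum(sizes)
    return [s / total for s in sizes]


# Region sizes from the example above
sizes = (1808, 1181, 1858, 4715, 3031, 26482, 65012)
reg_sizes = regularize(sizes)

# Compare the share of the smallest region before and after padding
before = min(shares(sizes))
after = min(shares(reg_sizes))
print(f"smallest region: {before:.1%} of the area before, {after:.1%} after")
```

With these sizes the smallest sliver grows from about 1.1% to about 4.5% of the drawn area, which is why it becomes easier to see, at the cost of the areas no longer being proportional to the true counts.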

Secondly, you could of course keep the skewed shapes and add annotations. This looks like a rather niche design requirement to me, one that could not be implemented in a sufficiently general manner to become a useful part of the library. Here is, however, something to get you started:

from itertools import product
from matplotlib import pyplot as plt
from matplotlib_venn import venn3
import numpy as np

def annotate_regions(venn, titles=None, ax=None, **kwargs):
  ax = ax or plt.gca()
  # Default titles are the region ids '000'..'111'; titles[i] belongs to region id bin(i)
  titles = titles or list(map(''.join, product('01','01','01')))
  # Default callout style: rounded gray box with a curved arrow
  kwargs = kwargs or {'bbox': dict(boxstyle='round,pad=0.5', fc='gray', alpha=0.1),
                      'arrowprops': dict(arrowstyle='->', connectionstyle='arc3,rad=0.5',color='gray')}

  # Callouts are stacked just outside the top-right corner of the axes
  _, xmax = ax.get_xlim()
  _, ymax = ax.get_ylim()

  for i in range(1,8):
    region = format(i, '03b')  # '001' .. '111'
    label = venn.get_label_by_id(region)
    if not label: continue  # this region is not drawn
    ax.annotate(titles[i],
                # point the arrow slightly above the existing region label
                xy=label.get_position() + np.array([0, 0.05]),
                xytext=(xmax+0.1, ymax-i*0.15),
                ha='left', textcoords='data',
                **kwargs)

v = venn3((1,2,3,4,5,6,7))
annotate_regions(v)
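The default titles in the snippet above are just the binary region ids, so you would probably want to pass your own. Below is a rough sketch of one way to build them. The `region_titles` helper and its wording ("A only", "A & B only") are my own illustration, not something the library provides. It relies on the library's id convention, where the first digit refers to the first set, so '100' is the region that belongs only to the first set.

```python
from itertools import product


def region_titles(set_names=("A", "B", "C")):
    """Hypothetical helper: readable titles indexed like the binary region ids.

    Entry i describes the region whose id is i written in binary,
    e.g. entry 4 -> '100' -> first set only.
    """
    titles = []
    for bits in product("01", repeat=len(set_names)):
        members = [name for name, bit in zip(set_names, bits) if bit == "1"]
        if not members:
            titles.append("")  # '000' is outside all sets and is never drawn
        elif len(members) == len(set_names):
            titles.append(" & ".join(members))
        else:
            titles.append(" & ".join(members) + " only")
    return titles


titles = region_titles(("A", "B", "C"))
for i in range(1, 8):
    print(format(i, "03b"), "->", titles[i])
```

The resulting list can then be passed as `titles=` to `annotate_regions`. Note that this order follows the binary ids (001, 010, 011, ...), which is not the same as the order of the sizes tuple given to `venn3`.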

Finally, it's obvious that the label positioning algorithm could do a better job here. At least the label 26482 should have been put further to the right on your image. This is also a known drawback of the current layout method which could be addressed in the scope of #35.

chetan201 commented 3 years ago

@konstantint thanks for taking the time to explain in detail. I appreciate your solutions there.