jokergoo / ComplexHeatmap

Make Complex Heatmaps
https://jokergoo.github.io/ComplexHeatmap-reference/book/

Unknown category showing up in UpSet plots #494

Closed moldach closed 4 years ago

moldach commented 4 years ago

I'm getting confusing results from ComplexHeatmap for an UpSet plot - an extra category is showing up in the figure.

[Image: upset-shows-extra-category]

I've used the SURVIVOR package to generate a binary data frame of consensus variant calls that looks like this:

t=read.table("survivor_comparison_matrix.txt")
head(t)
  Breakdancer CNVnator DeepVariant Delly GRIDSS Lumpy Manta MindTheGap NGSep
1           0        0           0     0      1     0     1          0     0
2           0        0           0     0      1     0     1          0     0
3           0        0           1     0      1     0     0          0     0
4           0        0           0     1      0     1     1          0     0
5           0        0           0     1      0     1     1          0     0
6           0        0           0     0      1     0     1          0     0
  Pindel Tardis
1      0      0
2      0      0
3      0      0
4      0      1
5      0      1
6      0      0

I've put it into a Gist here

Typically this binary data frame is then fed into VennDiagram; however, that only works for up to 5 sets:

venn.diagram(list(Delly=which(t[,4]==1), 
  GRIDSS=which(t[,5]==1), 
  Lumpy=which(t[,6]==1),
  Manta=which(t[,7]==1),
  Pindel=which(t[,10]==1)), 
  fill = c("#DDAA33", "#BB5566", "#228833", "#004488", "#FFAABB"), 
  alpha = c(0.5, 0.5, 0.5, 0.5, 0.5), cex = 2, lty =2, 
  filename = "GRIDSS_Lumpy_Manta_Pindel_Tardis.tiff")

In the example given in the documentation, a 3-set UpSet plot shows 7 categories (the 2^3 - 1 possible non-empty combinations):

set.seed(123)
lt = list(a = sample(letters, 5),
          b = sample(letters, 10),
          c = sample(letters, 15))
lt <- list_to_matrix(lt)
m = make_comb_mat(lt)
UpSet(m)

[Image: correct]

This image looks correct. Why is there an extra category in the figure for my data?

To produce the image I did the following:

library(ComplexHeatmap)
library(dplyr)
t <- read.table("survivor_comparison_matrix.txt")
lt <- t %>% select(Delly:Lumpy)
m = make_comb_mat(lt)
UpSet(m)

moldach commented 4 years ago

Okay, I figured out the problem. When subsetting a binary matrix, some rows can end up as 0 0 0 (I don't think this can happen in the full dataset, at least in my case). In this context, these are regions where a variant was not called by any of the tools in my subset, although it was found by at least one tool in the full matrix. Those all-zero rows are what show up as the extra category.
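
To make the cause concrete, here is a minimal base-R sketch. The data frame `calls` and its values are made up for illustration (they are not the SURVIVOR matrix); it only shows that a row which is non-zero in the full matrix can become all-zero once columns are dropped.

```r
# Toy stand-in for the SURVIVOR matrix (hypothetical values, not the real data).
calls <- data.frame(
  Delly  = c(0, 1, 0, 1),
  GRIDSS = c(1, 0, 0, 1),
  Lumpy  = c(0, 1, 0, 0),
  Manta  = c(1, 1, 1, 0)
)

# In the full matrix every row has at least one caller set to 1.
all_zero_full <- rowSums(calls != 0) == 0
sum(all_zero_full)

# After keeping only Delly, GRIDSS and Lumpy, the row that was found
# by Manta alone (row 3) is left with 0 0 0.
sub <- calls[, c("Delly", "GRIDSS", "Lumpy")]
all_zero_sub <- rowSums(sub != 0) == 0
which(all_zero_sub)
```

Running a check like `sum(rowSums(sub != 0) == 0)` on the real subset before plotting should reveal whether an empty combination will appear.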

[Image: interest]

I've removed them like this:

> lt <- t %>% select(Delly:Manta, Pindel:Tardis)
> head(lt)
  Delly GRIDSS Lumpy Manta Pindel Tardis
1     0      1     0     1      0      0
2     0      1     0     1      0      0
3     0      1     0     0      0      0
4     1      0     1     1      0      1
5     1      0     1     1      0      1
6     0      1     0     1      0      0
> lt <- lt[as.logical(rowSums(lt != 0)), ]
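
The filter above can be checked on a small example. The sketch below uses made-up data and a hypothetical helper, `pattern_counts`, to tally the 0/1 membership patterns before and after dropping all-zero rows. With n sets, at most 2^n - 1 non-empty patterns can remain once the empty one is removed.

```r
# Hypothetical helper: tally each row's 0/1 membership pattern, e.g. "101".
pattern_counts <- function(df) {
  table(apply(as.matrix(df), 1, paste, collapse = ""))
}

# Made-up subset with two rows that no selected tool called.
sub <- data.frame(
  Delly  = c(0, 1, 0, 1, 0),
  GRIDSS = c(1, 0, 0, 1, 0),
  Lumpy  = c(0, 1, 0, 0, 0)
)

before <- pattern_counts(sub)

# Same filter as in the comment: keep rows with at least one non-zero entry.
kept <- sub[as.logical(rowSums(sub != 0)), ]
after <- pattern_counts(kept)

"000" %in% names(before)  # is the empty pattern present before filtering?
"000" %in% names(after)   # and after?
```

The filtered data frame can then be passed to `make_comb_mat()` and `UpSet()` as in the original post.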