hms-dbmi / UpSetR

An R implementation of the UpSet set visualization technique published by Lex, Gehlenborg, et al..
https://cran.rstudio.com/web/packages/UpSetR
Other
756 stars 153 forks source link

Set order not maintained when intersection list provided #169

Open shaman-narayanasamy opened 5 years ago

shaman-narayanasamy commented 5 years ago

Dear authors,

I am having issues with the ordering of the upset plot. Here, I provide a reproducible example.

Read the usual movies data set.

movies <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"), 
    header = T, sep = ";")

Generate upset plot with certain ordering (say, alphabetical order) of the sets and use keep.order = T to retain the specified ordering:

movies %>% upset(., keep.order = T, 
                 sets = c("Action", "Comedy", "Drama", "Romance", "Thriller")
                 )

image

As you can see, everything works swimmingly. Now, I attempt to display only certain intersections that I am specifically interested in.

movies %>% upset(., keep.order = T, 
                 sets = c("Action", "Comedy", "Drama", "Romance", "Thriller"), 
                 intersections = list( 
                   list("Action", "Comedy", "Romance"), 
                   list("Action", "Drama", "Thriller"), 
                   list("Action", "Drama", "Romance"), 
                   list("Action", "Thriller"), 
                   list("Drama", "Thriller")  
                   )                
                 )

image

However, the specified set ordering is not retained despite keep.order = T executed.

Perhaps, I am making a mistake with the command and parameters. I was also unable to find any issues that resemble this. Perhaps, I missed them. Could you please let me know how I could fix this?

Looking forward to your response.

Cheers, Shaman

adomingues commented 4 years ago

Just to mention that I am also running into this issue.

Wenjun-Liu commented 3 years ago

I have also run into the same issue. Hope to know if anyone has been able to solve it.

kikegoni commented 3 years ago

Same here, any updates?

ahepperla commented 3 years ago

Writing here that I also have this issue

vmkalbskopf commented 2 years ago

I also have this issue.

dariober commented 2 years ago

I need this too - I came up with the following temporary hack. I re-defined and reassigned to namespace the function specific_intersections as below in order to avoid re-reordering the intersections if order.by = NULL.

Test on OP's example:

upset(movies, keep.order = T, 
                 order.by= NULL,
                 sets = c("Action", "Comedy", "Drama", "Romance", "Thriller"), 
                 intersections = list( 
                   list("Action", "Comedy", "Romance"), 
                   list("Action", "Drama", "Thriller"), 
                   list("Action", "Drama", "Romance"), 
                   list("Action", "Thriller"), 
                   list("Drama", "Thriller")  
                   )                
                 )

image

specific_intersections <- function(data, first.col, last.col, intersections, order_mat,
                                   aggregate, decrease, cut, mbar_color, set_names){
  data <- as.data.frame(data)
  sets <- names(data[c(first.col:last.col)])
  keep <- unique(unlist(intersections))
  remove <- sets[which(!sets %in% keep)]
  remove <- which(names(data) %in% remove)
  if(length(remove) != 0){
    data <- data[-remove]
  }

  data <- plyr::count(data[keep])
  sets <- names(data[1:length(keep)])
  data <- lapply(intersections, function(x){
    temp_sets <- unlist(x)
    x <- data[which(rowSums(data[1:length(keep)]) == length(temp_sets)), ]
    x <- x[which(rowSums(x[temp_sets]) == length(temp_sets)), ]
    if(nrow(x) == 0){
      names <- names(x[1:length(keep)])
      x <- rbind(x, rep(0, ncol(x)))
      colnames(x) <- c(names, "freq")
      x[ ,which(names %in% temp_sets)] <- 1
    }
    x <- x
  })

  Freqs <- data.frame()

  for(i in seq(length(data))){
    Freqs <- rbind(Freqs, data[[i]])
  }

  Freqs <- Freqs[c(set_names, "freq")]

  num_sets <- length(keep)

  if(tolower(aggregate) == "degree" | is.null(order_mat) == TRUE){
    for(i in 1:nrow(Freqs)){
      Freqs$degree[i] <- rowSums(Freqs[ i ,1:num_sets])
    }
    if(is.null(order_mat) == FALSE) { 
        order_cols <- c()
        for(i in 1:length(order_mat)){
          order_cols[i] <- match(order_mat[i], colnames(Freqs))
        }

        for(i in 1:length(order_cols)){
          logic <- decrease[i]
          Freqs <- Freqs[order(Freqs[ , order_cols[i]], decreasing = logic), ]
        }
    }
  } else if(tolower(aggregate) == "sets" & is.null(order_mat) == FALSE) {
    Freqs <- Get_aggregates(Freqs, num_sets, order_mat, cut)
  } else {
    stop('Not implemented yet')
  }
  #delete rows used to order data correctly. Not needed to set up bars.
  delete_row <- (num_sets + 2)
  Freqs <- Freqs[ , -delete_row]
  for( i in 1:nrow(Freqs)){
    Freqs$x[i] <- i
    Freqs$color <- mbar_color
  }
  Freqs <- na.omit(Freqs)
  return(Freqs)
}

assignInNamespace('specific_intersections', specific_intersections, ns= 'UpSetR')
stuartmac commented 2 years ago

I had this issue too. @dariober's solution worked well for me, thanks!

LaureTomas commented 2 years ago

I have the same issue too, and checking @dariober solution I saw that in the code you tried to order the sets, but again in the plot the sets are ordered in other way, while the intersections are correctly ordered... so the problem remains...

mictadlo commented 1 year ago

Hi, where do I add specific_intersections <- function(data, first.col, last.col, intersections, order_mat, aggregate, decrease, cut, mbar_color, set_names){..}?

dariober commented 1 year ago

@mictadlo Just execute the code in your interactive session or add it to your script like any other function. Take care to execute also the line assignInNamespace('specific_intersections', specific_intersections, ns= 'UpSetR'). Perhaps best is to put all that (function definition and assignInNamespace) in a file and then do source("specific_intersections.R") at the start of your session.

mictadlo commented 1 year ago

Thank you, but unfortunately, it does not work with my data. What did I do wrong?

> library("UpSetR")
> orthogroups_df<- read.table("orthogroups.GeneCount.tsv",  header=T, stringsAsFactors = F)
> #All species
> selected_species <- colnames(orthogroups_df)[2:(ncol(orthogroups_df) -1)] 
> selected_species
 [1] "Atha" "Cann" "NQLD" "Natt" "Ngla" "Nlab" "Nsyl" "Ntab" "Ntom" "Slyc" "Stub" "Vvin"
> head(orthogroups_df)
  Orthogroup Atha Cann NQLD Natt Ngla Nlab Nsyl Ntab Ntom Slyc Stub Vvin Total
1  OG0000000    0    0  965    0    0    3    0    0    0    0    0    0   968
2  OG0000001    0    1    3    0    0  448    0    0    0    0    0    0   452
3  OG0000002    0    1  313    0    0  120    1    0    1    0    0    0   436
4  OG0000003    0   93   15   21   46   16   33   63   36   25   39   26   413
5  OG0000004    1   42    2   34  109    6    8  154   11    9    4    0   380
6  OG0000005    0    2   61    1   34   44   91   70   43   20    1    0   367
> ncol(orthogroups_df)
[1] 14
> orthogroups_df[orthogroups_df > 0] <- 1
> # we only show intersections of interest ,  
> intersections=list(list(selected_species),
+                    list("NQLD", "Ngla", "Natt", "Nlab", "Nsyl", "Ntab", "Ntom"),
+                    list("Stub", "Slyc"),
+                    list("Atha", "Vvin"),
+                    list("Ntab", "Nsyl", "Ntom"),
+                    list("Nlab", "NQLD", "Ngla"), 
+                    list("Nlab", "NQLD", "Nsyl"), 
+                    list("Nlab", "Ngla", "Nsyl"), 
+                    list("NQLD", "Nsyl", "Ngla"))
> specific_intersections <- function(data, first.col, last.col, intersections, order_mat,
+                                    aggregate, decrease, cut, mbar_color, set_names){
+   data <- as.data.frame(data)
+   sets <- names(data[c(first.col:last.col)])
+   keep <- unique(unlist(intersections))
+   remove <- sets[which(!sets %in% keep)]
+   remove <- which(names(data) %in% remove)
+   if(length(remove) != 0){
+     data <- data[-remove]
+   }
+   
+   data <- plyr::count(data[keep])
+   sets <- names(data[1:length(keep)])
+   data <- lapply(intersections, function(x){
+     temp_sets <- unlist(x)
+     x <- data[which(rowSums(data[1:length(keep)]) == length(temp_sets)), ]
+     x <- x[which(rowSums(x[temp_sets]) == length(temp_sets)), ]
+     if(nrow(x) == 0){
+       names <- names(x[1:length(keep)])
+       x <- rbind(x, rep(0, ncol(x)))
+       colnames(x) <- c(names, "freq")
+       x[ ,which(names %in% temp_sets)] <- 1
+     }
+     x <- x
+   })
+   
+   Freqs <- data.frame()
+   
+   for(i in seq(length(data))){
+     Freqs <- rbind(Freqs, data[[i]])
+   }
+   
+   Freqs <- Freqs[c(set_names, "freq")]
+   
+   num_sets <- length(keep)
+   
+   if(tolower(aggregate) == "degree" | is.null(order_mat) == TRUE){
+     for(i in 1:nrow(Freqs)){
+       Freqs$degree[i] <- rowSums(Freqs[ i ,1:num_sets])
+     }
+     if(is.null(order_mat) == FALSE) { 
+       order_cols <- c()
+       for(i in 1:length(order_mat)){
+         order_cols[i] <- match(order_mat[i], colnames(Freqs))
+       }
+       
+       for(i in 1:length(order_cols)){
+         logic <- decrease[i]
+         Freqs <- Freqs[order(Freqs[ , order_cols[i]], decreasing = logic), ]
+       }
+     }
+   } else if(tolower(aggregate) == "sets" & is.null(order_mat) == FALSE) {
+     Freqs <- Get_aggregates(Freqs, num_sets, order_mat, cut)
+   } else {
+     stop('Not implemented yet')
+   }
+   #delete rows used to order data correctly. Not needed to set up bars.
+   delete_row <- (num_sets + 2)
+   Freqs <- Freqs[ , -delete_row]
+   for( i in 1:nrow(Freqs)){
+     Freqs$x[i] <- i
+     Freqs$color <- mbar_color
+   }
+   Freqs <- na.omit(Freqs)
+   return(Freqs)
+ }
> assignInNamespace('specific_intersections', specific_intersections, ns= 'UpSetR')
> upset(orthogroups_df, 
+       text.scale = c(1.4),
+       sets=rev(selected_species), 
+       nsets = ncol(orthogroups_df),
+       #keep.order=T, 
+       #mb.ratio=c(0.5,0.5), 
+       #order.by='degree', 
+       order.by='freq',
+       intersections = intersections, 
+       sets.x.label="Total number of orthogroups", 
+       mainbar.y.label = "Number of orthogroups") 
sexyscientist commented 1 year ago

I am not able to maintain set order in any situation. Tried different numbers and orders (degree and freq) of intersects.

ellenelizabethsmith commented 8 months ago

I have also been having this problem. My current workaround is to specify my desired order as the first intersection in the list. I am grouping by degree so can easily just crop it off the final image since it will be at the end. Not ideal, but it works.