nolanlab / citrus

Citrus Development Code
GNU General Public License v3.0

Get cluster assignments for all cells #113

Closed hartmannfj closed 6 years ago

hartmannfj commented 6 years ago

Hi everyone,

I was wondering: is there an easy way to get the cluster assignments for all (clustered) cells, e.g. in the form of a vector? I found the clusterMembership list in citrus.foldClustering, which has the same number of entries as cells used in the clustering, but I am not sure what to make of it since each list entry is itself a vector.

Does anyone have a solution for this? I'd like to have the cells assigned to the clusters as depicted in the usual markerPlots.pdf so that I can do some downstream analysis on them.

Best, Felix

bc2zb commented 6 years ago

I've been playing with this myself, actually. I believe each element in citrus.foldClustering$allClustering$clusterMembership contains a vector of the indices of the events in citrus.combinedFCSSet$data that are part of that cluster. Below is the code I've been using to reverse that. Basically, it should return a list with one entry per cell, where each entry is a vector of every cluster that cell is in. However, it takes a long time to run, and I would love to hear if anyone has a more efficient way to do this. The citrus.exportCluster() function also works fine, but you then have a whole lot of FCS files to deal with.

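```r
library(parallel)  # provides mclapply()

myList <- citrus.foldClustering$allClustering$clusterMembership
myValue <- 1

# Return the indices of every cluster whose membership vector contains cell `myValue`.
extractClusterMembership <- function(myValue, myList) {
  myVectors <- sapply(seq_along(myList), function(i) any(myList[[i]] == myValue))
  myIndices <- which(myVectors)
  return(myIndices)
}

# One call per event; mc.cores = 7 runs the lookups on 7 forked workers.
test <- mclapply(seq_len(nrow(citrus.combinedFCSSet$data)),
                 extractClusterMembership,
                 myList = myList,
                 mc.cores = 7)
```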
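On the request for something faster: the per-cell scan above checks every cluster for every cell. One possible alternative, sketched here in base R, flattens the membership list once and then groups cluster ids by cell index. The function name `invertMembership` and the toy `clusterMembership` list are made up for illustration; with real data the input would presumably be citrus.foldClustering$allClustering$clusterMembership and `nCells` would be nrow(citrus.combinedFCSSet$data).

```r
# Hypothetical helper (not part of citrus): invert a cluster -> cells list
# into a cell -> clusters list in a single pass.
invertMembership <- function(clusterMembership, nCells) {
  # Repeat each cluster id once per cell it contains ...
  clusterId <- rep(seq_along(clusterMembership), lengths(clusterMembership))
  # ... and line those ids up with the flattened cell indices.
  cellIndex <- unlist(clusterMembership, use.names = FALSE)
  # Group cluster ids by cell; cells in no cluster get an empty vector.
  unname(split(clusterId, factor(cellIndex, levels = seq_len(nCells))))
}

# Toy stand-in for the real membership list: 3 clusters over 4 cells.
clusterMembership <- list(c(1L, 2L), 3L, c(1L, 2L, 3L))
clustersPerCell <- invertMembership(clusterMembership, nCells = 4)
```

Here `clustersPerCell[[i]]` holds the clusters containing cell `i`, so cell 4, which was never clustered in the toy data, ends up with an empty vector rather than being dropped.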

rbruggner commented 6 years ago

Hi Felix,

If I understand correctly, you're asking for a single 1-dimensional vector with 1 entry per clustered cell, and each entry corresponds to the cluster that the cell is assigned to. Is that correct?

If so, that form is actually not possible because citrus assigns each cell to multiple clusters.

If you want to understand which cells belong to which clusters, the cluster membership list contains the indices of the cells in the combinedFCS set that belong to each cluster, as @bc2zb mentioned.
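As a rough illustration of that lookup, the sketch below pulls the events of one cluster out of a data matrix by row index. The `combinedData` matrix and `clusterMembership` list are toy stand-ins, on the assumption that citrus.combinedFCSSet$data has one row per event and that each membership entry holds row indices into it.

```r
# Toy stand-in for citrus.combinedFCSSet$data: 4 events x 3 markers.
combinedData <- matrix(1:12, nrow = 4,
                       dimnames = list(NULL, c("markerA", "markerB", "markerC")))

# Toy stand-in for the clusterMembership list: one index vector per cluster.
clusterMembership <- list(c(1L, 2L), 3L, c(1L, 2L, 3L))

# Events belonging to cluster 3; drop = FALSE keeps a matrix even for one row.
clusterEvents <- combinedData[clusterMembership[[3]], , drop = FALSE]

# Cluster sizes: they sum to 6 although only 3 distinct cells are clustered,
# because the same cell appears in more than one cluster.
clusterSizes <- lengths(clusterMembership)
```

The overlap visible in `clusterSizes` is the reason a single vector with one cluster label per cell cannot represent the result.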

If you want the reverse (a list with as many entries as there are clustered cells, where each entry is the list of clusters that the event belongs to), you could give @bc2zb's code a try. I haven't verified it, but I would probably use something very similar:

```r
reverseIsElement = function(x,y){
  is.element(y,x)
}
memberOf = function(cellIndex,clusterMembership){
  which(sapply(clusterMembership,reverseIsElement,cellIndex))
}

lapply(1:nrow(citrus.combinedFCSSet$data),
       memberOf,
       clusterMembership = citrus.clustering$clusterMembership)
```

bc2zb commented 6 years ago

I just realized that I solved this problem a while ago but never posted the solution. The tidyverse family of packages makes this trivial: put the membership list into a list-column and use the unnest() function from tidyr (it is not part of dplyr itself).
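The original code was never posted, so the following is only a guess at what the unnest() approach might look like, assuming tibble and tidyr (version 1.0 or later, for the `cols` argument) are installed. The toy `clusterMembership` list stands in for citrus.foldClustering$allClustering$clusterMembership, and the column names `cluster` and `cell` are arbitrary.

```r
library(tibble)
library(tidyr)

# Toy stand-in for the real membership list: 3 clusters over 3 cells.
clusterMembership <- list(c(1L, 2L), 3L, c(1L, 2L, 3L))

# One row per cluster, with the cell indices stored as a list-column.
membership <- tibble(
  cluster = seq_along(clusterMembership),
  cell    = clusterMembership
)

# unnest() expands the list-column into one row per (cluster, cell) pair.
long <- unnest(membership, cols = cell)

# Optional: group cluster ids by cell to get the "reverse" list.
clustersPerCell <- split(long$cluster, long$cell)
```

The long table is often the more convenient form for downstream analysis, since it can be joined back to the event data on the `cell` column.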