sqjin / CellChat

R toolkit for inference, visualization and analysis of cell-cell communication from single-cell data
GNU General Public License v3.0
636 stars 145 forks source link

function netEmbedding is unable to work #167

Open QueenieBear opened 3 years ago

QueenieBear commented 3 years ago

@sqjin I am working on CellChat-vignette in your tutorial. when I run cellchat <- netEmbedding(cellchat, type = "functional"), it reportsError in runUMAP(Similarity, min.dist = 0.3, n.neighbors = k) : Cannot find UMAP, please install through pip (e.g. pip install umap-learn or reticulate::py_install(packages = 'umap-learn')).

then I find this tutorial to install umap-learn via annaconda(I have tried pip3 install umap-learn,and it reports successfully installed,but still can not work in R when it comes to cellchat <- netEmbedding(cellchat, type = "functional")) https://hbctraining.github.io/scRNA-seq/lessons/umap-installation.html

conda install -c conda-forge umap-learn also successfully done. then I runlibrary(reticulate) use_python(python = "~/Anaconda3/bin/python", required = TRUE) cellchat <- netEmbedding(cellchat, type = "functional") and there is no error, but a warningManifold learning of the signaling networks for a single dataset /Users/yanni_zong/anaconda3/lib/python3.8/site-packages/umap/umap_.py:132: UserWarning: A large number of your vertices were disconnected from the manifold. Disconnection_distance = 1 has removed 142 edges. It has fully disconnected 3 vertices. You might consider using find_disconnected_points() to find and remove these points from your data. Use umap.utils.disconnected_vertices() to identify them. warn( However, when I run the next line cellchat <- netClustering(cellchat, type = "functional"),it showsClassification learning of the signaling networks for a single dataset Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)

here is the my session infomation `R version 4.0.3 (2020-10-10) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS Big Sur 10.16

Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] parallel stats graphics grDevices utils datasets methods base

other attached packages: [1] reticulate_1.18 ggalluvial_0.12.3 NMF_0.23.0 cluster_2.1.1
[5] rngtools_1.5 pkgmaker_0.32.2 registry_0.5-1 patchwork_1.1.1
[9] CellChat_1.0.0 ggplot2_3.3.3 igraph_1.2.6 dplyr_1.0.5
[13] Biobase_2.50.0 BiocGenerics_0.36.0

loaded via a namespace (and not attached): [1] colorspace_2.0-0 rjson_0.2.20 ellipsis_0.3.1 rprojroot_2.0.2
[5] circlize_0.4.12 GlobalOptions_0.1.2 fs_1.5.0 clue_0.3-58
[9] listenv_0.8.0 remotes_2.2.0 ggrepel_0.9.1 RSpectra_0.16-0
[13] fansi_0.4.2 codetools_0.2-18 doParallel_1.0.16 cachem_1.0.4
[17] knitr_1.31 pkgload_1.2.0 jsonlite_1.7.2 Cairo_1.5-12.2
[21] gridBase_0.4-7 png_0.1-7 BiocManager_1.30.10 compiler_4.0.3
[25] assertthat_0.2.1 Matrix_1.3-2 fastmap_1.1.0 cli_2.3.1
[29] prettyunits_1.1.1 tools_4.0.3 coda_0.19-4 gtable_0.3.0
[33] glue_1.4.2 reshape2_1.4.4 tinytex_0.30 Rcpp_1.0.6
[37] rle_0.9.2 statnet.common_4.4.1 vctrs_0.3.6 svglite_2.0.0
[41] iterators_1.0.13 xfun_0.21 stringr_1.4.0 globals_0.14.0
[45] ps_1.6.0 network_1.16.1 testthat_3.0.2 lifecycle_1.0.0
[49] irlba_2.3.3 devtools_2.3.2 future_1.21.0 scales_1.1.1
[53] gg.gap_1.3 RColorBrewer_1.1-2 ComplexHeatmap_2.6.2 memoise_2.0.0
[57] pbapply_1.4-3 stringi_1.5.3 S4Vectors_0.28.1 desc_1.3.0
[61] foreach_1.5.1 pkgbuild_1.2.0 shape_1.4.5 rlang_0.4.10
[65] pkgconfig_2.0.3 systemfonts_1.0.1 matrixStats_0.58.0 lattice_0.20-41
[69] purrr_0.3.4 cowplot_1.1.1 tidyselect_1.1.0 processx_3.4.5
[73] parallelly_1.24.0 plyr_1.8.6 magrittr_2.0.1 R6_2.5.0
[77] IRanges_2.24.1 generics_0.1.0 sna_2.6 DBI_1.1.1
[81] pillar_1.5.1 withr_2.4.1 tibble_3.1.0 future.apply_1.7.0
[85] crayon_1.4.1 utf8_1.2.1 GetoptLong_1.0.5 usethis_2.0.1
[89] grid_4.0.3 FNN_1.1.3 callr_3.5.1 digest_0.6.27
[93] xtable_1.8-4 stats4_4.0.3 munsell_0.5.0 sessioninfo_1.1.1 `

what can I do next?😭

QueenieBear commented 3 years ago

add info: it surprisingly works well with cellchat <- computeNetSimilarity(cellchat, type = "structural") cellchat <- netEmbedding(cellchat, type = "structural") cellchat <- netClustering(cellchat, type = "structural") but this function still does not work cellchat <- netEmbedding(cellchat, type = "functional") Manifold learning of the signaling networks for a single dataset /Users/yannizong/Library/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/umap/umap.py:133: UserWarning: A large number of your vertices were disconnected from the manifold. Disconnection_distance = 1 has removed 142 edges. It has fully disconnected 3 vertices. You might consider using find_disconnected_points() to find and remove these points from your data. Use umap.utils.disconnected_vertices() to identify them. f"A large number of your vertices were disconnected from the manifold.\n" cellchat <- netClustering(cellchat, type = "functional") Classification learning of the signaling networks for a single dataset Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1) netVisual_embedding(cellchat,type = "functional", label.size = 3.5) Error in data.frame(x = Y[, 1], y = Y[, 2], Commun.Prob. = prob_sum/max(prob_sum), : arguments imply differing number of rows: 13, 0

sqjin commented 3 years ago

@QueenieBear Can you please run the source codes of netVisual_embedding to figure out the reason?

lxwang326 commented 3 years ago

I have the same questions with QueenieBear

lxwang326 commented 3 years ago

I read the pull request and add the code in the source code as follows: #1#2#3

netClustering <- function(object, slot.name = "netP", type = c("functional","structural"), k = NULL, methods = "kmeans", do.parallel = TRUE, nCores = 4, k.eigen = NULL) {
  type <- match.arg(type)

  comparison.name <- paste(comparison, collapse = "-")#1
  Y <- methods::slot(object, slot.name)$similarity[[type]]$dr#[[comparison.name]]#2
  Y[is.na(Y)] <- 0#3

error is as follows

> cellchat <- netClustering(cellchat, type = "functional")
Error in paste(comparison, collapse = "-") : 找不到对象'comparison'

how can I define 'comparison'? thanks for your help!

lxwang326 commented 3 years ago

I have solved the problem by the following source code in the pull request。

> computeNetSimilarity <- function(object, slot.name = "netP", type = c("functional","structural"), k = NULL, thresh = NULL) {
+   type <- match.arg(type)
+   prob = methods::slot(object, slot.name)$prob
+   if (is.null(k)) {
+     if (dim(prob)[3] <= 25) {
+       k <- ceiling(sqrt(dim(prob)[3]))
+     } else {
+       k <- ceiling(sqrt(dim(prob)[3])) + 1
+     }
+   }
+   if (!is.null(thresh)) {
+     prob[prob < quantile(c(prob[prob != 0]), thresh)] <- 0
+   }
+   if (type == "functional") {
+     # compute the functional similarity
+     D_signalings <- matrix(0, nrow = dim(prob)[3], ncol = dim(prob)[3])
+     S2 <- D_signalings; S3 <- D_signalings;
+     for (i in 1:(dim(prob)[3]-1)) {
+       for (j in (i+1):dim(prob)[3]) {
+         Gi <- (prob[ , ,i] > 0)*1
+         Gj <- (prob[ , ,j] > 0)*1
+         S3[i,j] <- sum(Gi * Gj)/sum(Gi+Gj-Gi*Gj,na.rm=TRUE)
+       }
+     }
+     # define the similarity matrix
+     S3[is.na(S3)] <- 0; S3 <- S3 + t(S3); diag(S3) <- 1
+     # S_signalings <- S1 *S2
+     S_signalings <- S3
+   } else if (type == "structural") {
+     # compute the structure distance
+     D_signalings <- matrix(0, nrow = dim(prob)[3], ncol = dim(prob)[3])
+     for (i in 1:(dim(prob)[3]-1)) {
+       for (j in (i+1):dim(prob)[3]) {
+         Gi <- (prob[ , ,i] > 0)*1
+         Gj <- (prob[ , ,j] > 0)*1
+         D_signalings[i,j] <- computeNetD_structure(Gi,Gj)
+       }
+     }
+     # define the structure similarity matrix
+     D_signalings[is.infinite(D_signalings)] <- 0
+     D_signalings[is.na(D_signalings)] <- 0
+     D_signalings <- D_signalings + t(D_signalings)
+     S_signalings <- 1-D_signalings
+   }
+   # smooth the similarity matrix using SNN
+   SNN <- buildSNN(S_signalings, k = k, prune.SNN = 1/15)
+   Similarity <- as.matrix(S_signalings*SNN)
+   rownames(Similarity) <- dimnames(prob)[[3]]
+   colnames(Similarity) <- dimnames(prob)[[3]]
+   comparison <- "single"
+   comparison.name <- paste(comparison, collapse = "-")
+   if (!is.list(methods::slot(object, slot.name)$similarity[[type]]$matrix)) {
+     methods::slot(object, slot.name)$similarity[[type]]$matrix <- NULL
+   }
+   methods::slot(object, slot.name)$similarity[[type]]$matrix[[comparison.name]] <- Similarity
+   return(object)
+ }
> #   报错 找到pullrequest 修改后源代码运行 也不对 逐个函数运行
> cellchat <- computeNetSimilarity(cellchat, type = "functional")
> #' @param slot.name the slot name of object that is used to compute centrality measures of signaling networks
> #' @param type "functional","structural"
> #' @param comparison a numerical vector giving the datasets for comparison
> #' @param k the number of nearest neighbors
> #' @param thresh the fraction (0 to 0.25) of interactions to be trimmed before computing network similarity
> #' @importFrom methods slot
> #'
> #' @return
> #' @export
> #'
> computeNetSimilarityPairwise <- function(object, slot.name = "netP", type = c("functional","structural"), comparison = NULL, k = NULL, thresh = NULL) {
+   type <- match.arg(type)
+   if (is.null(comparison)) {
+     comparison <- 1:length(unique(object@meta$datasets))
+   }
+   cat("Compute signaling network similarity for datasets", as.character(comparison), '\n')
+   comparison.name <- paste(comparison, collapse = "-")
+   net <- list()
+   signalingAll <- c()
+   object.net.nameAll <- c()
+   # 1:length(setdiff(names(methods::slot(object, slot.name)), "similarity"))
+   for (i in 1:length(comparison)) {
+     object.net <- methods::slot(object, slot.name)[[comparison[i]]]
+     object.net.name <- names(methods::slot(object, slot.name))[comparison[i]]
+     object.net.nameAll <- c(object.net.nameAll, object.net.name)
+     net[[i]] = object.net$prob
+     signalingAll <- c(signalingAll, paste0(dimnames(net[[i]])[[3]], "--", object.net.name))
+     # signalingAll <- c(signalingAll, dimnames(net[[i]])[[3]])
+   }
+   names(net) <- object.net.nameAll
+   net.dim <- sapply(net, dim)[3,]
+   nnet <- sum(net.dim)
+   position <- cumsum(net.dim); position <- c(0,position)
+   if (is.null(k)) {
+     if (nnet <= 25) {
+       k <- ceiling(sqrt(nnet))
+     } else {
+       k <- ceiling(sqrt(nnet)) + 1
+     }
+   }
+   if (!is.null(thresh)) {
+     for (i in 1:length(net)) {
+       neti <- net[[i]]
+       neti[neti < quantile(c(neti[neti != 0]), thresh)] <- 0
+       net[[i]] <- neti
+     }
+   }
+   if (type == "functional") {
+     # compute the functional similarity
+     S3 <- matrix(0, nrow = nnet, ncol = nnet)
+     for (i in 1:nnet) {
+       for (j in 1:nnet) {
+         idx.i <- which(position - i >= 0)[1]
+         idx.j <- which(position - j >= 0)[1]
+         net.i <- net[[idx.i-1]]
+         net.j <- net[[idx.j-1]]
+         Gi <- (net.i[ , ,i-position[idx.i-1]] > 0)*1
+         Gj <- (net.j[ , ,j-position[idx.j-1]] > 0)*1
+         S3[i,j] <- sum(Gi * Gj)/sum(Gi+Gj-Gi*Gj,na.rm=TRUE)
+       }
+     }
+     # define the similarity matrix
+     S3[is.na(S3)] <- 0;  diag(S3) <- 1
+     S_signalings <- S3
+   } else if (type == "structural") {
+     # compute the structure distance
+     D_signalings <- matrix(0, nrow = nnet, ncol = nnet)
+     for (i in 1:nnet) {
+       for (j in 1:nnet) {
+         idx.i <- which(position - i >= 0)[1]
+         idx.j <- which(position - j >= 0)[1]
+         net.i <- net[[idx.i-1]]
+         net.j <- net[[idx.j-1]]
+         Gi <- (net.i[ , ,i-position[idx.i-1]] > 0)*1
+         Gj <- (net.j[ , ,j-position[idx.j-1]] > 0)*1
+         D_signalings[i,j] <- computeNetD_structure(Gi,Gj)
+       }
+     }
+     # define the structure similarity matrix
+     D_signalings[is.infinite(D_signalings)] <- 0
+     D_signalings[is.na(D_signalings)] <- 0
+     S_signalings <- 1-D_signalings
+   }
+   # smooth the similarity matrix using SNN
+   SNN <- buildSNN(S_signalings, k = k, prune.SNN = 1/15)
+   Similarity <- as.matrix(S_signalings*SNN)
+   rownames(Similarity) <- signalingAll
+   colnames(Similarity) <- rownames(Similarity)
+   if (!is.list(methods::slot(object, slot.name)$similarity[[type]]$matrix)) {
+     methods::slot(object, slot.name)$similarity[[type]]$matrix <- NULL
+   }
+   # methods::slot(object, slot.name)$similarity[[type]]$matrix <- Similarity
+   methods::slot(object, slot.name)$similarity[[type]]$matrix[[comparison.name]] <- Similarity
+   return(object)
+ }
> #' @param slot.name the slot name of object that is used to compute centrality measures of signaling networks
> #' @param type "functional","structural"
> #' @param comparison a numerical vector giving the datasets for comparison. No need to define for a single dataset. Default are all datasets when object is a merged object
> #' @param k the number of nearest neighbors in running umap
> #' @param pathway.remove a range of the number of patterns
> #' @importFrom methods slot
> #' @return
> #' @export
> #'
> #' @examples
> netEmbedding <- function(object, slot.name = "netP", type = c("functional","structural"), comparison = NULL, pathway.remove = NULL, k = NULL) {
+   if (object@options$mode == "single") {
+     comparison <- "single"
+     cat("Manifold learning of the signaling networks for a single dataset", '\n')
+   } else if (object@options$mode == "merged") {
+     if (is.null(comparison)) {
+       comparison <- 1:length(unique(object@meta$datasets))
+     }
+     cat("Manifold learning of the signaling networks for datasets", as.character(comparison), '\n')
+   }
+   comparison.name <- paste(comparison, collapse = "-")
+   Similarity <- methods::slot(object, slot.name)$similarity[[type]]$matrix[[comparison.name]]
+   if (is.null(pathway.remove)) {
+     pathway.remove <- rownames(Similarity)[which(colSums(Similarity) == 1)]
+   }
+   if (length(pathway.remove) > 0) {
+     pathway.remove.idx <- which(rownames(Similarity) %in% pathway.remove)
+     Similarity <- Similarity[-pathway.remove.idx, -pathway.remove.idx]
+   }
+   if (is.null(k)) {
+     k <- ceiling(sqrt(dim(Similarity)[1])) + 1
+   }
+   options(warn = -1)
+   # dimension reduction
+   Y <- runUMAP(Similarity, min.dist = 0.3, n.neighbors = k)
+   if (!is.list(methods::slot(object, slot.name)$similarity[[type]]$dr)) {
+     methods::slot(object, slot.name)$similarity[[type]]$dr <- NULL
+   }
+   methods::slot(object, slot.name)$similarity[[type]]$dr[[comparison.name]] <- Y
+   return(object)
+ }
> cellchat <- netEmbedding(cellchat, type = "functional")
Manifold learning of the signaling networks for a single dataset 
C:\Users\zzu\AppData\Local\R-MINI~1\envs\R-RETI~1\lib\site-packages\umap\umap_.py:133: UserWarning: A large number of your vertices were disconnected from the manifold.
Disconnection_distance = 1 has removed 142 edges.
It has fully disconnected 3 vertices.
You might consider using find_disconnected_points() to find and remove these points from your data.
Use umap.utils.disconnected_vertices() to identify them.
  f"A large number of your vertices were disconnected from the manifold.\n"
> #' @param nCores number of workers when doing parallel
> #' @param k.eigen the number of eigenvalues used when doing spectral clustering
> #' @importFrom methods slot
> #' @importFrom future nbrOfWorkers plan
> #' @importFrom future.apply future_sapply
> #' @importFrom pbapply pbsapply
> #' @return
> #' @export
> #'
> #' @examples
> netClustering <- function(object, slot.name = "netP", type = c("functional","structural"), comparison = NULL, k = NULL, methods = "kmeans", do.plot = TRUE, fig.id = NULL, do.parallel = TRUE, nCores = 4, k.eigen = NULL) {
+   type <- match.arg(type)
+   if (object@options$mode == "single") {
+     comparison <- "single"
+     cat("Classification learning of the signaling networks for a single dataset", '\n')
+   } else if (object@options$mode == "merged") {
+     if (is.null(comparison)) {
+       comparison <- 1:length(unique(object@meta$datasets))
+     }
+     cat("Classification learning of the signaling networks for datasets", as.character(comparison), '\n')
+   }
+   comparison.name <- paste(comparison, collapse = "-")
+   
+   Y <- methods::slot(object, slot.name)$similarity[[type]]$dr[[comparison.name]]
+   Y[is.na(Y)] <- 0
+   data.use <- Y
+   if (methods == "kmeans") {
+     if (!is.null(k)) {
+       clusters = kmeans(data.use,k,nstart=10)$cluster
+     } else {
+       N <- nrow(data.use)
+       kRange <- seq(2,min(N-1, 10),by = 1)
+       if (do.parallel) {
+         future::plan("multiprocess", workers = nCores)
+         options(future.globals.maxSize = 1000 * 1024^2)
+       }
+       my.sapply <- ifelse(
+         test = future::nbrOfWorkers() == 1,
+         yes = pbapply::pbsapply,
+         no = future.apply::future_sapply
+       )
+       results = my.sapply(
+         X = 1:length(kRange),
+         FUN = function(x) {
+           idents <- kmeans(data.use,kRange[x],nstart=10)$cluster
+           clusIndex <- idents
+           #adjMat0 <- as.numeric(outer(clusIndex, clusIndex, FUN = "==")) - outer(1:N, 1:N, "==")
+           adjMat0 <- Matrix::Matrix(as.numeric(outer(clusIndex, clusIndex, FUN = "==")), nrow = N, ncol = N)
+           return(list(adjMat = adjMat0, ncluster = length(unique(idents))))
+         },
+         simplify = FALSE
+       )
+       adjMat <- lapply(results, "[[", 1)
+       CM <- Reduce('+', adjMat)/length(kRange)
+       res <- computeEigengap(as.matrix(CM))
+       numCluster <- res$upper_bound
+       clusters = kmeans(data.use,numCluster,nstart=10)$cluster
+       if (do.plot) {
+         gg <- res$gg.obj
+         ggsave(filename= paste0("estimationNumCluster_",fig.id,"_",type,"_dataset_",comparison.name,".pdf"), plot=gg, width = 3.5, height = 3, units = 'in', dpi = 300)
+       }
+     }
+   } else if (methods == "spectral") {
+     A <- as.matrix(data.use)
+     D <- apply(A, 1, sum)
+     L <- diag(D)-A                       # unnormalized version
+     L <- diag(D^-0.5)%*%L%*% diag(D^-0.5) # normalized version
+     evL <- eigen(L,symmetric=TRUE)  # evL$values is decreasing sorted when symmetric=TRUE
+     # pick the first k first k eigenvectors (corresponding k smallest) as data points in spectral space
+     plot(rev(evL$values)[1:30])
+     Z <- evL$vectors[,(ncol(evL$vectors)-k.eigen+1):ncol(evL$vectors)]
+     clusters = kmeans(Z,k,nstart=20)$cluster
+   }
+   if (!is.list(methods::slot(object, slot.name)$similarity[[type]]$group)) {
+     methods::slot(object, slot.name)$similarity[[type]]$group <- NULL
+   }
+   methods::slot(object, slot.name)$similarity[[type]]$group[[comparison.name]] <- clusters
+   return(object)
+ }
> #> Manifold learning of the signaling networks for a single dataset
> cellchat <- netClustering(cellchat, type = "functional")
Classification learning of the signaling networks for a single dataset 
> #> Classification learning of the signaling networks for a single dataset
> # Visualization in 2D-space
> netVisual_embedding(cellchat, type = "functional", label.size = 3.5)
apelonero-GladstoneInstitutes commented 3 years ago

@lxwang326 's code got me past the same netClustering() issue: Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)

I took the liberty of wrapping these fixed functions in a script. Just change the extension from .txt to .R and source it to get going again. FYI I've done no due diligence with this, so look this over if you want to be careful. Elaboration, explanation, or better suggestions are very welcome.

Consider this a stopgap? Oh, and some of these functions might throw errors btw. I hastily edited this to get my analysis running again.

CellChat_issue167_netClusteringFix.txt

dsb66 commented 3 years ago

The fix above work for me. Note that the fixed file ("CellChat_issue167_netClusteringFix.txt") has two occurrences of GiGj that should be Gi + Gj.

Manikgarg commented 3 years ago

For me the solution provided by @lxwang326 works after changing Y <- runUMAP(Similarity, min.dist = 0.3, n.neighbors = k) to Y <- runUMAP(Similarity, min_dist = 0.3, n_neighbors = k) in function netEmbedding.

qiuli848 commented 3 years ago

The fix above work for me, too. Also note that the fixed file ("CellChat_issue167_netClusteringFix.txt") has two occurrences of i1 that should be i + 1.