igraph distances error - Help, please?

Pancreas-Pratik commented 3 years ago

Dear Dr. Chung,

Thank you for fixing the previous issue. I am now receiving the following error:

Error in igraph::distances(g_test, to = base::as.character(final_score$Gene),  : 
  At structural_properties.c:5313 : cannot run Bellman-Ford algorithm, Negative loop detected while calculating shortest paths

I think the root cause may be that I am running org.Hs.eg.db from Bioconductor release 3.13 rather than 3.10, the version used for the original analysis.

I am using the data from: https://human-pancreas-dev.cells.ucsc.edu

Here is my workflow:

mat <- read.table("~/Projects/fetal-pancreas-7-10wpc/exprMatrix.tsv", header = TRUE, sep = "\t")

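library(org.Hs.eg.db)
keytypes(org.Hs.eg.db)

# Map Entrez IDs to gene symbols and Ensembl gene IDs.
# AnnotationDbi::select is called explicitly because dplyr (loaded below) also exports select().
annot <- AnnotationDbi::select(org.Hs.eg.db,
                               keys = keys(org.Hs.eg.db),
                               columns = c("SYMBOL", "ENSEMBL"),
                               keytype = "ENTREZID")
annot$ENTREZID <- NULL
annot <- unique(annot)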

# Attach gene symbols via the Ensembl IDs in the first column of the expression matrix
colnames(mat)[1] <- "ENSEMBL"
mat <- merge(annot, mat, by = "ENSEMBL")
mat$ENSEMBL <- NULL

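library(dplyr)
# Several Ensembl IDs can map to one symbol; keep the row with the largest total expression
mat <- mat %>%
  mutate(n = rowSums(across(-SYMBOL))) %>%
  group_by(SYMBOL) %>%
  slice(which.max(n)) %>%
  ungroup() %>%
  dplyr::select(-n)
mat <- as.data.frame(mat)
row.names(mat) <- mat$SYMBOL
mat$SYMBOL <- NULL

rm(annot)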

meta <- read.table("~/Projects/fetal-pancreas-7-10wpc/meta.tsv", header = TRUE, sep = "\t", as.is = TRUE, row.names = 1)
anno.tbl <- tibble::rownames_to_column(meta, "cellnames")
# Drop metadata columns that are not needed for the cell annotation table
anno.tbl$nCount_RNA <- NULL
anno.tbl$nFeature_RNA <- NULL
anno.tbl$phase <- NULL
anno.tbl$age <- NULL

library(InterCom)
InterCom(mat,
         anno.tbl,
         "HUMAN",
         sighot.cutoff=0.1,
         sighot.percentile=70,
         consv.thrs=0.05,
         ncores=16,
         sig.cutoff=0.9,
         z.score.cutoff=2,
         "Pancreas",
         'temp',
         "~/Projects/fetal-pancreas-7-10wpc/InterCom/"
)

Here is my output and the error:

Creating input parameters file

Tissue :  Pancreas 
 Preparing data
Detected populations:
unknown proliferating mesenchyme tip trunk neurons endocrine blood Celltype : unknown
   Finding maximum sum subcluster in expression space
   Starting SigHotSpotter analysis for the sub-cluster identified
   Calculating shortest path weights .
   Saving results
 Celltype : proliferating
   Finding maximum sum subcluster in expression space
   Starting SigHotSpotter analysis for the sub-cluster identified
Error in igraph::distances(g_test, to = base::as.character(final_score$Gene),  : 
  At structural_properties.c:5313 : cannot run Bellman-Ford algorithm, Negative loop detected while calculating shortest paths

Here is my sessionInfo()

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
[1] InterCom_0.1         dplyr_1.0.6          org.Hs.eg.db_3.13.0  AnnotationDbi_1.54.0
[5] IRanges_2.26.0       S4Vectors_0.30.0     Biobase_2.52.0       BiocGenerics_0.38.0 

loaded via a namespace (and not attached):
  [1] Seurat_4.0.2           Rtsne_0.15             colorspace_2.0-1      
  [4] deldir_0.2-10          ellipsis_0.3.2         ggridges_0.5.3        
  [7] XVector_0.32.0         rstudioapi_0.13        spatstat.data_2.1-0   
 [10] leiden_0.3.8           listenv_0.8.0          ggrepel_0.9.1         
 [13] bit64_4.0.5            RSpectra_0.16-0        fansi_0.5.0           
 [16] codetools_0.2-18       splines_4.1.0          cachem_1.0.5          
 [19] knitr_1.33             polyclip_1.10-0        jsonlite_1.7.2        
 [22] ica_1.0-2              cluster_2.1.2          png_0.1-7             
 [25] uwot_0.1.10            shiny_1.6.0            sctransform_0.3.2     
 [28] spatstat.sparse_2.0-0  compiler_4.1.0         httr_1.4.2            
 [31] assertthat_0.2.1       SeuratObject_4.0.1     Matrix_1.3-3          
 [34] fastmap_1.1.0          lazyeval_0.2.2         later_1.2.0           
 [37] htmltools_0.5.1.1      tools_4.1.0            igraph_1.2.6          
 [40] GenomeInfoDbData_1.2.6 gtable_0.3.0           glue_1.4.2            
 [43] RANN_2.6.1             reshape2_1.4.4         Rcpp_1.0.6            
 [46] scattermore_0.7        Biostrings_2.60.0      vctrs_0.3.8           
 [49] nlme_3.1-152           lmtest_0.9-38          xfun_0.23             
 [52] stringr_1.4.0          globals_0.14.0         mime_0.10             
 [55] miniUI_0.1.1.1         lifecycle_1.0.0        irlba_2.3.3           
 [58] goftest_1.2-2          future_1.21.0          zlibbioc_1.38.0       
 [61] MASS_7.3-54            zoo_1.8-9              scales_1.1.1          
 [64] spatstat.core_2.1-2    promises_1.2.0.1       spatstat.utils_2.1-0  
 [67] RColorBrewer_1.1-2     yaml_2.2.1             memoise_2.0.0         
 [70] reticulate_1.20        pbapply_1.4-3          gridExtra_2.3         
 [73] ggplot2_3.3.3          rpart_4.1-15           stringi_1.6.2         
 [76] RSQLite_2.2.7          GenomeInfoDb_1.28.0    bitops_1.0-7          
 [79] rlang_0.4.11           pkgconfig_2.0.3        matrixStats_0.59.0    
 [82] evaluate_0.14          lattice_0.20-44        ROCR_1.0-11           
 [85] purrr_0.3.4            tensor_1.5             patchwork_1.1.1       
 [88] htmlwidgets_1.5.3      cowplot_1.1.1          bit_4.0.4             
 [91] tidyselect_1.1.1       parallelly_1.25.0      RcppAnnoy_0.0.18      
 [94] plyr_1.8.6             magrittr_2.0.1         R6_2.5.0              
 [97] generics_0.1.0         DBI_1.1.1              pillar_1.6.1          
[100] mgcv_1.8-36            fitdistrplus_1.1-5     RCurl_1.98-1.3        
[103] KEGGREST_1.32.0        survival_3.2-11        abind_1.4-5           
[106] tibble_3.1.2           future.apply_1.7.0     crayon_1.4.1          
[109] KernSmooth_2.23-20     utf8_1.2.1             spatstat.geom_2.1-0   
[112] plotly_4.9.3           rmarkdown_2.8          grid_4.1.0            
[115] data.table_1.14.0      blob_1.2.1             taRifx_1.0.6.2        
[118] digest_0.6.27          xtable_1.8-4           tidyr_1.1.3           
[121] httpuv_1.6.1           munsell_0.5.0          viridisLite_0.4.0

Thank you again, and I look forward to your response.

saschajung commented 3 years ago

I looked into this, and the issue is not directly related to InterCom but rather to the input data. The source from which you obtained the data does not provide raw count matrices, only integrated data from Seurat. The matrix therefore contains negative values, which result in negative edge weights that form negative cycles in the network; the Bellman-Ford algorithm cannot compute shortest paths in the presence of negative cycles, which is the error you're seeing. As a general rule, integrated data from Seurat should not be used for any quantitative downstream analyses (see here, for instance).
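A minimal sketch of the failure mode described above, using a hypothetical two-node toy graph rather than the actual InterCom network: a directed cycle whose total weight is negative makes igraph's Bellman-Ford shortest-path computation fail, while the same graph with non-negative weights works. The exact wording of the error message may differ between igraph versions.

```r
library(igraph)

# Hypothetical toy network (not InterCom's): edges a -> b and b -> a form a cycle.
# With a weight of -2 on b -> a, the cycle a -> b -> a sums to 1 + (-2) = -1.
edges <- data.frame(from = c("a", "b"), to = c("b", "a"), weight = c(1, -2))
g <- graph_from_data_frame(edges, directed = TRUE)

# distances() uses the "weight" edge attribute by default.
res <- tryCatch(
  distances(g, v = "a", to = "b", mode = "out", algorithm = "bellman-ford"),
  error = function(e) e
)
if (inherits(res, "error")) message("Error: ", conditionMessage(res))

# Same topology with non-negative weights: shortest paths are well defined again.
E(g)$weight <- abs(E(g)$weight)
d <- distances(g, v = "a", to = "b", mode = "out")
print(d)  # distance from a to b is 1
```

Because a negative cycle can be traversed repeatedly to make a path arbitrarily short, no shortest path exists, so igraph reports an error instead of returning distances.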

To make it work for you, I'll make the count matrix available together with a function for reproducing the networks. In addition, I'll clarify in the README that raw counts should be used.
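Since raw counts are expected, a quick check of the input matrix before calling InterCom() can catch integrated or normalized data early. A minimal base-R sketch; the function name `check_count_matrix` is hypothetical and not part of InterCom. Raw counts are typically non-negative integers, so negative or fractional values suggest processed data.

```r
# Hypothetical helper (not part of InterCom): flags matrices that look processed.
check_count_matrix <- function(m, tol = 1e-8) {
  m <- as.matrix(m)
  has_negative <- any(m < 0, na.rm = TRUE)
  has_fraction <- any(abs(m - round(m)) > tol, na.rm = TRUE)
  list(
    has_negative = has_negative,  # e.g. Seurat integrated or scaled data
    has_fraction = has_fraction,  # e.g. log-normalized data
    looks_raw = !has_negative && !has_fraction
  )
}

# Toy examples with made-up values
raw <- matrix(c(0, 3, 5, 1), nrow = 2)
integrated <- matrix(c(-0.42, 1.7, 0, 2.3), nrow = 2)
str(check_count_matrix(raw))
str(check_count_matrix(integrated))
```

Applied to the `mat` built in the workflow above, `check_count_matrix(mat)$looks_raw` would be expected to return FALSE, since the integrated matrix contains negative values.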

Pancreas-Pratik commented 3 years ago

Thank you, sir. I was actually going to ask if you could provide the raw count matrix, especially for the fetal pancreas 7-10 wpc data.

For another analysis, I previously tried to load the integrated/log-transformed data from https://human-pancreas-dev.cells.ucsc.edu into Seurat to explore the dataset. However, after someone from the Seurat team pointed out that the data was log-transformed, I learned that I could not do this without first reversing the transformation.
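Reversing the transformation is only possible for some kinds of processed data. As a hedged sketch: if a matrix were Seurat's default LogNormalize output, i.e. log1p(counts / total counts per cell * 10000), and each cell's total count were known (for example from an nCount_RNA column like the one in meta.tsv), approximate counts could be recovered as follows. This assumes non-integrated, unscaled data; it cannot reliably undo Seurat integration, whose corrected (and possibly negative) values do not map back to counts. The function name `undo_lognormalize` is hypothetical.

```r
# Hypothetical helper: approximately inverts Seurat's default LogNormalize,
#   lognorm = log1p(counts / total_counts_per_cell * scale_factor)
# Assumes genes in rows, cells in columns, and known per-cell totals (n_count).
undo_lognormalize <- function(lognorm, n_count, scale_factor = 1e4) {
  lognorm <- as.matrix(lognorm)
  stopifnot(length(n_count) == ncol(lognorm))
  # expm1() undoes log1p(); multiplying column j by n_count[j] / scale_factor
  # undoes the per-cell library-size scaling.
  rescaled <- sweep(expm1(lognorm), 2, n_count / scale_factor, `*`)
  round(rescaled)
}

# Round trip on made-up counts (3 genes x 2 cells)
counts <- matrix(c(2, 8, 0, 5, 5, 0), nrow = 3)
n_count <- colSums(counts)
lognorm <- log1p(sweep(counts, 2, n_count, `/`) * 1e4)
rec <- undo_lognormalize(lognorm, n_count)
print(all(rec == counts))  # TRUE
```

The final round() assumes the original values were integer counts; if the per-cell totals are unknown or were computed after filtering, the recovered values will only be approximate.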

Again, thank you for everything, Dr. Chung. I look forward to your response, sir.

saschajung / Intercom

igraph distances error - Help, please? #2