saschajung / Intercom

Creative Commons Attribution 4.0 International
Error from mainlib.r #3

Closed Pazuzzilla closed 3 years ago

Pazuzzilla commented 3 years ago

Good morning, I'm trying to use InterCom on a dataset of normal and tumor pancreatic organoids. My data were previously processed with Seurat and stored in a Seurat object I called sc_484_filtered. Following the GitHub instructions, I ended up with this InterCom invocation: InterCom(data = GetAssay(sc_484_filtered)@data, anno.tbl = cbind(rownames(sc_484_filtered@meta.data["type"]),sc_484_filtered@meta.data["type"]), species= "HUMAN", sighot.cutoff=0.1, sighot.percentile=70, consv.thrs=0.05, ncores=4, sig.cutoff=0.9, z.score.cutoff=2, tissue.name = "organoid", temp.folder.name = "Intercom", out.path = "./InterCom")

This way I passed the normalized expression values as data and, as anno.tbl, a data frame with the cell names in the first column and the type (normal or tumor) in the second.

I obtained the following errors:

  1. | stop("'x' must be an array of at least two dimensions")

  2. | base::rowSums(cell.gene.exp)

  3. | FUN(X[[i]], ...)

  4. | base::lapply(X = all.pops, FUN = function(celltype1) { cell.gene.exp <- gene.exp.tbl[, base::which(base::grepl(x = base::colnames(gene.exp.tbl), pattern = base::paste0("^", celltype1, "[\\.0-9]*$"), ignore.case = F))] ...

  5. | base::do.call(base::rbind, base::lapply(X = all.pops, FUN = function(celltype1) { cell.gene.exp <- gene.exp.tbl[, base::which(base::grepl(x = base::colnames(gene.exp.tbl), pattern = base::paste0("^", celltype1, "[\\.0-9]*$"), ignore.case = F))] ...

  6. | get.gene.expr(exp.tbl = data, genes = base::intersect(.pck_env$Ligands, base::rownames(data)), cell.type = all.pops)

  7. | InterCom(data = GetAssay(sc_484_filtered)@data, anno.tbl = cbind(rownames(sc_484_filtered@meta.data["type"]), sc_484_filtered@meta.data["type"]), species = "HUMAN", sighot.cutoff = 0.1, sighot.percentile = 70, consv.thrs = 0.05, ncores = 4, sig.cutoff = 0.9, z.score.cutoff = 2, tissue.name = "Monocolture_organoid", ...

Tracing the error, I found its source in line 40 of the file mainlib.r: cell.gene.exp <- gene.exp.tbl[,base::which(base::grepl(x = base::colnames(gene.exp.tbl),pattern = base::paste0("^",celltype1,"[\\.0-9]*$"),ignore.case = F))]

For "tumor", for example, this produces a 311 x 0 sparse Matrix of class "dgCMatrix", on which rowSums cannot be called.

Could this error be due to my file format? Here is the file I used in the invocation: https://www.dropbox.com/sh/lngk0gor3uav6lr/AAAUKOFcGC1fKieBsMOVQXC0a?dl=0
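To illustrate why such a subset can come back with zero columns, here is a minimal sketch of how the pattern from the quoted mainlib.r line behaves. All column names below are made up, and the thread does not show whether InterCom relabels columns before this step, so the barcode case is an assumption rather than a confirmed diagnosis.

```r
# Pattern built the same way as in the line quoted from mainlib.r.
celltype1 <- "tumor"
pattern <- paste0("^", celltype1, "[\\.0-9]*$")

# Hypothetical column names: cell-type labels with optional numeric suffixes ...
label_names <- c("tumor", "tumor.1", "tumor.12", "normal", "normal.1")
# ... versus hypothetical cell barcodes of the kind often used as column
# names in single-cell expression matrices.
barcode_names <- c("AAACCTGAGCTA-1", "AAACGGGTCAGT-1")

# Indices of the column names that the pattern accepts.
label_hits <- which(grepl(pattern = pattern, x = label_names))
barcode_hits <- which(grepl(pattern = pattern, x = barcode_names))

print(label_hits)            # positions of the "tumor"-style labels
print(length(barcode_hits))  # number of barcodes matched
```

Only names that consist of the cell-type label followed by dots and digits match, so a matrix whose columns are barcodes would yield an empty column selection for every cell type.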

My session info: `> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggplot2_3.3.3 InterCom_0.1 SeuratObject_4.0.1 Seurat_4.0.1
[5] SingleCellExperiment_1.14.1 SummarizedExperiment_1.22.0 Biobase_2.52.0 GenomicRanges_1.44.0
[9] GenomeInfoDb_1.28.0 IRanges_2.26.0 S4Vectors_0.30.0 BiocGenerics_0.38.0
[13] MatrixGenerics_1.4.0 matrixStats_0.59.0

loaded via a namespace (and not attached):
[1] Rtsne_0.15 colorspace_2.0-1 deldir_0.2-10 ellipsis_0.3.2 ggridges_0.5.3
[6] XVector_0.32.0 spatstat.data_2.1-0 farver_2.1.0 leiden_0.3.8 listenv_0.8.0
[11] ggrepel_0.9.1 fansi_0.5.0 codetools_0.2-18 splines_4.1.0 knitr_1.33
[16] polyclip_1.10-0 jsonlite_1.7.2 ica_1.0-2 cluster_2.1.2 png_0.1-7
[21] uwot_0.1.10 shiny_1.6.0 sctransform_0.3.2 spatstat.sparse_2.0-0 compiler_4.1.0
[26] httr_1.4.2 Matrix_1.3-3 fastmap_1.1.0 lazyeval_0.2.2 later_1.2.0
[31] htmltools_0.5.1.1 tools_4.1.0 igraph_1.2.6 gtable_0.3.0 glue_1.4.2
[36] GenomeInfoDbData_1.2.6 RANN_2.6.1 reshape2_1.4.4 dplyr_1.0.6 Rcpp_1.0.6
[41] scattermore_0.7 vctrs_0.3.8 nlme_3.1-152 lmtest_0.9-38 xfun_0.23
[46] stringr_1.4.0 globals_0.14.0 mime_0.10 miniUI_0.1.1.1 lifecycle_1.0.0
[51] irlba_2.3.3 goftest_1.2-2 future_1.21.0 MASS_7.3-54 zlibbioc_1.38.0
[56] zoo_1.8-9 scales_1.1.1 spatstat.core_2.1-2 promises_1.2.0.1 spatstat.utils_2.1-0
[61] RColorBrewer_1.1-2 yaml_2.2.1 reticulate_1.20 pbapply_1.4-3 gridExtra_2.3
[66] rpart_4.1-15 stringi_1.6.2 rlang_0.4.11 pkgconfig_2.0.3 bitops_1.0-7
[71] lattice_0.20-44 ROCR_1.0-11 purrr_0.3.4 tensor_1.5 labeling_0.4.2
[76] patchwork_1.1.1 htmlwidgets_1.5.3 cowplot_1.1.1 tidyselect_1.1.1 parallelly_1.25.0
[81] RcppAnnoy_0.0.18 plyr_1.8.6 magrittr_2.0.1 R6_2.5.0 generics_0.1.0
[86] DelayedArray_0.18.0 withr_2.4.2 pillar_1.6.1 mgcv_1.8-35 fitdistrplus_1.1-5
[91] survival_3.2-11 abind_1.4-5 RCurl_1.98-1.3 tibble_3.1.2 future.apply_1.7.0
[96] crayon_1.4.1 KernSmooth_2.23-20 utf8_1.2.1 spatstat.geom_2.1-0 plotly_4.9.4
[101] grid_4.1.0 data.table_1.14.0 taRifx_1.0.6.2 digest_0.6.27 xtable_1.8-4
[106] tidyr_1.1.3 httpuv_1.6.1 munsell_0.5.0 viridisLite_0.4.0

`

Let me know if you need any other details about the problem.

saschajung commented 3 years ago

Hey, the error you are seeing is due to your input. In particular, the "data" argument of InterCom requires a data.frame or matrix. The rowSums function in the "base" package cannot cope with dgCMatrix objects, which results in the error you described (I could reproduce it with the information you provided). Although it would be nice if InterCom could handle dgCMatrix objects, especially for larger scRNA-seq datasets, implementing this is currently not at the top of my list.
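As a rough sketch of the workaround this implies, the sparse matrix can be converted to a base matrix before it is handed over. The toy matrix and its names are invented for illustration, and the commented-out call at the end only mirrors the invocation from the question; it is not a tested recipe. Note that a dense copy of a large scRNA-seq matrix can need a lot of memory.

```r
library(Matrix)  # recommended package that ships with R

# Toy sparse expression matrix (3 genes x 4 cells); values are made up.
sparse_expr <- Matrix(
  c(0, 2, 0, 1,
    3, 0, 0, 0,
    0, 0, 5, 4),
  nrow = 3, byrow = TRUE, sparse = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# base::rowSums() expects a base array; capture what happens with the
# sparse S4 object instead of letting an error stop the script.
sparse_attempt <- tryCatch(
  base::rowSums(sparse_expr),
  error = function(e) conditionMessage(e)
)
print(sparse_attempt)

# Converting to a dense base matrix keeps the dimnames and satisfies rowSums().
dense_expr <- as.matrix(sparse_expr)
dense_sums <- base::rowSums(dense_expr)
print(dense_sums)

# Hypothetical use with the object from the question (not run here):
# InterCom(data = as.matrix(GetAssay(sc_484_filtered)@data), ...)
```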

Although unrelated to your actual question, I'd like to make two remarks. First, since I don't know what cells your dataset actually contains, I just wanted to mention that with the current annotation table you would only receive interactions between two populations, namely tumor and normal. This might not work very well if each population is composed of multiple different cell types, since the intracellular signaling cascades, expressed TFs, receptors and secreted ligands could be vastly different. Especially since you mentioned that you are working with pancreatic organoid data, I have the feeling that this could become an issue if you want to obtain meaningful information.

The second remark is about the data input. Apart from what I mentioned above, I saw that the maximum expression in your dataset is around 8.5, which makes me wonder whether you applied any transformation. When running InterCom, I would advise you to use either raw counts (if the data comes from UMI-based technologies) or TPM values (if the data comes from full-length transcript based technologies).

Since I believe this issue has been addressed, I am closing it. If you spot other errors, please open a separate issue.
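One quick way to check the second point is a heuristic test for whether a matrix still holds raw counts. The helper `looks_like_raw_counts` below is hypothetical and not part of InterCom or Seurat, and the log-normalization step (scale to 10,000 per cell, then log1p) is only an assumption about how the data in question may have been processed. The check is meant for UMI counts; TPM values are not whole numbers and would fail it by design.

```r
# Hypothetical helper: raw UMI counts are non-negative whole numbers.
looks_like_raw_counts <- function(expr) {
  vals <- as.numeric(as.matrix(expr))
  all(vals >= 0) && all(vals == round(vals))
}

# Toy raw count matrix (2 genes x 3 cells); values are made up.
raw_counts <- matrix(c(0, 12, 3, 250, 0, 7), nrow = 2)

# Assumed transformation: per-cell scaling to 10,000 followed by log1p.
per_cell <- sweep(raw_counts, 2, colSums(raw_counts), "/")
log_norm <- log1p(per_cell * 1e4)

raw_check <- looks_like_raw_counts(raw_counts)
norm_check <- looks_like_raw_counts(log_norm)

print(raw_check)
print(norm_check)
print(max(log_norm))  # log-scale values stay small, unlike raw counts
```

A small maximum together with non-integer values is a hint, not proof, that a transformation was applied.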