omnideconv / immunedeconv

A unified interface to immune deconvolution methods (CIBERSORT, EPIC, quanTIseq, TIMER, xCell, MCPcounter) and mouse deconvolution methods
https://omnideconv.org/immunedeconv/index.html

Error in if (max(Y) < 50) { : missing value where TRUE/FALSE needed - CIBERSORT #90

Open ghost opened 2 years ago

ghost commented 2 years ago

Hi everyone

I'm working with CIBERSORT in R. I downloaded the GSE33814 dataset from GEO and, after some cleaning, obtained a matrix with gene names as row names and patients as columns. When I run CIBERSORT, however, I get these errors:

Error in if (max(Y) < 50) { : missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  EOF within quoted string
2: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  number of items read is not a multiple of the number of columns

My matrix has 25441 rows and 44 columns.

I ran the same analysis on other datasets (GSE62232 and GSE151158) without any problems. I don't know what is wrong with this dataset or why I get these errors.

Any help would be much appreciated.

```
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Peru.1252 LC_CTYPE=Spanish_Peru.1252 LC_MONETARY=Spanish_Peru.1252
[4] LC_NUMERIC=C LC_TIME=Spanish_Peru.1252

attached base packages:
[1] parallel grid stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] janitor_2.1.0 GEOquery_2.62.2 preprocessCore_1.56.0
[4] e1071_1.7-9 VennDiagram_1.7.1 futile.logger_1.4.3
[7] ggvenn_0.1.9 factoextra_1.0.7 fpc_2.2-9
[10] mclust_5.4.9 circlize_0.4.14 ggcorrplot_0.1.3
[13] immunedeconv_2.0.4 EPIC_1.1.5 org.Hs.eg.db_3.14.0
[16] AnnotationDbi_1.56.2 tidyr_1.1.4 europepmc_0.4.1
[19] ggnewscale_0.4.6 enrichplot_1.14.2 pathview_1.34.0
[22] clusterProfiler_4.2.2 ggrepel_0.9.1 ggplot2_3.3.5
[25] purrr_0.3.4 stringr_1.4.0 sva_3.42.0
[28] BiocParallel_1.28.3 genefilter_1.76.0 mgcv_1.8-36
[31] nlme_3.1-152 gtools_3.9.2 DESeq2_1.34.0
[34] SummarizedExperiment_1.24.0 Biobase_2.54.0 MatrixGenerics_1.6.0
[37] matrixStats_0.61.0 GenomicRanges_1.46.1 GenomeInfoDb_1.30.1
[40] IRanges_2.28.0 S4Vectors_0.32.3 BiocGenerics_0.40.0
[43] pipeR_0.6.1.3 readr_2.1.1 tibble_3.1.6
[46] dplyr_1.0.7 limma_3.50.0

loaded via a namespace (and not attached):
[1] utf8_1.2.2 R.utils_2.11.0 reticulate_1.24 tidyselect_1.1.1
[5] RSQLite_2.2.9 lpSolve_5.6.15 scatterpie_0.1.7 munsell_0.5.0
[9] umap_0.2.7.0 withr_2.4.3 colorspace_2.0-2 GOSemSim_2.20.0
[13] limSolve_1.5.6 knitr_1.36 rstudioapi_0.13 robustbase_0.93-9
[17] DOSE_3.20.1 KEGGgraph_1.54.0 urltools_1.7.3 GenomeInfoDbData_1.2.7
[21] polyclip_1.10-0 bit64_4.0.5 farver_2.1.0 downloader_0.4
[25] vctrs_0.3.8 treeio_1.18.1 generics_0.1.2 lambda.r_1.2.4
[29] xfun_0.28 diptest_0.76-0 R6_2.5.1 graphlayouts_0.8.0
[33] locfit_1.5-9.4 flexmix_2.3-17 bitops_1.0-7 cachem_1.0.6
[37] fgsea_1.20.0 gridGraphics_0.5-1 DelayedArray_0.20.0 assertthat_0.2.1
[41] vroom_1.5.7 scales_1.1.1 nnet_7.3-16 ggraph_2.0.5
[45] gtable_0.3.0 tidygraph_1.2.0 rlang_0.4.12 GlobalOptions_0.1.2
[49] splines_4.1.1 lazyeval_0.2.2 yaml_2.2.1 reshape2_1.4.4
[53] qvalue_2.26.0 tools_4.1.1 ggplotify_0.1.0 ellipsis_0.3.2
[57] RColorBrewer_1.1-2 proxy_0.4-26 testit_0.13 Rcpp_1.0.8
[61] plyr_1.8.6 progress_1.2.2 zlibbioc_1.40.0 RCurl_1.98-1.5
[65] prettyunits_1.1.1 openssl_1.4.6 viridis_0.6.2 cluster_2.1.2
[69] magrittr_2.0.1 futile.options_1.0.1 data.table_1.14.2 RSpectra_0.16-0
[73] DO.db_2.9 triebeard_0.3.0 hms_1.1.1 patchwork_1.1.1
[77] evaluate_0.14 xtable_1.8-4 XML_3.99-0.8 readxl_1.3.1
[81] shape_1.4.6 gridExtra_2.3 compiler_4.1.1 crayon_1.4.2
[85] shadowtext_0.1.1 R.oo_1.24.0 htmltools_0.5.2 ggfun_0.0.5
[89] tzdb_0.2.0 geneplotter_1.72.0 aplot_0.1.2 lubridate_1.8.0
[93] DBI_1.1.2 formatR_1.11 tweenr_1.0.2 MASS_7.3-54
[97] data.tree_1.0.0 Matrix_1.3-4 cli_3.1.0 R.methodsS3_1.8.1
[101] quadprog_1.5-8 igraph_1.2.11 forcats_0.5.1 pkgconfig_2.0.3
[105] xml2_1.3.3 ggtree_3.2.1 annotate_1.72.0 XVector_0.34.0
[109] snakecase_0.11.0 yulab.utils_0.0.4 digest_0.6.29 graph_1.72.0
[113] Biostrings_2.62.0 rmarkdown_2.11 cellranger_1.1.0 fastmatch_1.1-3
[117] tidytree_0.3.9 edgeR_3.36.0 curl_4.3.2 kernlab_0.9-29
[121] modeltools_0.2-23 lifecycle_1.0.1 jsonlite_1.7.3 viridisLite_0.4.0
[125] askpass_1.1 fansi_0.5.0 pillar_1.7.0 lattice_0.20-44
[129] DEoptimR_1.0-10 KEGGREST_1.34.0 fastmap_1.1.0 httr_1.4.2
[133] survival_3.2-11 GO.db_3.14.0 glue_1.5.1 prabclus_2.3-2
[137] png_0.1-7 bit_4.0.4 Rgraphviz_2.38.0 class_7.3-19
[141] ggforce_0.3.3 stringi_1.7.6 blob_1.2.2 memoise_2.0.1
[145] ape_5.6-2
```

mlist commented 2 years ago

Could you share your matrix here? I suspect a formatting issue, but I can't spot it from this error alone.

ghost commented 2 years ago

b33814.csv

Here it is.

Thank you so much !

grst commented 2 years ago

Hi @LuisCanoA,

can you show the code you use to read that matrix and to pass it to immunedeconv? Do other deconvolution methods than CIBERSORT work fine? I'm not yet sure if it's a problem that occurs before running immunedeconv or if it could be an internal problem.

The matrix itself looks fine (it's a bit unusual that it uses ; as separator and , as decimal symbol, but depending on how you read the data it'd be fine).

Best, Gregor

ghost commented 2 years ago

Hi @grst

Thanks for your answer. Here is the code that I used.

To get the dataset from GEO, I wrote this function:

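extractora <- function(x) {
  # Attach the packages used below; library() fails loudly if one is missing.
  library(GEOquery)
  library(dplyr)
  library(Biobase)
  library(janitor)
  # Download the series and keep its first ExpressionSet.
  a <- GEOquery::getGEO(x, destdir = ".")[[1]]
  # log2-transform the expression values (zero or negative inputs give -Inf or NaN).
  expresiones <- a %>% exprs() %>% log2()
  # Sample annotation and probe-to-symbol mapping, with cleaned column names.
  fenotipos <- a %>% pData() %>% clean_names()
  gene_names <- a %>% fData() %>% as.data.frame() %>% clean_names() %>%
    dplyr::select(c(1, matches("symbol")))
  list(expresiones, fenotipos, gene_names)
}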

b33814 <-  extractora("GSE33814")

With the dataset created, I used this code to clean it and run CIBERSORT:

# Map probe IDs to gene symbols, keep one row per symbol, normalize, then deconvolute.
b33814[[1]] %>%
  as.data.frame() %>%
  rownames_to_column("id") %>%
  inner_join(b33814[[3]], by = "id") %>%
  dplyr::select(-id) %>%
  relocate(symbol, .before = 1) %>%
  distinct(symbol, .keep_all = TRUE) %>%
  column_to_rownames("symbol") %>%
  normalizeBetweenArrays(method = "quantile") %>%
  deconvolute(method = "cibersort")

Other methods work well (xCell, for example), but I don't know what is happening with CIBERSORT.
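
One check that might be worth running before the matrix reaches `deconvolute()`: the error text is what R reports when the condition of an `if` evaluates to `NA`, which happens when `max(Y)` sees an `NA` or `NaN`. The sketch below uses an invented toy matrix, not the GSE33814 data, and assumes (nothing in this thread confirms it) that some values could be zero or negative before `log2()` is applied.

```r
# Toy matrix standing in for an expression matrix; the values are invented.
toy <- matrix(c(8.1, -0.3, 0, 5.2), nrow = 2,
              dimnames = list(c("GENE_A", "GENE_B"), c("s1", "s2")))

# log2() of a negative number gives NaN (with a warning); log2(0) gives -Inf.
logged <- suppressWarnings(log2(toy))

# Count the entries that are NA, NaN or infinite.
n_bad <- sum(!is.finite(logged))

# max() propagates NaN, so the comparison yields NA and `if` cannot decide.
cond <- max(logged) < 50
msg <- tryCatch({
  if (cond) "small" else "large"
}, error = function(e) conditionMessage(e))

print(n_bad)
print(msg)
```

If `n_bad` is greater than zero on the real matrix, the non-finite entries would be a plausible source of the `if (max(Y) < 50)` failure, though it would not by itself explain the two `scan()` warnings.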

grst commented 2 years ago

I assume the matrix you sent us is the normalized matrix you pipe into deconvolute?

For cibersort, due to how it is implemented, immunedeconv writes the expression matrix to a temporary file, from where it is read by cibersort. I believe this is where the error occurs, but I have no clue yet how that could possibly fail.

https://github.com/icbi-lab/immunedeconv/blob/c70539f2b08901687561dca755337fc6a5130440/R/immune_deconvolution_methods.R#L245-L250
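
The temporary-file step can be imitated outside immunedeconv to see whether a matrix survives being written and read back. This is only a sketch with base R `write.table()` and `read.table()` on an invented toy matrix; it is not the code behind the link above, and the apostrophe in the gene name is a hypothetical trigger. An unmatched quote character is one well-known cause of the `EOF within quoted string` warning from `scan()`, but whether that applies to this dataset is an assumption.

```r
# Toy expression matrix; the apostrophe in the first gene name is deliberate.
expr <- matrix(c(1.5, 2.5, 3.5, 4.5), nrow = 2,
               dimnames = list(c("GENE'A", "GENE_B"), c("s1", "s2")))

# Write a tab-separated file with the gene names as the first column.
tmp <- tempfile(fileext = ".tsv")
write.table(data.frame(gene = rownames(expr), expr, check.names = FALSE),
            file = tmp, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back with quoting disabled, so quote characters are plain text.
back <- read.table(tmp, header = TRUE, sep = "\t", row.names = 1,
                   quote = "", comment.char = "", check.names = FALSE)

# The round trip is clean if shape and row names match and nothing became NA.
roundtrip_ok <- identical(dim(as.matrix(back)), dim(expr)) &&
  all(rownames(back) == rownames(expr)) &&
  !anyNA(back)
unlink(tmp)
print(roundtrip_ok)
```

Searching the real row names for quote or comment characters, for example with `grep("['\"#]", rownames(m), value = TRUE)` on a matrix `m`, would be a quick way to test the same idea without writing any file.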

ghost commented 2 years ago

Hi @grst, exactly, that is the matrix that I pipe into deconvolute. I also tried without the normalization step, but the problem persists.

grst commented 2 years ago

Sorry, I currently don't have time to look into this, but I'll try to reproduce the problem using the matrix you provided and come back to you!

mlist commented 2 years ago

@LuisCanoA maybe you could try to feed the matrix directly to CIBERSORT to see if the problem is really with immunedeconv or with CIBERSORT. That might speed things up.

aaksingh commented 2 years ago

I had the same error message while using quanTIseq. My values were in a character data frame, which I converted to a numeric matrix; the coercion introduced NA values. It looks like immunedeconv could not handle those NA values, which produced this error. I fixed the issue by removing all the NA values from my matrix.
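
A minimal sketch of that fix, on an invented character data frame (the gene names and the `"n/a"` placeholder are hypothetical): it converts the columns to numeric, lists the genes where coercion produced `NA`, and keeps only the complete rows. Dropping rows is one option among several, and it removes those genes from the deconvolution input.

```r
# Toy character data frame; "n/a" and "" cannot be parsed as numbers.
chr <- data.frame(s1 = c("1.2", "n/a", "3.4"),
                  s2 = c("5.6", "7.8", ""),
                  row.names = c("GENE_A", "GENE_B", "GENE_C"),
                  stringsAsFactors = FALSE)

# Convert column by column; as.numeric() turns unparsable strings into NA.
num <- suppressWarnings(vapply(chr, as.numeric, numeric(nrow(chr))))
rownames(num) <- rownames(chr)

# Genes with at least one NA after coercion.
bad_genes <- rownames(num)[rowSums(is.na(num)) > 0]

# Keep only the rows without any NA.
clean <- num[stats::complete.cases(num), , drop = FALSE]

print(bad_genes)
print(clean)
```

Inspecting `bad_genes` before dropping anything shows which raw strings failed to parse, which may point to a formatting problem such as a decimal comma rather than genuinely missing data.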