xzhoulab / iDEA

Differential expression (DE); gene set Enrichment Analysis (GSEA); single cell RNAseq studies (scRNAseq)
GNU General Public License v3.0
32 stars 11 forks source link

EMMCMCStepSummary error #25

Closed soryung closed 1 year ago

soryung commented 2 years ago

Hi there, I'm encountering the following error when executing the iDEA.fit.

Error in EMMCMCStepSummary(object@summary[, 1], object@summary[, 2], as.matrix(Annot), : pinv(): svd failed

Here is my code.

DEG <- read.table("my_data.csv")
pvalue <- DEG$p_val
zscore <- qnorm(pvalue/2.0, lower.tail=FALSE) #### convert the pvalue to z-score
log2FC <- DEG$avg_log2FC ## effect size
se_beta <- abs(log2FC/zscore)
lfcSE2 = se_beta^2
summary = data.frame(beta = log2FC,beta_var = lfcSE2)
rownames(summary) = rownames(DEG)
> head(summary)
                 beta    beta_var
MGI:1918911 0.2594995 0.008204673
MGI:1913300 0.6050112 0.014540803
MGI:1915609 0.4179491 0.013050508
MGI:1916013 1.0654890 0.053041579
MGI:2152337 0.2732631 0.007393953
MGI:1913456 0.3679529 0.013129402
> nrow(summary)
[1] 2091

data(mouseGeneSets)
> mouseGeneSets[1:3,1:3]
           GO:0000002 GO:0000003 GO:0000009
MGI:101757          0          0          0
MGI:101758          0          0          0
MGI:101759          0          0          0

idea <- CreateiDEAObject(summary, mouseGeneSets, max_var_beta = 100, min_precent_annot = 0.0025, num_core=10)

idea <- iDEA.fit(idea,
                 fit_noGS=FALSE,
             init_beta=NULL, 
             init_tau=c(-2,0.5),
             min_degene=5,
             em_iter=15,
             mcmc_iter=1000, 
             fit.tol=1e-5,
                 modelVariant = F,
             verbose=TRUE)

However, when I try to subset only 4 gene sets, when I create iDEA object. There is no error. (I referenced this page. [https://github.com/xzhoulab/iDEA/issues/21])

idea <- CreateiDEAObject(summary, mouseGeneSets[,c(1,2,13,14)], max_var_beta = 100, min_precent_annot = 0.0025, num_core = 10)
> head(idea@annotation[[1]])
[1]  426  549  552  556  865 1342
> head(idea@annotation[[2]])
[1] 28 33 37 38 57 58
> head(idea@annotation[[3]])
[1]   26   95  191  548  930 1037
> head(idea@annotation[[4]])
[1]  85  90 264 411 586 623

idea <- iDEA.fit(idea,
                 fit_noGS=FALSE,
             init_beta=NULL, 
             init_tau=c(-2,0.5),
             min_degene=5,
             em_iter=15,
             mcmc_iter=1000, 
             fit.tol=1e-5,
                 modelVariant = F,
             verbose=TRUE)

## ===== iDEA INPUT SUMMARY ==== ##
## number of annotations:  4 
## number of genes:  2091 
## number of cores:  10 
## fitting the model with gene sets information... 
  |======================================================================================================================================================================| 100%, Elapsed 00:11

For information, here is my sessionInfo.

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /home/Soryung/.conda/envs/ENS/lib/libopenblasp-r0.3.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.7    biomaRt_2.50.3 iDEA_1.0.1    

loaded via a namespace (and not attached):
 [1] KEGGREST_1.34.0        progress_1.2.2         tidyselect_1.1.1      
 [4] purrr_0.3.4            generics_0.1.1         vctrs_0.3.8           
 [7] doSNOW_1.0.20          snow_0.4-4             stats4_4.1.1          
[10] BiocFileCache_2.2.1    utf8_1.2.2             blob_1.2.2            
[13] XML_3.99-0.8           rlang_0.4.12           pillar_1.6.4          
[16] withr_2.4.2            glue_1.5.0             DBI_1.1.1             
[19] rappdirs_0.3.3         BiocGenerics_0.40.0    bit64_4.0.5           
[22] dbplyr_2.1.1           GenomeInfoDbData_1.2.7 foreach_1.5.2         
[25] lifecycle_1.0.1        stringr_1.4.0          zlibbioc_1.40.0       
[28] Biostrings_2.62.0      codetools_0.2-18       memoise_2.0.0         
[31] Biobase_2.54.0         IRanges_2.28.0         fastmap_1.1.0         
[34] doParallel_1.0.17      GenomeInfoDb_1.30.0    curl_4.3.2            
[37] parallel_4.1.1         fansi_0.5.0            AnnotationDbi_1.56.2  
[40] pbmcapply_1.5.1        Rcpp_1.0.8.3           filelock_1.0.2        
[43] cachem_1.0.6           S4Vectors_0.32.2       XVector_0.34.0        
[46] bit_4.0.4              hms_1.1.1              png_0.1-7             
[49] digest_0.6.28          stringi_1.7.5          tools_4.1.1           
[52] bitops_1.0-7           magrittr_2.0.1         tibble_3.1.6          
[55] RCurl_1.98-1.5         RSQLite_2.2.10         crayon_1.4.2          
[58] pkgconfig_2.0.3        ellipsis_0.3.2         xml2_1.3.2            
[61] prettyunits_1.1.1      assertthat_0.2.1       httr_1.4.2            
[64] iterators_1.0.14       R6_2.5.1               compiler_4.1.1  

Any idea what's causing this error? Thanks!

YingMa0107 commented 2 years ago

Hi @soryung,

Thank you for your interest in our package.

I'm wondering why your summary statistics only have 2091 genes? If you use Seurat to calculate the summary statistics, you might need to specify the arguments to output all the genes' summary statistics. Details see my previous answer: https://github.com/xzhoulab/iDEA/issues/14.

For the error, it is from the pseudo-inverse of the variance estimation of the annotation, possibly due to the estimation of annotation coefficient is not correct when you estimate for all the gene sets but with only 2091 genes. Could you please try to pull out the summary statistics for all genes in the data not just the DE genes, then try to see if the error still exists.

Thank you!

Best, Ying