xzhoulab / iDEA

Differential expression (DE); gene set Enrichment Analysis (GSEA); single cell RNAseq studies (scRNAseq)
GNU General Public License v3.0
32 stars 11 forks source link

Error in EMMCMCStepSummary #20

Closed ch8316f5eyu closed 2 years ago

ch8316f5eyu commented 2 years ago

HI, I encountered an error when performing iDEA.fit. There is my code:

markers  = FindAllMarkers(seurat_obj, test.use = 'DESeq2' 
x = subset(markers, cluster == 4)
pvalue <- x$p_val#### the pvalue column
zscore <- qnorm(pvalue/2.0, lower.tail=FALSE) #### convert the pvalue to z-score
beta <- x$avg_log2FC## effect size
se_beta <- abs(beta/zscore) ## to approximate the standard error of beta
beta_var = se_beta^2  ### square
summary = data.frame(beta = beta,beta_var = beta_var)
rownames(summary) = x$gene ### or the gene id column in the res_DE results
library(iDEA)
data(humanGeneSets)
dim(humanGeneSets)
x = read.gmt('/mnt/Data/weiyu/ref/GSEA/c2.all.v7.5.1.symbols.gmt')
Gene_set = zwy_matrix(unique(unlist(x)), names(x))
Gene_set[is.na(Gene_set)] = 0
for (i in names(x)) {
    Gene_set[x[[i]], i] = 1
}
idea <- CreateiDEAObject(summary, Gene_set, max_var_beta = 100, min_precent_annot = 0.0025, num_core=10)
idea <- iDEA.fit(idea, fit_noGS=FALSE, init_beta=NULL, init_tau=c(-2,0.5), min_degene=5, em_iter=15, mcmc_iter=1000, fit.tol=1e-5, modelVariant = F, verbose=TRUE)

And the error is:

rror in EMMCMCStepSummary(object@summary[, 1], object@summary[, 2], as.matrix(Annot),  :
  pinv(): svd failed

Thanks!

YingMa0107 commented 2 years ago

Hi @ch8316f5eyu,

Thank you for your interest in our package. Based on your code, I am wondering what is the zwy_matrix() here? Could you please check if the Gene_set matrix is in the correct format matched with the annotation matrix in the tutorial.

Here is an example on how to create a gene set matrix based on my previous answer to other users https://github.com/xzhoulab/iDEA/issues/16:

#### load the library
library(qusage)
#### read in the gmt for mated gene list, e.g., c2.cp.v7.4.symbols.gmt
exampleSet = read.gmt("./c2.cp.v7.4.symbols.gmt")
#### All genes you want to analyzed, here I use all genes that exist in the c2 pathway as an example. I recommend you to use the union of the genes in your summary statistics and the gene sets you want to analyzed. 
AllGenes = unique(unlist(exampleSet))
print(length(AllGenes))
## 13351
#### make the annotation
Annotation = sapply(1:length(exampleSet),function(iSet){
##### if the gene is included in the ith set 
as.numeric(AllGenes %in% exampleSet[[iSet]])})
#### give the column names and row names of the annotation
rownames(Annotation) = AllGenes
colnames(Annotation) = names(exampleSet)
print(Annotation[1:5,1:5])
      BIOCARTA_GRANULOCYTES_PATHWAY BIOCARTA_LYM_PATHWAY BIOCARTA_BLYMPHOCYTE_PATHWAY BIOCARTA_CARM_ER_PATHWAY BIOCARTA_LAIR_PATHWAY
CXCL8                             1                    1                            0                        0                     1
IFNG                              1                    0                            0                        0                     0
IL1A                              1                    1                            0                        0                     1
CSF3                              1                    0                            0                        0                     0
SELP                              1                    0                            0                        0                     1

Hope this is helpful! Let me know if you have further questions!