Closed tinggithub33 closed 3 years ago
I have a question about pvalue_louis as below. When p value =2.12E-28, pvalue_louis = 7.20E-26 (second row from bottom). this make sense to me. But when pvalue is 0, pvalue_louis is 0.99999XXX. it seems weird to me. do you think it is right? any comments will be greatly appreciated. Best reargds, Tingfen annot_id | pvalue_louis | pvalue |
---|---|---|
KEGG_CITRATE_CYCLE_TCA_CYCLE | 0.99999838 | 0 |
GO_ER_TO_GOLGI_TRANSPORT_VESICLE_MEMBRANE | 0.99999838 | 0 |
GO_NEGATIVE_REGULATION_OF_MUSCLE_CELL_DIFFERENTIATION | 0.99999839 | 0 |
MARIADASON_REGULATED_BY_HISTONE_ACETYLATION_DN | 0.9999984 | 0 |
NIKOLSKY_BREAST_CANCER_12Q13_Q21_AMPLICON | 0.9999984 | 0 |
FONTAINE_THYROID_TUMOR_UNCERTAIN_MALIGNANCY_UP | 0.9999984 | 0 |
GO_SMALL_SUBUNIT_PROCESSOME | 0.99999841 | 0 |
DUTTA_APOPTOSIS_VIA_NFKB | 0.99999842 | 0 |
GO_RESPONSE_TO_ZINC_ION | 7.20E-26 | 2.12E-28 |
GO_CELLULAR_RESPONSE_TO_INORGANIC_SUBSTANCE | 9.86E-21 | 2.63E-23 |
below is the script I ran. and I found a message from standard output as:
[1] "try-error!" . . what does "try-error!" tell? Thank you very much for any comments! Tingfen
#######################scripts###############
d <- read.delim(sprintf("%s/%s",indir,input_file))
d$lfcSE=abs(d$coef/d$z) rownames(d) <- d$primerid summary_data=na.omit(d[,c("coef","lfcSE")]) colnames(summary_data) <- c("logFoldChange","lfcSE")
data(humanGeneSets) annotation_data=humanGeneSets idea <- CreateiDEAObject(summary_data, annotation_data, max_var_beta = 100, min_precent_annot = 0.0025, num_core=10)
idea <- iDEA.fit(idea, fit_noGS=FALSE, init_beta=NULL, init_tau=c(-2,0.5), min_degene=5, em_iter=15, mcmc_iter=1000, fit.tol=1e-5, modelVariant = F, verbose=TRUE)
idea <- iDEA.louis(idea) gsea <- idea@gsea
write.table(gsea, sprintf("%s/iDEA/%s/gsea_%s_z.txt",indir,cellType,out_file),row.names = F, quote=F, sep="\t")
Hi, I also encountered the same problem. ################# idea.null <- iDEA.louis(idea.null) [1] "try-error!" [1] "try-error!" [1] "try-error!" [1] "try-error!" [1] "try-error!"
Hi, I've been having a similar issue implementing and understanding pvalue_louis. Running DE of 10,000 genes (logFC from -4 to 4) from edgeR and using variance (back-calculated from edgeR p-value -> zscore), I get the following plot with little significance. Very excited to figure this out and implement iDEA for our studies!
Hi, @tinggithub33 and @ZHIDIHUAYUAN,
The inconsistency between pvalue_louis and pvalue might because of the estimation of the annotation coefficient estimation and the variance estimation of the coefficient. I suspect that the annot_coef estimate for the gene sets with the pvalue_louis near 1 are quite small, i.e. < -10. This extremely small estimate will lead to the probability of the gene in this gene set being the DE gene quite small. Then the variance estimates in the EM algorithms will be super large. When we use Louis method to correct for the information matrix which was need to estimate the variance of the gene set enrichment coefficient, with those small extremely parameters, the variance estimates will usually to be very large. I suspect the number of genes from your summary statistics as well as in these questionable gene sets are very small, which might cause this issue.
For the "try error" message, it's just the reminder that the variance estimates by Louis method are negative, this might also due to the extremely small gene set size. These should not be a big issue and I expect this should be happening among only a few gene sets with extremely small gene set size.
Hi @mcclo,
pvalue_louis here is calculated when we use Louis Method to correct for the information matrix calculated in the EM algorithm. Based on your plot, what does the distribution of your input variance and input p-value of the genes look like? Perhaps more details would be helpful to understand the little significance here.
Hi @YingMa1993 ,
Below are histograms of the input variance and p-value distributions. Thanks in advance for taking the time to help out!
To calculate beta_var from pvalue: zscore <- qnorm(pvalue/2.0, lower.tail=FALSE) beta <- markers$log2FC beta_var <- beta/zscore
Hi @mcclo,
Thanks for providing those details! I think here the beta_var should be beta_var <- (beta/zscore)^2 since it is the variance so it should be the square of the standard error when SE = abs(beta / zscore). I've add more details about how to calculate the SE and beta_var in our tutorial.