About GLMM p-value peaks passing 0.01

wososa commented 3 years ago

Hi @a-slide @tleonardi ,

Using the same direct RNA-seq data, would nanocompore generate different GMM p-values depending on the total number of genes/transcripts included in the analysis? For example, is the GMM p-value for a given RNA transcript affected only by the reads aligned to that transcript? Or would the GMM p-values from other genes in the genome neutralize (or adjust) the global p-value distribution?

Thanks, Woody

tleonardi commented 3 years ago

Hi @wososa, changing the list of transcripts analyzed does not affect the GMM fitting itself, but it does have an effect on the p-values due to multiple hypothesis correction. After calculating a p-value for each kmer of each analysed transcript, Nanocompore uses the Benjamini-Hochberg (BH) method to correct these p-values for multiple hypothesis testing. With a higher number of transcripts (and, as a consequence, of tests performed), the corrected p-values will be larger (less significant). This is essentially the same thing that happens when you do differential expression testing of RNA-Seq data with tools like DESeq: if you increase the number of genes tested, the p-values will be more heavily penalized by the BH correction.
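To make the effect concrete, here is a minimal, self-contained sketch of the Benjamini-Hochberg adjustment in plain Python. It is only an illustration of the general method, not Nanocompore's actual code; the function name `bh_adjust` and the example p-values are made up for this example.

```python
def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Illustrative helper only (hypothetical name, not part of Nanocompore).
    """
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, scaling each
    # by m / rank and enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for three kmers of one transcript.
small = [0.001, 0.02, 0.04]
# Same three kmers, plus 97 unremarkable tests from other transcripts.
large = small + [0.5] * 97

small_adj = bh_adjust(small)
large_adj = bh_adjust(large)

print("3 tests:  ", small_adj[:3])
print("100 tests:", large_adj[:3])
```

In this toy example the raw p-value of 0.001 is adjusted to roughly 0.003 when only 3 tests are performed, but to roughly 0.1 when the same kmer is corrected alongside 100 tests, even though the underlying reads and the GMM fit are unchanged.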

wososa commented 3 years ago

Hi @tleonardi , Thanks for your explanations! It would be nice to indicate "adj.p" in the output table, so people can understand that the p-values are not nominal p-values. Best, Woody

tleonardi / nanocompore #187