xinhe-lab / mirage

Mixture model based Rare variant Analysis on Genes
https://xinhe-lab.github.io/mirage
Example of gene level FDR & multiple testing in genome-wide scan #2

Open gaow opened 4 years ago

gaow commented 4 years ago

@han16 @linnanqia contacted me offline. She has fixed a bug in her code and now gets what seem to be encouraging results (log(BF) of about 5 for some genes, which seems to make sense). However, our tutorial does not explain how to interpret the results, in particular how multiple testing is handled: how the gene-level posterior probability should be interpreted in terms of FDR, and what threshold to use.

Could you kindly update the tutorial to add a section on interpreting the results? Thanks!

linnanqia commented 4 years ago

In your article, I found that you simply listed the 10 genes with the highest BF as the significant genes, ranked by BF rather than log10(BF). My results are exciting: three genes have a BF larger than 100. The largest is 10560 and the second largest is 482, and the posterior probability of every gene is almost 1. Could you tell me how to choose the significant genes? Thank you! Best regards, Anna

han16 commented 4 years ago

We used Bayesian FDR to select significant genes in the paper; see this reference: https://www.tandfonline.com/doi/abs/10.1080/02664760600994745. In general, once the BF and posterior probability are available, users can apply FDR control, or whatever other criterion they prefer, to choose the likely risk genes.
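
For readers of this thread, here is a minimal sketch of one common form of Bayesian FDR control applied to gene-level posterior probabilities. It is an illustration, not code from the mirage package, and whether it matches the exact procedure in the referenced paper is an assumption. The function name `bayesian_fdr`, the threshold `alpha`, and the gene names and probabilities are all hypothetical. Genes are ranked by posterior probability, each gene's local false discovery rate is taken as 1 minus its posterior probability, and the running average of these values down the ranked list estimates the FDR of calling all genes up to that rank significant.

```python
def bayesian_fdr(post_probs, alpha=0.05):
    """Return (qvalues, selected) for a list of posterior probabilities.

    qvalues[i] is the estimated FDR of the gene list that ends at gene i
    when genes are ranked from highest to lowest posterior probability.
    selected[i] is True when that estimate is at most alpha.
    """
    # Rank gene indices from highest to lowest posterior probability.
    order = sorted(range(len(post_probs)),
                   key=lambda i: post_probs[i], reverse=True)
    qvalues = [0.0] * len(post_probs)
    running_sum = 0.0
    for rank, i in enumerate(order, start=1):
        # Local fdr of a gene is 1 - posterior probability of being a risk gene.
        running_sum += 1.0 - post_probs[i]
        # Average local fdr over the top `rank` genes.
        qvalues[i] = running_sum / rank
    selected = [q <= alpha for q in qvalues]
    return qvalues, selected


# Hypothetical gene names and posterior probabilities, for illustration only.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
post_probs = [0.99, 0.95, 0.60, 0.20]
qvalues, selected = bayesian_fdr(post_probs, alpha=0.05)
for gene, q, keep in zip(genes, qvalues, selected):
    print(gene, round(q, 4), keep)
```

Because the local fdr values are visited in increasing order, the running average never decreases, so the selected genes always form a top-ranked block of the list.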

Shengtong.

han16 commented 4 years ago

@gaow There is probably no need to add FDR to the package, since users often do not do genome-wide screening, and applying FDR to a small gene set does not make much sense. It is similar to SKAT, which provides p-values and leaves it to users to apply criteria such as FDR to choose genes. In the mirage package, the BF and posterior probability should be sufficient. I agree that further explanation of the posterior probability, etc., would be helpful, and I will update the tutorial.
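
As a rough illustration of how a BF relates to a posterior probability, the sketch below uses the standard two-group mixture formula: with prior proportion of risk genes pi, the posterior probability is pi * BF / (pi * BF + 1 - pi). That the package computes its gene-level posterior probability in exactly this way is an assumption, and the function name `posterior_from_bf` and the prior value 0.05 are hypothetical. The BF values 10560, 482 and 100 are the ones mentioned earlier in this thread.

```python
def posterior_from_bf(bf, prior_risk):
    """Posterior probability that a gene is a risk gene.

    bf         : Bayes factor of the risk-gene model against the null model
    prior_risk : assumed prior proportion of risk genes, strictly between 0 and 1
    """
    if not 0.0 < prior_risk < 1.0:
        raise ValueError("prior_risk must be strictly between 0 and 1")
    # Posterior odds = prior odds * BF, converted back to a probability.
    return prior_risk * bf / (prior_risk * bf + 1.0 - prior_risk)


# Hypothetical prior proportion of risk genes, chosen only for illustration.
prior_risk = 0.05
for bf in (10560, 482, 100):
    print(bf, round(posterior_from_bf(bf, prior_risk), 4))
```

The same BF gives a different posterior probability under a different prior proportion, which is why the posterior probability, rather than the BF alone, is the natural input to an FDR calculation.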

Shengtong.