mashr for multiple gwas

arkyl commented 1 year ago

Hi, we are thinking about applying mashr to GWAS results. The GWAS use the same samples, either for a single phenotype measured at different time points or for multiple related phenotypes.

Because a GWAS contains millions of SNPs, our planned steps are: 1) take a random set of SNPs (~200k); 2) take a strong set of SNPs (~100k, perhaps selected with a simple meta-analysis); 3) account for sample overlap using the random set of SNPs, since the sample overlap between the GWAS is complete; 4) fit the mashr model and compute posterior summaries for all SNPs (millions).
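
Steps 1 and 2 can be sketched roughly as follows. This is only an illustration in plain Python, not mashr code: the function name `select_subsets`, the toy z-scores, and the threshold are hypothetical, and the "strong" criterion used here (largest absolute z-score across phenotypes) is an assumed stand-in for the meta-analysis mentioned above.

```python
import random

def select_subsets(zscores, n_random, z_threshold, seed=0):
    """Pick a random subset and a 'strong' subset of SNP row indices.

    zscores: list of per-SNP lists, one z-score per phenotype.
    The strong rule (max |z| >= z_threshold) is an assumption for
    illustration, not the only possible selection criterion.
    """
    rng = random.Random(seed)
    n_snps = len(zscores)
    # Random subset: sampled without replacement, independent of signal.
    random_idx = sorted(rng.sample(range(n_snps), min(n_random, n_snps)))
    # Strong subset: SNPs with a large effect in at least one phenotype.
    strong_idx = [
        i for i, z in enumerate(zscores)
        if max(abs(v) for v in z) >= z_threshold
    ]
    return random_idx, strong_idx

# Toy data: 4 SNPs x 3 phenotypes (made-up values).
toy_z = [
    [0.1, -0.3, 0.2],
    [5.2, 4.8, 0.4],
    [-0.9, 1.1, 0.0],
    [0.3, -6.1, -5.5],
]
random_idx, strong_idx = select_subsets(toy_z, n_random=2, z_threshold=4.0)
```

The two index sets would then be used to subset the effect estimates and standard errors before they are passed to mashr.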

I am wondering whether this plan sounds reasonable, whether it uses too many data points, and whether smaller random and strong sets would work (e.g. 100k random and 50k strong). Our initial thought was not to prune or clump the data, so some SNPs may be perfectly correlated (i.e. LD = 1); I don't know whether that would be a problem for mashr.

Thanks a lot for your thoughts on this and your suggestions!

Yue

pcarbo commented 1 year ago

@arkyl How many phenotypes do you expect to analyze, roughly?

arkyl commented 1 year ago

9 phenotypes.

arkyl commented 1 year ago

We also have GWAS at ~15 visit time points. We may want to use mashr on those as well, but that analysis is not as critical as the one on multiple phenotypes.

pcarbo commented 1 year ago

@arkyl That all sounds reasonable. There is no benefit to including SNPs that are perfectly correlated, so you can remove them.
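
One possible way to remove perfectly correlated SNPs is sketched below, assuming pairwise LD values (r²) are already available from some LD tool. The function name `drop_perfect_ld`, the input layout, and the tolerance are hypothetical choices for illustration; this is not part of mashr.

```python
def drop_perfect_ld(snp_ids, ld_pairs, tol=1e-8):
    """Keep one SNP from each group of SNPs in perfect LD.

    snp_ids: SNP identifiers in the order they should be considered.
    ld_pairs: iterable of (snp_a, snp_b, r2) tuples; every identifier
              is assumed to appear in snp_ids.
    """
    parent = {s: s for s in snp_ids}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    # Merge SNPs whose r2 is (numerically) equal to 1.
    for a, b, r2 in ld_pairs:
        if r2 >= 1.0 - tol:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Keep the first SNP encountered from each merged group.
    kept, seen = [], set()
    for s in snp_ids:
        root = find(s)
        if root not in seen:
            seen.add(root)
            kept.append(s)
    return kept

# Toy example with made-up identifiers and LD values.
toy_ids = ["rs1", "rs2", "rs3", "rs4"]
toy_ld = [("rs1", "rs2", 1.0), ("rs2", "rs3", 0.4), ("rs3", "rs4", 1.0)]
kept = drop_perfect_ld(toy_ids, toy_ld)
```

In this toy example, rs1 and rs2 form one group and rs3 and rs4 another, so one SNP from each group is retained.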

arkyl commented 1 year ago

Thanks a lot for your reply!

stephenslab / mashr

mashr for multiple gwas #112