stephenslab / mashr

An R package for multivariate adaptive shrinkage.
https://stephenslab.github.io/mashr

multiple cohorts from the same tissue #102

Open arkyl opened 3 years ago

arkyl commented 3 years ago

Hi, thanks very much for the software. I am wondering what the best practice is when there are multiple cohorts from the same tissue. For example, with 3 cohorts from tissue A, 2 cohorts from tissue B, and 1 cohort from tissue C: should I input all 6 cohort results to mash, or should I first combine them (for example, by meta-analysis to get a single result per tissue) and input 3 tissue-level results to mash?

My other question is that the sample size may vary a lot from tissue A (e.g. ~1000) to tissue C (e.g. ~50). Would that be a problem for mash?

Thanks a lot for your advice!

Yue

gaow commented 3 years ago

@arkyl Good question. Are the cohorts from the same population? Ideally, you could run their genotypes through, e.g., PCA and tell from the PCs.

arkyl commented 3 years ago

Thanks for the quick reply. The cohorts are all of European descent, so I guess they can be regarded as the same population. The initial results from the different cohorts within the same tissue are indeed very similar, which is expected.

gaow commented 3 years ago

Assuming there are no overlapping samples in these cohorts, performing a fixed-effect meta-analysis to merge the cohorts for each tissue would be the same as forcing the correlations between those cohorts to be 1 in a mash model. I am not sure what others think of this (comments welcome!), but I would probably perform the meta-analysis first for each tissue, to force mash into using a reasonable model. The interpretation down the road might also be simpler, e.g. you can make statements about sharing across tissues, not across cohort+tissue combinations.
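For illustration, the per-tissue merge could look like the following minimal sketch of an inverse-variance-weighted fixed-effect meta-analysis. It is not part of mashr; the function name `fixed_effect_meta` and the numbers are hypothetical, and it assumes the cohorts are independent (no overlapping samples), as stated above.

```python
import math

def fixed_effect_meta(betas, ses):
    """Combine per-cohort effect estimates for one effect in one tissue.

    betas: per-cohort effect estimates
    ses:   per-cohort standard errors
    Returns the inverse-variance-weighted estimate and its standard error.
    Assumes independent cohorts (no sample overlap).
    """
    weights = [1.0 / se ** 2 for se in ses]           # weight = 1 / variance
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)                       # SE of the combined estimate
    return beta, se

# Hypothetical estimates for a single effect in the 3 cohorts of tissue A.
beta_a, se_a = fixed_effect_meta([0.20, 0.30, 0.25], [0.10, 0.10, 0.10])
print(beta_a, se_a)
```

Applying this to every effect in every tissue would give one effect-estimate column and one standard-error column per tissue, which is the shape of input that mash expects.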

gaow commented 3 years ago

> My other question is that the sample size may vary a lot from tissue A (e.g. ~1000) to tissue C (e.g. ~50). Would that be a problem for mash?

So your z-scores in tissue C are expected to be smaller than those in tissue A, but the effect size estimates may be on a similar scale: the standard errors from smaller samples will be larger, and thus the z-scores smaller. This may be relevant to choosing between the EE and EZ models in mash (see the alpha parameter in the documentation for details). We generally suggest trying both and using the model that results in the larger likelihood.
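A rough numeric sketch of that point, assuming the same true effect in both tissues and the usual approximation that the standard error of a regression slope shrinks like 1/sqrt(n). The helper `expected_se`, the effect size, and the standard deviations are all made up for illustration.

```python
import math

def expected_se(n, resid_sd=1.0, genotype_sd=0.5):
    """Approximate SE of a simple regression slope for sample size n.

    Uses SE ~ resid_sd / (genotype_sd * sqrt(n)); both SDs are
    hypothetical values chosen only for this illustration.
    """
    return resid_sd / (genotype_sd * math.sqrt(n))

beta = 0.2  # hypothetical effect size, identical in both tissues

z = {}
for tissue, n in [("A", 1000), ("C", 50)]:
    se = expected_se(n)
    z[tissue] = beta / se   # z-score = effect estimate / standard error
    print(tissue, n, round(se, 3), round(z[tissue], 2))
```

In this sketch the effect is on the same scale in both tissues while the z-score is not: the ratio of the two z-scores is sqrt(1000 / 50), about 4.5. Whether the effects or the z-scores are the more comparable quantity across tissues is what the EE versus EZ choice is about.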

arkyl commented 3 years ago

Thanks a lot for the detailed explanation and suggestions!