hail-is / hail

Cloud-native genomic dataframes and batch computing
https://hail.is
MIT License

Single variant and burden analyses and conditioning #363

Closed mzekavat closed 7 years ago

mzekavat commented 8 years ago

Hello, it would be really great to have the following tests in Hail, as described in the EPACTS documentation: http://genome.sph.umich.edu/wiki/EPACTS#Single_Variant_Test

The most important for immediate analyses (WGS of ~10K individuals) are q.emmax, emmaxCMC, and mmskat, all of which use mixed models (with kinship matrices or GRMs).

Furthermore, one step beyond running the analysis is conditional analysis. Right now, in EPACTS, conditional analysis requires adding separate columns to the .ped file containing the GTs for each variant you'd like to condition on. Ideally, we'd be able to just list the variants (maybe in tab-delimited format with Chr, Pos, Ref, Alt).
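To make the suggested list format concrete, here is a minimal sketch in plain Python. It does not use any Hail or EPACTS API. The name `read_condition_variants`, the header-less four-column layout, and the coordinates in the example are all assumptions made for illustration.

```python
import csv
import io
from typing import List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def read_condition_variants(handle) -> List[Variant]:
    """Parse tab-delimited lines of Chr, Pos, Ref, Alt (no header assumed)."""
    variants = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        chrom, pos, ref, alt = row[:4]
        variants.append(Variant(chrom, int(pos), ref, alt))
    return variants


# Made-up coordinates, only to show the expected file layout.
example = io.StringIO("1\t12345\tA\tG\n19\t67890\tC\tT\n")
conditioning = read_condition_variants(example)
```

The resulting list could then be used to look up genotypes for each listed variant and add them as covariates, instead of editing the .ped file by hand.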

Quantitative traits of interest: q.emmax, mmskat

Binary traits of interest: b.score (or b.wald), b.collapse, emmaxCMC

Thanks again!

mzekavat commented 8 years ago

Sample mmskat group test:

/humgen/gsa-hphome1/sek/akihiro/Software/EPACTS-3.2.6/bin/epacts group --groupf ${GROUP}.txt \
  --vcf ${VCF_dir}/FinEst.WGS.QCed_Lipids.chr1.vcf.gz \
  --ped ${Phen_dir}/EstoniaOnlyLipids.ped --max-maf 0.01 \
  --kin ${Kin_dir}/FinEstKinship.kinf --sepchr --pheno ${PHENO} \
  --cov AGE --cov FAST10 --cov SEX \
  --cov LCSET.7048 --cov LCSET.7049 --cov LCSET.7123 --cov LCSET.7125 --cov LCSET.7130 --cov LCSET.7131 --cov LCSET.7132 --cov LCSET.7148 --cov LCSET.7162 --cov LCSET.7174 --cov LCSET.7175 --cov LCSET.7176 --cov LCSET.7257 --cov LCSET.7263 --cov LCSET.7308 --cov LCSET.7331 --cov LCSET.7334 --cov LCSET.7339 --cov LCSET.7340 --cov LCSET.7341 --cov LCSET.7353 --cov LCSET.7354 --cov LCSET.7355 --cov LCSET.7356 --cov LCSET.7357 --cov LCSET.7419 --cov LCSET.7420 --cov LCSET.7473 --cov LCSET.7494 \
  --test mmskat --out ${OUT}.ESTonly --run 4
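Because the covariate list is long and repetitive, the `--cov` arguments could be generated instead of typed out. This is only a sketch: the helper `cov_flags` is hypothetical, and it uses only the flags that already appear in the command above, with the covariate list shortened for brevity.

```python
import shlex
from typing import List


def cov_flags(covariates: List[str]) -> List[str]:
    """Expand covariate names into repeated `--cov NAME` arguments."""
    flags: List[str] = []
    for name in covariates:
        flags += ["--cov", name]
    return flags


# First three batch covariates from the command above; the rest follow the same pattern.
batches = ["LCSET.7048", "LCSET.7049", "LCSET.7123"]
args = ["epacts", "group", "--test", "mmskat"] + cov_flags(["AGE", "FAST10", "SEX"] + batches)

# shlex.join (Python 3.8+) quotes each argument for safe use in a shell.
print(shlex.join(args))
```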

tpoterba commented 7 years ago

Maryam is doing this now with keytables.