xihaoli / STAARpipeline-Tutorial

The tutorial for performing single-/multi-trait association analysis of whole-genome/whole-exome sequencing (WGS/WES) studies using FAVORannotator, STAARpipeline and STAARpipelineSummary
GNU General Public License v3.0

gnomAD AF & family-based vcf analysis #34

Open XiaKwan opened 9 months ago

XiaKwan commented 9 months ago

Hi Xihao,

Thanks for your useful tool! I've run into a few problems that confuse me:

  1. I noticed that the gnomAD and 1000G allele frequencies (AF) are included in the FAVOR full database but not in the FAVOR essential database. Is there a way to make use of these AF annotations in the STAAR procedure (for example, giving priority to variants with gnomAD AF < 5‰)?

  2. I'm working on a rare disease, and my cohort contains ~700 trio families (only the child has the disease; the parents are healthy), so my data is a VCF file with ~2000 samples. Does STAARpipeline support family-based analysis? If so, how can I represent the family relationships in the analysis (maybe in the pheno.csv)?

  3. Do I need to add more unrelated healthy samples as controls?

Thanks a lot!!!

xihaoli commented 7 months ago

Hi @XiaKwan,

Thanks for your patience. Regarding your questions,

1) Yes, you can. In this case, you may need to a) annotate your genotype data with the FAVOR full database through FAVORannotator, which may require updating the scripts in specific steps (see this thread for more details); b) transform the AF to the Phred scale so that it can be used as weights; and c) incorporate the AF annotations into the STAAR procedure.

2) We are currently developing methods for analyzing family data (trio design) using STAAR/STAARpipeline. We will let you know when they are ready to use.

3) Using common controls, rather than sequencing new controls for every study, can boost power to detect genotype–phenotype associations by increasing the sample size or by providing a control set where none existed (see this review paper for more details). Keep in mind, however, that if adding healthy samples as controls makes your case/control ratio imbalanced, a saddlepoint approximation may be needed to calibrate the association analysis p-values. This is enabled in version 0.9.7 of the STAAR and STAARpipeline packages.
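As a rough illustration of step b) in point 1, here is a minimal sketch of one possible rank-based Phred transformation of AF. It is written in Python purely for illustration (STAAR itself is an R package), and the helper names `af_to_phred` and `phred_to_weight` are hypothetical, not part of STAAR or FAVORannotator. It assumes the usual rank-based definition, PHRED = -10 * log10(rank / total), with rank 1 given to the rarest variant so that lower AF gets a higher score, and it assumes a PHRED score maps to a weight as 1 - 10^(-PHRED/10). Here `total` is just the number of variants passed in; in a real analysis the ranking would more likely be taken across the whole genome.

```python
import math


def af_to_phred(afs):
    """Rank-based PHRED score for allele frequencies (illustrative only).

    Assumption: rarer variants should score higher, so rank 1 goes to the
    smallest AF. Ties share the average of the ranks they span.
    """
    total = len(afs)
    # Indices of the variants sorted from rarest to most common.
    order = sorted(range(total), key=lambda i: afs[i])
    ranks = [0.0] * total
    i = 0
    while i < total:
        # Find the end of the run of tied AF values starting at position i.
        j = i
        while j + 1 < total and afs[order[j + 1]] == afs[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    # 10 * log10(total / rank) is the same as -10 * log10(rank / total).
    return [10.0 * math.log10(total / r) for r in ranks]


def phred_to_weight(phred):
    """Map a PHRED score to a weight in [0, 1) (assumed mapping)."""
    return 1.0 - 10.0 ** (-phred / 10.0)


# Example with four hypothetical gnomAD AFs.
example_afs = [0.01, 0.001, 0.1, 0.0001]
example_phred = af_to_phred(example_afs)
example_weights = [phred_to_weight(p) for p in example_phred]
```

In this sketch the rarest variant in `example_afs` receives the largest PHRED score and weight, and the most common one receives a weight of zero. Whether that direction and scaling suit a given study is a modelling choice.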

Hope this helps.

Best, Xihao

bmuchmore commented 5 months ago

Just curious, any update on trio-specific analysis using STAARpipeline? If you need an early tester using outside data, I would be happy to try it out.

-Brian

xihaoli commented 4 months ago

Hi @bmuchmore,

Thank you very much for your interest! We are still developing methods for analyzing family data (trio design) using STAAR/STAARpipeline. We will let you know as soon as they are ready to be tested.

Best, Xihao