xihaoli / STAARpipeline-Tutorial

The tutorial for performing single-/multi-trait association analysis of whole-genome/whole-exome sequencing (WGS/WES) studies using FAVORannotator, STAARpipeline and STAARpipelineSummary
GNU General Public License v3.0

No known loci for conditional analysis? #28

Closed Dani-kolbe closed 1 year ago

Dani-kolbe commented 1 year ago

Hi there! Thank you again for providing this very useful pipeline. I have a question regarding the later steps of the analysis (summarization and visualization). I am trying to analyse a WES dataset of around 5,500 samples (~1,200 cases vs. 4,300 controls). How should I proceed with the analysis if I lack a set of known loci for my binary trait? Is this an issue, or is this part optional? Furthermore, what would the optimal path/pipeline be for this kind of analysis, which focuses solely on coding variants? And do you perhaps have a recommendation for how to proceed with a suitable pathway analysis afterwards?

Sorry if this is a lot, and thank you in advance!

Best wishes, Daniel

xihaoli commented 1 year ago

Hi Daniel,

Thanks for your questions. For analytical follow-ups in STAARpipeline, the known loci are a list of variants known to be associated with your trait of interest: variants reported in previous literature (e.g. the GWAS Catalog), variants identified in the individual (single-variant) analysis of the current study, or both. The goal of adjusting for the known loci in the conditional analysis is to dissect rare variant associations that are independent of known variants. Given this, there is no issue if you don't provide known loci in STAARpipelineSummary; it simply means there might not be any previously known genetic variants associated with the trait of interest. In that case, the conditional association results are the same as the unconditional ones.
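To make the idea of conditioning on known loci concrete, here is a minimal Python sketch. It is not STAARpipeline code and does not use its API: it assumes a simple linear model on a quantitative toy trait (the trait in this thread is binary), and the helper names `residualize` and `assoc_strength`, as well as the simulated data, are hypothetical. It only illustrates that adjusting for a known variant can remove a signal that is not independent of it, and that an empty list of known loci leaves the result unchanged.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def residualize(v, covariates):
    """Remove from v its least-squares projection on an intercept plus covariates."""
    n = len(v)
    basis = []  # mutually orthogonal columns spanning intercept + covariates
    for col in [[1.0] * n] + [list(map(float, c)) for c in covariates]:
        for b in basis:
            coef = dot(col, b) / dot(b, b)
            col = [x - coef * y for x, y in zip(col, b)]
        if dot(col, col) > 1e-12:  # skip columns that add no new direction
            basis.append(col)
    resid = list(map(float, v))
    for b in basis:
        coef = dot(resid, b) / dot(b, b)
        resid = [x - coef * y for x, y in zip(resid, b)]
    return resid

def assoc_strength(y, g, known=()):
    """Squared partial correlation between trait y and genotype g, given known variants."""
    ry, rg = residualize(y, known), residualize(g, known)
    denom = dot(ry, ry) * dot(rg, rg)
    return dot(ry, rg) ** 2 / denom if denom > 1e-12 else 0.0

# Simulated toy data (hypothetical, not from any real study)
rng = random.Random(0)
n = 500
k = [rng.randint(0, 1) for _ in range(n)]            # known variant
g = [x if rng.random() > 0.1 else 1 - x for x in k]  # test variant, correlated with k
y = [2.0 * x + rng.gauss(0.0, 1.0) for x in k]       # trait driven by k only

uncond = assoc_strength(y, g)              # no adjustment
cond = assoc_strength(y, g, known=[k])     # adjusted for the known variant
no_loci = assoc_strength(y, g, known=[])   # empty known-loci list
```

In this toy setup `cond` should be much smaller than `uncond`, because the test variant only tags the known one, while `no_loci` is identical to `uncond` since nothing is adjusted for.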

Because STAARpipeline enables functionally-informed phenotype-genotype association analysis, one of the follow-up analyses is to query the functional annotations for all (rare) variants in the significant variant set and identify variant(s) that have a functional impact in driving the association of the set.
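As a rough illustration of that follow-up, the sketch below filters a significant variant set down to its rare variants and ranks them by one functional annotation score. Everything here is assumed for illustration: the record layout, the field names `maf` and `cadd_phred`, the 1% cutoff, and the `prioritize` helper are hypothetical and do not reflect the actual output format of STAARpipelineSummary or FAVORannotator. Ranking by a single score is also only one possible way to look for candidate driver variants.

```python
def prioritize(variants, maf_cutoff=0.01, score_key="cadd_phred"):
    """Keep rare variants and sort them by an annotation score, highest first."""
    rare = [v for v in variants if v["maf"] < maf_cutoff]
    return sorted(rare, key=lambda v: v[score_key], reverse=True)

# Hypothetical annotated variants from one significant variant set
variant_set = [
    {"id": "var_a", "maf": 0.0004, "cadd_phred": 31.0},
    {"id": "var_b", "maf": 0.0500, "cadd_phred": 35.0},  # common, filtered out
    {"id": "var_c", "maf": 0.0020, "cadd_phred": 8.5},
    {"id": "var_d", "maf": 0.0001, "cadd_phred": 24.2},
]

ranked_ids = [v["id"] for v in prioritize(variant_set)]
```

Variants at the top of `ranked_ids` would be the first candidates to inspect more closely, for example by looking at their other annotations.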

I hope this helps.

Best, Xihao