weizhouUMICH / SAIGE

Single marker analysis for all the variants including MAC<3 in the whole exome sequencing data #386

Closed: iamhere218 closed this issue 2 years ago

iamhere218 commented 2 years ago

Hi Wei,

I am interested in single marker analysis using SAIGE and/or SAIGE-GENE for WES (Whole Exome Sequencing) data.

Could you please tell me what I need to do to obtain complete single-marker results for all variants using SAIGE and/or SAIGE-GENE?

SAIGE recommends using SAIGE-GENE for variants with MAC < 3, as quoted below.

Please note that the saddlepoint approximation used in SAIGE for single-variant association tests does not work for variants with MAC <= 3 (usually gives extremely significant P-values). For those extremely rare variants, please use SAIGE-GENE to test them.

Does this mean that, for a full single-marker analysis, I should combine the SAIGE results for variants with MAC >= 3 with the SAIGE-GENE results for variants with MAC < 3? I ask because I would expect the single-marker results for MAC >= 3 to differ between SAIGE and SAIGE-GENE, since they estimate the variance ratio in different ways (based on the full vs. the sparse GRM).
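To make the question concrete, here is a minimal sketch of what such a combination could look like. It is not part of SAIGE: the field names ("marker", "mac", "p") and the toy values are made up for illustration, and real SAIGE output headers differ by version. The cutoff follows the quoted note (SPA unreliable at MAC <= 3).

```python
# Hypothetical sketch: pick each marker's result from one tool based on MAC.
# Field names and values are illustrative assumptions, not SAIGE output.
MAC_CUTOFF = 3  # the quoted note says SPA does not work for MAC <= 3


def combine_results(saige_rows, gene_rows, mac_cutoff=MAC_CUTOFF):
    """Keep SAIGE rows with MAC > cutoff and SAIGE-GENE rows with MAC <= cutoff."""
    combined = {}
    for row in saige_rows:
        if row["mac"] > mac_cutoff:
            combined[row["marker"]] = {**row, "source": "SAIGE"}
    for row in gene_rows:
        if row["mac"] <= mac_cutoff:
            combined[row["marker"]] = {**row, "source": "SAIGE-GENE"}
    # Sort by marker ID so the output order is deterministic.
    return [combined[m] for m in sorted(combined)]


# Toy inputs (invented numbers, for illustration only).
saige = [
    {"marker": "1:100:A:G", "mac": 2, "p": 1e-30},
    {"marker": "1:200:C:T", "mac": 25, "p": 0.04},
]
gene = [
    {"marker": "1:100:A:G", "mac": 2, "p": 0.31},
    {"marker": "1:200:C:T", "mac": 25, "p": 0.05},
]

merged = combine_results(saige, gene)
for row in merged:
    print(row["marker"], row["mac"], row["p"], row["source"])
```

Tagging each row with its source keeps it visible that the two sets of p-values come from different variance-ratio estimates.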

Also, which options should I use in SAIGE-GENE to run a single-marker analysis? I compared SAIGE-GENE results (using the sparse GRM) with and without the [ --groupFile and --IsSingleVarinGroupTest=TRUE ] options, but most p-values did not match, including for variants with MAC >= 1. If this is expected because of a random seed or random marker selection, could you please advise which options I should use to get correct single-marker results?
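One way to quantify the mismatch between the two runs is to compare p-values on the -log10 scale, marker by marker. The sketch below assumes each run has already been loaded into a dictionary from marker ID to p-value. The marker IDs, the p-values, and the threshold of one order of magnitude are illustrative assumptions, not results from this analysis.

```python
import math


def compare_pvalues(run_a, run_b):
    """Return |difference in -log10(p)| for every marker present in both runs."""
    diffs = {}
    for marker in run_a.keys() & run_b.keys():
        pa, pb = run_a[marker], run_b[marker]
        # -log10 is undefined at p == 0, so such markers are skipped here.
        if pa > 0 and pb > 0:
            diffs[marker] = abs(math.log10(pa) - math.log10(pb))
    return diffs


# Toy p-values (invented, for illustration only).
with_group = {"1:100:A:G": 0.31, "1:200:C:T": 0.05, "1:300:G:A": 1e-4}
without_group = {"1:100:A:G": 0.29, "1:200:C:T": 0.05, "1:300:G:A": 1e-2}

diffs = compare_pvalues(with_group, without_group)
# Flag markers whose p-values differ by more than one order of magnitude.
flagged = sorted(m for m, d in diffs.items() if d > 1.0)
print(flagged)
```

Breaking the flagged markers down by MAC would show whether the disagreement is confined to the rarest variants.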

Thank you!

weizhouUMICH commented 2 years ago

Hi @iamhere218,

For single-variant tests in SAIGE, SPA (saddlepoint approximation) is applied to the score tests to obtain accurate p-values for binary traits with unbalanced case-control ratios. SPA does not work for variants with MAC <= 3 and usually outputs a p-value of 0 or an extremely small p-value. If you would like to test those markers, you need to use SAIGE-GENE. In SAIGE-GENE, instead of SPA, a method called efficient resampling is used to obtain single-variant p-values for variants with MAC <= 10. However, power is very low when testing those ultra-rare variants in single-variant association tests. More recently, we updated SAIGE-GENE to SAIGE-GENE+ (since v0.44.6.1), in which all ultra-rare markers with MAC <= 10 are first collapsed and then tested together with all other variants as a group (--method_to_CollapseUltraRare="absence_or_presence"). It sounds unusual that using --groupFile and --IsSingleVarinGroupTest=TRUE gives very different p-values for markers. One difference is that SAIGE-GENE uses categorical variance ratios, while SAIGE uses a single variance ratio. How different are those p-values?
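As a rough illustration of the collapsing idea, the sketch below computes MAC per variant and merges the ultra-rare ones into a single carrier indicator. This is one plausible reading of "absence_or_presence", not SAIGE-GENE+'s actual code. The function names and toy genotypes are hypothetical, and it assumes genotypes are coded as alternate-allele counts (0/1/2) with the alternate allele being the minor allele.

```python
ULTRA_RARE_MAC = 10  # threshold mentioned above for ultra-rare markers


def minor_allele_count(genotypes):
    """MAC for one variant, given per-sample alt-allele counts (0, 1, or 2)."""
    alt = sum(genotypes)
    return min(alt, 2 * len(genotypes) - alt)


def collapse_ultra_rare(variants, mac_cutoff=ULTRA_RARE_MAC):
    """Split variants into one collapsed pseudo-marker and the remaining variants.

    variants: dict mapping variant name -> list of genotypes (same sample order).
    Returns (pseudo_marker, kept): pseudo_marker[i] is 1 if sample i carries
    any ultra-rare variant and 0 otherwise; kept holds variants above the cutoff.
    """
    n_samples = len(next(iter(variants.values())))
    pseudo_marker = [0] * n_samples
    kept = {}
    for name, geno in variants.items():
        if minor_allele_count(geno) <= mac_cutoff:
            for i, g in enumerate(geno):
                if g > 0:  # assumes the alt allele is the minor allele
                    pseudo_marker[i] = 1
        else:
            kept[name] = geno
    return pseudo_marker, kept


# Toy data with 6 samples; a cutoff of 1 is used so the tiny example has both groups.
toy = {
    "v1": [0, 1, 0, 0, 0, 0],
    "v2": [0, 0, 0, 0, 1, 0],
    "v3": [1, 1, 0, 2, 1, 0],
}
pseudo, kept = collapse_ultra_rare(toy, mac_cutoff=1)
print(pseudo, list(kept))
```

In a set-based test, the pseudo-marker would then be analyzed alongside the variants in `kept` rather than testing each ultra-rare variant on its own.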

weizhouUMICH commented 2 years ago

Hi @iamhere218

We have just released a new version, 1.0.0. It includes computational efficiency improvements in both Step 1 and Step 2 for single-variant and set-based tests. We have created a new GitHub page for the program, https://github.com/saigegit/SAIGE, with documentation at https://saigegit.github.io/SAIGE-doc/

The program will be maintained there by multiple SAIGE developers. Please feel free to try version 1.0.0 and report any issues.

Thanks! Wei