weizhouUMICH / SAIGE

GNU Lesser General Public License v3.0

which subset of variants for fitNULL MODEL #332

Closed: psychrb closed this issue 2 years ago

psychrb commented 3 years ago

I need to test and report only 2 specific variants of interest for SAIGE association in a cohort (8,000+ samples) phenotyped for binary traits. QUESTION: Is there an optimal number of variants to use for fitNULLGLMM? How many variants overall would you recommend, and which ones (e.g. a random QC'ed subset)? I have a genotyped cohort but need to report phenotype association for only these 2 variants, so my question is how many variants to include in the null model. Any input or guidance is appreciated. Thanks. Please let me know if I should clarify the question.

weizhouUMICH commented 3 years ago

Hi @psychrb,

To fit the NULL model in Step 1, the markers included in the Plink file are used to estimate the genetic relationship matrix. For the choice of markers, see Q4 in the FAQ section: https://github.com/weizhouUMICH/SAIGE/wiki/Genetic-association-tests-using-SAIGE#frequently-asked-questions. In Step 2, you can include only the 2 markers in the dosage file, which can be in VCF or BGEN format.
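As a rough illustration of preparing a marker list for the Step 1 Plink file, the sketch below draws a reproducible random subset of variant IDs from the lines of a PLINK `.bim` file and formats them one per line, the layout PLINK's `--extract` option reads. The function names, the subset size, the seed, and the example data are all assumptions made for illustration; none of this is part of SAIGE, and the appropriate number and QC of markers should follow the FAQ linked above.

```python
import random


def read_bim_variant_ids(bim_lines):
    """Return variant IDs (column 2) from the lines of a PLINK .bim file."""
    ids = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            ids.append(fields[1])
    return ids


def sample_variant_ids(variant_ids, n_markers, seed=0):
    """Draw a reproducible random subset of at most n_markers variant IDs."""
    rng = random.Random(seed)
    n = min(n_markers, len(variant_ids))
    return sorted(rng.sample(variant_ids, n))


def format_extract_list(variant_ids):
    """Format IDs one per line, as read by PLINK's --extract option."""
    return "".join(f"{vid}\n" for vid in variant_ids)


# Hypothetical .bim content: chrom, variant ID, cM, bp, allele 1, allele 2
example_bim = [
    "1\trs1\t0\t1000\tA\tG",
    "1\trs2\t0\t2000\tC\tT",
    "2\trs3\t0\t1500\tG\tA",
    "2\trs4\t0\t2500\tT\tC",
]

all_ids = read_bim_variant_ids(example_bim)
subset = sample_variant_ids(all_ids, n_markers=2, seed=42)
extract_text = format_extract_list(subset)
```

In practice the `.bim` lines would come from the QC'ed genotype file, and the written list could be used with PLINK's `--extract` to build the Plink file for Step 1. The 2 variants of interest do not need to be in that list, since they are tested from the dosage file in Step 2.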

Thanks, Wei

weizhouUMICH commented 2 years ago

Sorry for the late reply! We have just released a new version, 1.0.0. It has substantial computational efficiency improvements for both Step 1 and Step 2, for single-variant and set-based tests, and clearer log output. We have created a new GitHub page for the program, https://github.com/saigegit/SAIGE, with documentation provided at https://saigegit.github.io/SAIGE-doc/. The program will be maintained there by multiple SAIGE developers. The docker image has been updated. Please feel free to try version 1.0.0 and report any issues.

Thanks! Wei