weizhouUMICH / SAIGE

GNU Lesser General Public License v3.0

Cost estimation for each SAIGE run (each GWAS analysis) based on UKB #211

Closed by Shicheng-Guo 2 years ago

Shicheng-Guo commented 4 years ago

Dear Dr. Zhou,

I am wondering how to estimate the cost of each GWAS on AWS with UKBB data (500K array data or 50K WES data). Do you have any prior estimates, since you have already run UKB-SAIGE-PheWAS?

Thanks.

Shicheng

weizhouUMICH commented 4 years ago

Hi Shicheng,

Table 1 in the SAIGE paper shows the computation cost of a GWAS with single-variant association tests: https://www.nature.com/articles/s41588-018-0184-y/tables/1 On average, a GWAS on the ~400K-sample UKBB imputed data (imputed from the array genotypes) cost ~600 CPU hours on Google Cloud.
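As a rough illustration (not part of SAIGE), the ~600 CPU-hour figure can be turned into a budget by multiplying by a per-CPU-hour price and the number of phenotypes. The function name `estimate_gwas_cost` and the price used below are hypothetical placeholders; look up the current rate for your own instance type on AWS or Google Cloud, and note that this assumes every phenotype takes roughly the same CPU time, which is not guaranteed.

```python
def estimate_gwas_cost(cpu_hours: float, price_per_cpu_hour: float, n_phenotypes: int = 1) -> float:
    """Estimate compute cost for n_phenotypes GWAS runs (same currency as the price).

    Assumes cost scales linearly with CPU hours and that each phenotype needs
    about the same CPU time; price_per_cpu_hour is a user-supplied placeholder.
    """
    if cpu_hours < 0 or price_per_cpu_hour < 0 or n_phenotypes < 0:
        raise ValueError("inputs must be non-negative")
    return cpu_hours * price_per_cpu_hour * n_phenotypes


# ~600 CPU hours per GWAS on ~400K imputed UKBB samples (figure quoted in the reply).
CPU_HOURS_PER_GWAS = 600.0
# Hypothetical price in USD per CPU hour, NOT a real AWS/GCP quote.
HYPOTHETICAL_PRICE = 0.05

single = estimate_gwas_cost(CPU_HOURS_PER_GWAS, HYPOTHETICAL_PRICE)
batch = estimate_gwas_cost(CPU_HOURS_PER_GWAS, HYPOTHETICAL_PRICE, n_phenotypes=10)
print(f"one GWAS: ${single:.2f}, ten GWAS: ${batch:.2f}")
```

Actual bills also include storage, data transfer, and idle instance time, which this linear sketch ignores.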

For exome-wide gene-based tests, Supplementary Table 1 in the SAIGE-GENE paper gives the computation cost for ~400K samples. Step 1 computation time scales as O(N^1.5) and Step 2 computation time as O(N), where N is the sample size. https://static-content.springer.com/esm/art%3A10.1038%2Fs41588-020-0621-6/MediaObjects/41588_2020_621_MOESM1_ESM.pdf
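A minimal sketch of how these scaling rules could be used to extrapolate a measured runtime from one sample size to another (for example, from ~400K samples down to the ~50K WES release). The helper `extrapolate_runtime` and the reference timings are hypothetical placeholders; substitute values from Supplementary Table 1 or from your own pilot run. Only the leading-order term of the complexity is kept, so constants, I/O, and phenotype-dependent convergence are ignored and the result is a rough estimate at best.

```python
def extrapolate_runtime(ref_hours: float, ref_n: int, target_n: int, exponent: float) -> float:
    """Scale a measured runtime to a new sample size, assuming time ~ N**exponent.

    Uses only the leading-order term of the complexity, so treat it as a rough estimate.
    """
    if ref_n <= 0 or target_n <= 0:
        raise ValueError("sample sizes must be positive")
    return ref_hours * (target_n / ref_n) ** exponent


STEP1_EXPONENT = 1.5  # Step 1 (null model fitting): O(N^1.5)
STEP2_EXPONENT = 1.0  # Step 2 (association tests): O(N)

# Hypothetical reference timings at ~400K samples (placeholders, not published numbers).
ref_n = 400_000
ref_step1_hours = 100.0
ref_step2_hours = 200.0

target_n = 50_000  # e.g. the ~50K WES samples mentioned in the question
step1 = extrapolate_runtime(ref_step1_hours, ref_n, target_n, STEP1_EXPONENT)
step2 = extrapolate_runtime(ref_step2_hours, ref_n, target_n, STEP2_EXPONENT)
print(f"Step 1 ~{step1:.1f} h, Step 2 ~{step2:.1f} h")
```

Because Step 1 grows faster than linearly, cutting N by a factor of 8 reduces its cost by roughly 8^1.5 ≈ 22.6, while Step 2 shrinks only by a factor of 8.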

Note that the convergence speed of the algorithm that fits the null mixed model (Step 1) varies by phenotype. Step 2 runs faster with BGEN or SAV input than with VCF input.

Thanks, Wei