saigegit / SAIGE

Development for SAIGE and SAIGE-GENE(+)
GNU General Public License v3.0

SAIGE step 1 using the whole genome separate chromosome specific to #114

Open zrayw opened 1 year ago

zrayw commented 1 year ago

Dear SAIGE developer,

I am currently running a binary-trait GWAS on UKBB data using SAIGE, and I have a couple of questions about computing the full GRM in step 1.

Full-genome data for the GRM: For step 1 in SAIGE, is it necessary to compute the full GRM from the merged imputed data for chr1-chr22? Given the size of the UKBB data, I found merging all chromosomes computationally demanding and slow. Do you have recommendations or best practices for using SAIGE with such large biobank datasets?

Chromosome-specific variance ratio: Instead of using the full-genome data, would it be valid to compute the GRM from single-chromosome data in step 1 and then apply the resulting variance ratio in step 2 for the SPA tests?

Thank you very much for your assistance!

Lloyd-LiuSiyi commented 10 months ago

Hi @zrayw, have you gained any new experience in dealing with UKBB data? I'm also trying to perform set-based tests with WES data. As far as I know, a sparse GRM would be useful in all situations, but I'm curious whether in step 2 I should estimate the variance ratio in smaller parts, such as LD blocks, since that might be computationally faster.

zrayw commented 10 months ago

Hi, I LD-pruned the markers down to an independent set before step 1, and it's a lot faster. I noticed this approach in the original SAIGE paper. Hope it helps.
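
For anyone who wants to try the same thing, here is a rough sketch of how the pruning could be scripted. It only builds the command lines and does not run them. It assumes PLINK 1.9 is the pruning tool and that the genotypes are stored as per-chromosome PLINK binary filesets named like `ukb_chr1`, `ukb_chr2`, and so on. The prefixes, the helper function names, and the `50 5 0.2` window/step/r² settings are all illustrative assumptions, not values taken from the SAIGE documentation. Pruning each chromosome first and merging only the pruned filesets keeps the merge small.

```python
import shlex
from typing import List


def prune_commands(chrom: int, in_prefix: str, out_prefix: str,
                   window: int = 50, step: int = 5, r2: float = 0.2) -> List[List[str]]:
    """Build two PLINK 1.9 commands for one chromosome (hypothetical file names).

    The first command writes <out>.prune.in, the list of markers kept by
    LD pruning; the second extracts those markers into a new binary fileset.
    """
    source = f"{in_prefix}_chr{chrom}"
    pruned = f"{out_prefix}_chr{chrom}"
    find = ["plink", "--bfile", source,
            "--indep-pairwise", str(window), str(step), str(r2),
            "--out", pruned]
    extract = ["plink", "--bfile", source,
               "--extract", f"{pruned}.prune.in",
               "--make-bed", "--out", pruned]
    return [find, extract]


def merge_list_text(chroms: List[int], out_prefix: str) -> str:
    """Contents of a --merge-list file: one fileset prefix per line.

    The first chromosome is passed via --bfile, so it is left out here.
    """
    return "".join(f"{out_prefix}_chr{c}\n" for c in chroms[1:])


def merge_command(chroms: List[int], out_prefix: str,
                  list_path: str, merged_prefix: str) -> List[str]:
    """Build the PLINK 1.9 command that merges the pruned filesets."""
    return ["plink", "--bfile", f"{out_prefix}_chr{chroms[0]}",
            "--merge-list", list_path,
            "--make-bed", "--out", merged_prefix]


if __name__ == "__main__":
    chroms = list(range(1, 23))
    # Print the commands for inspection instead of executing them.
    for c in chroms:
        for cmd in prune_commands(c, "ukb", "pruned"):
            print(shlex.join(cmd))
    print(shlex.join(merge_command(chroms, "pruned", "merge_list.txt", "pruned_all")))
```

The merged, pruned fileset (here `pruned_all`) would then be the genotype input for step 1. Please check the thresholds against the SAIGE paper and documentation before relying on them.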

Lloyd-LiuSiyi commented 10 months ago

That's worth noticing. Thanks for the suggestion @zrayw!