weizhouUMICH / SAIGE

GNU Lesser General Public License v3.0

Not equal number of samples in sample and bgen file #397

Closed Alireza-Majd closed 2 years ago

Alireza-Majd commented 2 years ago

Hello, I am wondering whether there is an option to run SAIGE with bgen files that contain more samples than the sample file. I am trying to avoid editing the bgen files every time we run an analysis for a new phenotype with new inclusion/exclusion criteria for participants, since any edit to a bgen file (such as conversion to 8 bit or filtering sample IDs) usually takes a huge amount of time and computation. Is there any way to do that? For example, by providing a text file containing the sample IDs to include or exclude?

Thanks, Alireza

weizhouUMICH commented 2 years ago

Hi Alireza, Sorry for the late reply!

You don't need to edit the bgen files themselves. You just need a sample file (no header) with one column listing the sample IDs in the bgen file. The bgen file contains the largest set of samples for the data set, so it can hold more samples than a given analysis uses.
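As an illustration, here is a minimal Python sketch of one way to produce such a headerless, single-column sample file. It assumes you already have an Oxford-style `.sample` file that accompanies the bgen (a header line, a column-type line, then one row per sample) and that the first column holds the IDs you want; the function names and file names are hypothetical and are not part of SAIGE. The IDs are written in the order they are read, which presumably should match the order of samples in the bgen file.

```python
from pathlib import Path
from typing import Iterable, List


def ids_from_oxford_sample(path: str, id_column: int = 0) -> List[str]:
    """Read sample IDs from an Oxford-style .sample file.

    Assumption: the file has a header line, a column-type line, and then
    one whitespace-separated row per sample.
    """
    lines = Path(path).read_text().splitlines()
    # Skip the two header lines and ignore blank lines.
    rows = [line.split() for line in lines[2:] if line.strip()]
    return [row[id_column] for row in rows]


def write_headerless_sample_file(ids: Iterable[str], path: str) -> int:
    """Write one sample ID per line with no header; return the count written."""
    ids = list(ids)
    Path(path).write_text("".join(f"{sample_id}\n" for sample_id in ids))
    return len(ids)
```

For example, `write_headerless_sample_file(ids_from_oxford_sample("chr1.sample"), "chr1.samples.txt")` would write one ID per line to `chr1.samples.txt` (both file names are placeholders). The resulting file is what you would then point SAIGE at as the sample file; check the SAIGE documentation for the exact option name in your version.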

We have just released a new version, 1.0.0. It has computational efficiency improvements for both Step 1 and Step 2, for single-variant and set-based tests. We have created a new GitHub page for the program, https://github.com/saigegit/SAIGE, with documentation at https://saigegit.github.io/SAIGE-doc/. The program will be maintained there by multiple SAIGE developers. The docker image has been updated. I have previously used the docker image for version 0.99.3 successfully on both Google Cloud and DNAnexus. Please feel free to try version 1.0.0 and report any issues.

Thanks! Wei