saigegit / SAIGE

Development for SAIGE and SAIGE-GENE(+)
GNU General Public License v3.0

Step1 and --useSparseGRMtoFitNULL and --LOCO #10

Closed tammy-dg closed 2 years ago

tammy-dg commented 2 years ago

Hello all,

For Step 1 (and subsequently Step 2, since that depends on the output model), is LOCO only available if the full GRM is used to fit the null model? I see in the source code that when --useSparseGRMtoFitNULL=TRUE, LOCO is set to false: https://github.com/saigegit/SAIGE/blob/accec20e9eb49507e456c0412bdc0bfa65ce8874/R/backup/SAIGE_fitGLMM_fast.R#L714-L719 Also, is --useSparseGRMforVarRatio set to false when --useSparseGRMtoFitNULL=TRUE because the variance is now being calculated from the sparse GRM directly?

Additionally, the --chrom flag doesn't seem to be working for Step 2. I have specified --chrom=1 and passed in a VCF containing all chromosomes and a group file containing genes on all chromosomes, e.g.:

A2ML1 var 12:8822652:A:G 12:8822654:G:A 12:8822661:C:T 12:8822680:T:A 12:8823222:TC:T 12:8823233:GGTTT:G 12:8823260:C:G 12:8823292:C:CCA
A2ML1 anno PTV_f01 PTV_f01 PTV_f01 PTV_f01 PTV_f01 PTV_f01 PTV_f01 PTV_f01
A3GALT2 var 1:33306784:G:GTACCCC 1:33306786:AC:A 1:33306827:GCCCGCGGGCCGATGTC:G 1:33306850:C:T 1:33306926:C:T 1:33306941:CCG:C 1:33306967:C:CCCCGCACAGTGCGCCGTCAG 1:33307009:GCC:G
A3GALT2 anno PTV_f01 PTV_f01 PTV_f01 PTV_f01 PTV_f01 PTV_f01 PTV_f01 PTV_f01

But in the logs I see that SAIGE-GENE+ still appears to be analyzing all genes. For example, though A2ML1 does not contain markers on chr1, it looks to be analyzed:

[1] "Analyzing Region A2ML1 (4/18024)."
Start analyzing chunk 0.....
In chunks 0-0, 62 markers are ultra-rare and 8 markers are not ultra-rare.
Analyzing chunks (0/1, 0/1)........
Analyzing chunks (1/1, 0/1)........
Analyzing chunks (1/1, 1/1)........
[1] "Analyzing Region A3GALT2 (5/18024)."
Start analyzing chunk 0.....
In chunks 0-0, 17 markers are ultra-rare and 2 markers are not ultra-rare.
Analyzing chunks (0/1, 0/1)........
Analyzing chunks (1/1, 0/1)........
Analyzing chunks (1/1, 1/1)........
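Until --chrom behaves as expected, one possible workaround is to pre-filter the group file by chromosome before Step 2. The following is only a minimal sketch, not part of SAIGE: the function name filter_group_lines is hypothetical, and it assumes the whitespace-separated layout shown in the excerpt above (gene name, then "var" or "anno", then one entry per marker, with markers written as CHR:POS:REF:ALT).

```python
def filter_group_lines(lines, chrom):
    """Keep the lines of genes whose 'var' markers all sit on `chrom`.

    Hypothetical helper; assumes each line is
    '<gene> <var|anno|...> <entry> <entry> ...' as in the excerpt above.
    """
    wanted = {str(chrom)}
    keep = set()      # genes whose markers are all on the requested chromosome
    parsed = []       # (gene, original line), in input order
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        gene, kind = fields[0], fields[1]
        parsed.append((gene, line))
        if kind == "var":
            # Chromosome is the part before the first ':' in CHR:POS:REF:ALT
            marker_chroms = {marker.split(":")[0] for marker in fields[2:]}
            if marker_chroms == wanted:
                keep.add(gene)
    return [line for gene, line in parsed if gene in keep]


# Shortened version of the group file excerpt from this post
example = [
    "A2ML1 var 12:8822652:A:G 12:8822654:G:A",
    "A2ML1 anno PTV_f01 PTV_f01",
    "A3GALT2 var 1:33306784:G:GTACCCC 1:33306786:AC:A",
    "A3GALT2 anno PTV_f01 PTV_f01",
]

chr1_lines = filter_group_lines(example, 1)
for line in chr1_lines:
    print(line)
```

With the shortened excerpt, only the two A3GALT2 lines are kept for chromosome 1. The filtered lines could then be written to a per-chromosome group file and passed to Step 2 in place of the full one.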

Thank you for your time and attention and development of the tool!

weizhou0 commented 2 years ago

Hi @tammy-dg,

  1. When the sparse GRM is used to fit the null model, proximal contamination is not as severe as with the full GRM, so LOCO is set to FALSE.

  2. Thanks for reporting this issue! We have now fixed it, and --chrom should work in v1.0.2. More specifically,

if LOCO = FALSE, --chrom is

if LOCO = TRUE, --chrom is always required

tammy-dg commented 2 years ago

Hi @weizhou0, thank you for the quick response. I pulled v1.0.2 from Docker and --chrom looks like it is working as intended now! Thank you.