weizhouUMICH / SAIGE

GNU Lesser General Public License v3.0
188 stars 73 forks source link

SAIGE-GENE STEP2 ERROR #202

Closed IoannaTach closed 2 years ago

IoannaTach commented 4 years ago

Hello,

I am running SAIGE-GENE version 0.39. Step0 and step1 run fine. I am running step 2 using:

Rscript /projects/cgr/users/kvrf354/software/SAIGE-master/extdata/step2_SPAtests.R \ --bgenFile=${model}_v2_2020_06_02/output/${model}_chrX_fixed.bgen \ --bgenFileIndex=${model}_v2_2020_06_02/output/${model}_chrX_fixed.bgen.bgi \ --minMAF=0 \ --minMAC=0.5 \ --maxMAFforGroupTest=0.5 \ --sampleFile=${model}_v2_2020_06_02/output/${model}_chrX_fixed_v2.sample \ --GMMATmodelFile=CKD.rda \ --varianceRatioFile=CKD_cate.varianceRatio.txt \ --SAIGEOutputFile=SAIGE.GENE.${model}.txt \ --numLinesOutput=1 \ --IsOutputAFinCaseCtrl=TRUE \ --IsOutputNinCaseCtrl=TRUE \ --IsOutputHetHomCountsinCaseCtrl=TRUE \ --IsOutputBETASEinBurdenTest=TRUE \ --groupFile=2020_0520${model}_groupfile.txt \ --sparseSigmaFile=cate.varianceRatio.txt_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseSigma.mtx \ --IsSingleVarinGroupTest=TRUE

The code runs fine for the first 3,494 genes and gives the following error for gene ARL13B:

Error in SKATExactBin.SKATO_GetQParam(Z, res, idx, 0, pi1, pr$prob_k, : Error in ResampleSTAT! Calls: SPAGMMATtest ... SKATExactBin.Moment -> SKATExactBin.SKATO_GetQParam

Output from the log file for this gene is:

geneID: ARL13B [1] "ARL13B\t3-93980441-C-T\t3-93980453-C-T\t3-93995883-T-G\t3-93995934-A-G\t3-94003855-G-A\t3-94003864-A-G\t3-94035391-A-G\t3-94035412-T-C\t3-94036572-G-A\t3-94036572-G-C\t3-94036575-G-A\t3-94036665-C-T\t3-94036737-A-C\t3-94039886-A-G\t3-94039943-A-G\t3-94043023-A-C\t3-94043053-A-G\t3-94043197-G-A\t3-94043215-G-A\t3-94043227-A-G\t3-94049425-T-C\t3-94049443-A-G\t3-94049500-G-A\t3-94050867-G-A\t3-94053233-T-C\t3-94053260-A-G" genetic variants with 4.796255e-06 <= MAF <= 0.5 are included for gene-based tests [1] "ids_to_include" [1] "3-93980441-C-T" "3-93980453-C-T" "3-93995883-T-G" "3-93995934-A-G" [5] "3-94003855-G-A" "3-94003864-A-G" "3-94035391-A-G" "3-94035412-T-C" [9] "3-94036572-G-A" "3-94036572-G-C" "3-94036575-G-A" "3-94036665-C-T" [13] "3-94036737-A-C" "3-94039886-A-G" "3-94039943-A-G" "3-94043023-A-C" [17] "3-94043053-A-G" "3-94043197-G-A" "3-94043215-G-A" "3-94043227-A-G" [21] "3-94049425-T-C" "3-94049443-A-G" "3-94049500-G-A" "3-94050867-G-A" [25] "3-94053233-T-C" "3-94053260-A-G" TEST 26 TEST 1 OK TEST2 26 ranges_to_include.nrow() 0 ranges_to_exclude.nrow() 0 ids_to_include.size() 26 ids_to_exclude.size() 0 26 markers will be analyzed Mtest: 26 AC is 104215 AC_new is 104247 AC is 104241 AC_new is 104245 AC is 104239 AC_new is 104245 AC is 104183 AC_new is 104247 AC is 104214 AC_new is 104246 AC is 104245 AC_new is 104247 AC is 104233 AC_new is 104247 AC is 103977 AC_new is 104247 AC is 104216 AC_new is 104246 AC is 104103 AC_new is 104247 AC is 104231 AC_new is 104245 closed the genofile! cntMarker: 26 indexforMissing: z is 1 dim(Gmat) is 52124 26 z is 2 dim(Gmat) is 52124 26 z is 3 dim(Gmat) is 52124 26 z is 4 dim(Gmat) is 52124 26 z is 5 dim(Gmat) is 52124 26 z is 6 dim(Gmat) is 52124 26 z is 7 dim(Gmat) is 52124 26 z is 8 dim(Gmat) is 52124 26 z is 9 dim(Gmat) is 52124 26 z is 10 dim(Gmat) is 52124 26 z is 11 dim(Gmat) is 52124 26 z is 12 dim(Gmat) is 52124 26 z is 13 dim(Gmat) is 52124 26 z is 14 dim(Gmat) is 52124 26 z is 15 dim(Gmat) is 52124 26 z is 16 dim(Gmat) is 52124 26 z is 17 dim(Gmat) is 52124 26 z is 18 dim(Gmat) is 52124 26 z is 19 dim(Gmat) is 52124 26 z is 20 dim(Gmat) is 52124 26 z is 21 dim(Gmat) is 52124 26 z is 22 dim(Gmat) is 52124 26 z is 23 dim(Gmat) is 52124 26 z is 24 dim(Gmat) is 52124 26 z is 25 dim(Gmat) is 52124 26 z is 26 dim(Gmat) is 52124 26 isCondition is FALSE Note the 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 th variants were flipped to use dosages for the minor alleles in gene-based tests weights: 24.96549 24.95974 24.99424 24.98274 24.98274 24.98274 24.98274 24.99424 24.99425 24.98849 24.98849 24.99425 24.98849 24.99425 24.99425 24.98849 24.99425 24.98849 24.99425 24.99424 24.99423 24.98849 24.99424 24.99425 24.99425 24.98274

For this gene, the cMAC in cases is 6.00001841 and in controls is 47.0000387136, so this should not be causing problems. One thing I just realised is that for step 1 I was using the default parameters for cateVarRatioMinMACVecExclude and cateVarRatioMaxMACVecInclude. I have now changed them slightly and rerunning step 1 as:

Rscript /projects/cgr/users/kvrf354/software/SAIGE-master/extdata/step1_fitNULLGLMM.R \ --plinkFile=2020_06_02_relatedness/output/relatedness \ --phenoFile=CVRM002_EUR_workpackage_covariates_round4_v2.txt \ --phenoCol=phenotype \ --covarColList=Covar1,Covar2,Covar3 \ --sampleIDColinphenoFile=FamilyID \ --traitType=binary \ --outputPrefix=CKD \ --outputPrefix_varRatio=CKD_cate \ --sparseGRMFile=sparseGRM_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx \ --sparseGRMSampleIDFile=sparseGRM_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx.sampleIDs.txt \ --cateVarRatioMinMACVecExclude=0.5,1.5,2.5,3.5,4.5,5.5,10.5,20.5 --cateVarRatioMaxMACVecInclude=0.5,1.5,2.5,3.5,4.5,5.5,10.5,20.5 --nThreads=8 \ --LOCO=FALSE \ --skipModelFitting=FALSE \ --IsSparseKin=TRUE \ --isCateVarianceRatio=TRUE

Wold this fix the issue in step 2 please? R=The error is caused by SKAT, is there a way to tell the software to only run the burden test? Thank you.

Best, Ioanna

IoannaTach commented 4 years ago

Actually, step1 as run above now gives me the error:

Categorical variance ratios will be estimated 0.5 < MAC <= 0.5 Error in scoreTest_SPAGMMAT_forVarianceRatio_binaryTrait(obj.glmm.null = modglmm, : ERROR! number of genetic variants in 0.5< MAC <= 0.5 is lower than 30 Please include more markers in this MAC category in the plink file Calls: fitNULLGLMM -> scoreTest_SPAGMMAT_forVarianceRatio_binaryTrait In addition: Warning messages: 1: glm.fit: fitted probabilities numerically 0 or 1 occurred 2: glm.fit: fitted probabilities numerically 0 or 1 occurred Execution halted

So I am rerunning it ith the default values for cateVarRatioMinMACVecExclude and cateVarRatioMaxMACVecInclude. So that was not the problem initially. Could you please advise?

weizhouUMICH commented 4 years ago

Hi @IoannaTach,

Sorry for my late reply and thank you for trying to investigate the problem! You are right, cateVarRatioMinMACVecExclude and cateVarRatioMaxMACVecInclude are not the problem . The error is from the SKAT function SKATExactBin.SKATO_GetQParam Unfortunately, current the option to run the Burden test only is not fully developed. I've noticed there is a warning message indicating there is "perfect separation" in the null model. Could you please try to remove the covariate(s) that may cause this problem? You may use the argument --minCovariateCount to do this.

Thanks, Wei

IoannaTach commented 4 years ago

Dear Wei,

Apologies for the late reply. It turns out one of the covariates (although continous) was creating the issue and once I removed it, SAIGE-GENE run. However, I have 2 issues looking at the results:

1) I have huge deflation of summary statistics - inflation factor lambda is 0.061. Have you seen such deflation before? It could be because the MAF threshold for variant inclusion is very low (0.1%). Also I calculate lambda making asymptotic assumptions of the test statistics under the null which maybe invalid for very low cumulative MAF within a gene, but perfutations are too computationally intensive. What are your thoughts please?

2) None of the genes on chrX run with SAIGE-GENE, as I get the message "ERROR: has minimum ploidy = 1 (not 2)". Do you have any advice on how to code chrX for the software to work?

Many thanks! Ioanna

weizhouUMICH commented 2 years ago

Dear Ioanna,

Sorry for my late reply! We have just released a new version 1.0.0. It has computational efficiency improvements for both Step 1 and Step 2 for single-variant and set-based tests. We have created a new program github page https://github.com/saigegit/SAIGE with the documentation provided https://saigegit.github.io/SAIGE-doc/ The program will be maintained by multiple SAIGE developers there. The docker image has been updated. Please feel free to try the version 1.0.0 and report issues if any.

Thanks! Wei