weizhouUMICH / SAIGE

GNU Lesser General Public License v3.0

problem/error with running conditional analysis #403

Closed ruthchia closed 2 years ago

ruthchia commented 2 years ago

Hi Wei, I am trying to run a conditional analysis on selected variants, but am running into the following error:

Error in SPAGMMATtest(vcfFile = opt$vcfFile, vcfFileIndex = opt$vcfFileIndex,  :
  Conditioning markers are not found in the provided dosage file
Execution halted

I checked the input VCF file and confirmed that the SNP is present, and I was able to run the analysis successfully on the same variant without --condition. The SAIGE version used is 0.44.6.1.

Can you please help me see what the problem is?

thanks! Ruth

The output/error of the run is below:

step2_SPAtests.R \
    --vcfFile=chr21.Rsq03.dose.vcf.gz \
    --vcfFileIndex=chr21.Rsq03.dose.vcf.gz.csi \
    --vcfField=DS \
    --chrom=21 \
    --minMAF=0.000001 \
    --minMAC=3 \
    --sampleFile=SampleList.hg38.keepRelated.forImputed.IID.txt \
    --GMMATmodelFile=Case-allControls.noDups.keepRelated.withoutdbGAP_ADcontrols.hwe1e-6.rda \
    --varianceRatioFile=Case-allControls.noDups.keepRelated.withoutdbGAP_ADcontrols.hwe1e-6.varianceRatio.txt \
    --SAIGEOutputFile=Case-allControls.noDups.keepRelated_cond-variants.SAIGE.Rsq03.dose.vcf.txt \
    --numLinesOutput=2 \
    --IsOutputAFinCaseCtrl=TRUE \
    --IsOutputNinCaseCtrl=TRUE \
    --IsOutputHetHomCountsinCaseCtrl=TRUE \
    --LOCO=FALSE \
    --condition=chr21:31659715:C:T

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] SAIGE_0.44.6.1

loaded via a namespace (and not attached):
[1] compiler_3.5.1     Matrix_1.2-14      Rcpp_1.0.7         grid_3.5.1
[5] RcppParallel_5.0.3 lattice_0.20-35
$vcfFile
[1] "chr21.Rsq03.dose.vcf.gz"

$vcfFileIndex
[1] "chr21.Rsq03.dose.vcf.gz.csi"

$vcfField
[1] "DS"

$bgenFile
[1] ""

$bgenFileIndex
[1] ""

$savFile
[1] ""

$savFileIndex
[1] ""

$idstoExcludeFile
[1] ""

$idstoIncludeFile
[1] ""

$rangestoExcludeFile
[1] ""

$rangestoIncludeFile
[1] ""

$chrom
[1] "21"

$start
[1] 1

$end
[1] 2.5e+08

$IsDropMissingDosages
[1] FALSE

$minMAF
[1] 1e-06

$minMAC
[1] 3

$maxMAFforGroupTest
[1] 0.5

$minInfo
[1] 0

$sampleFile
[1] "SampleList.hg38.keepRelated.forImputed.IID.txt"

$GMMATmodelFile
[1] "Case-allControls.noDups.keepRelated.withoutdbGAP_ADcontrols.hwe1e-6.rda"

$varianceRatioFile
[1] "Case-allControls.noDups.keepRelated.withoutdbGAP_ADcontrols.hwe1e-6.varianceRatio.txt"

$SAIGEOutputFile
[1] "Case-allControls.noDups.keepRelated_cond-variants.SAIGE.Rsq03.dose.vcf.txt"

$numLinesOutput
[1] 2

$IsSparse
[1] TRUE

$SPAcutoff
[1] 2

$IsOutputAFinCaseCtrl
[1] TRUE

$IsOutputNinCaseCtrl
[1] TRUE

$IsOutputHetHomCountsinCaseCtrl
[1] TRUE

$LOCO
[1] FALSE

$condition
[1] "chr21:31659715:C:T"

$sparseSigmaFile
[1] ""

$groupFile
[1] ""

$kernel
[1] "linear.weighted"

$method
[1] "optimal.adj"

$weights.beta.rare
[1] "1,25"

$weights.beta.common
[1] "1,25"

$weightMAFcutoff
[1] 0.01

$r.corr
[1] "0"

$IsSingleVarinGroupTest
[1] FALSE

$IsOutputMAFinCaseCtrlinGroupTest
[1] FALSE

$cateVarRatioMinMACVecExclude
[1] "0.5,1.5,2.5,3.5,4.5,5.5,10.5,20.5"

$cateVarRatioMaxMACVecInclude
[1] "1.5,2.5,3.5,4.5,5.5,10.5,20.5"

$dosageZerodCutoff
[1] 0.2

$IsOutputPvalueNAinGroupTestforBinary
[1] FALSE

$IsAccountforCasecontrolImbalanceinGroupTest
[1] TRUE

$weightsIncludeinGroupFile
[1] FALSE

$IsOutputBETASEinBurdenTest
[1] FALSE

$sampleFile_male
[1] ""

$X_PARregion
[1] ""

$is_rewrite_XnonPAR_forMales
[1] FALSE

$method_to_CollapseUltraRare
[1] ""

$MACCutoff_to_CollapseUltraRare
[1] 10

$DosageCutoff_for_UltraRarePresence
[1] 0.5

$help
[1] FALSE

weights.beta.rare  is  1 25
weights.beta.common  is  1 25
cateVarRatioMinMACVecExclude  is  0.5 1.5 2.5 3.5 4.5 5.5 10.5 20.5
cateVarRatioMaxMACVecInclude  is  1.5 2.5 3.5 4.5 5.5 10.5 20.5
single-variant association test will be performed
Garbage collection 14 = 8+2+4 (level 2) ...
83.8 Mbytes of cons cells used (63%)
38.2 Mbytes of vectors used (49%)
48078  samples have been used to fit the glmm null model
[1] "Leave-one-chromosome-out is not applied"
Single variance ratio is provided, so categorical variance ratio won't be used!
variance Ratio is  0.9075571
59766  sample IDs are found in sample file
isCondition is  TRUE
Open VCF done
To read the field DS
Number of meta lines in the vcf file (lines starting with ##): 22
Number of samples in the vcf file: 59766
[1] 59766     4
[1] "IID"          "IndexInModel" "IndexDose.x"  "IndexDose.y"
48078  samples were used in fitting the NULL glmm model and are found in sample file
sparse kinship matrix is not used
Missing dosages will be mean imputed for the analysis
Analysis started at  1645897735 Seconds
minMAC:  3
minMAF:  1e-06
Minimum MAF of markers to be tested is  3.11993e-05
conditionlist is  condMarkers   chr21:31659715:C:T
std::size_t sample_size = marker_file.samples().size();59766
conditioning on
isCondition is  TRUE
Error in SPAGMMATtest(vcfFile = opt$vcfFile, vcfFileIndex = opt$vcfFileIndex,  :
  Conditioning markers are not found in the provided dosage file
Execution halted

And here are the first 10 lines of the VCF dosage file:

zgrep -v "^##" chr21.Rsq03.dose.vcf.gz | cut -f1-10 | head | column -t
#CHROM  POS       ID                      REF  ALT    QUAL  FILTER  INFO                                       FORMAT     276-11-3_276-11-3
21      10015681  chr21:10015681:T:A      T    A      .     PASS    AF=2e-05;MAF=2e-05;IMPUTED;R2=0.375347     GT:DS:HDS  0|0:0:0,0
21      10034169  chr21:10034169:T:C      T    C      .     PASS    AF=2e-05;MAF=2e-05;IMPUTED;R2=0.349663     GT:DS:HDS  0|0:0:0,0
21      10055788  chr21:10055788:G:A      G    A      .     PASS    AF=4e-05;MAF=4e-05;IMPUTED;R2=0.480547     GT:DS:HDS  0|0:0:0,0
21      10069901  chr21:10069901:TG:T     TG   T      .     PASS    AF=4e-05;MAF=4e-05;IMPUTED;R2=0.4313       GT:DS:HDS  0|0:0:0,0
21      10078529  chr21:10078529:G:C      G    C      .     PASS    AF=0.0001;MAF=0.0001;IMPUTED;R2=0.443603   GT:DS:HDS  0|0:0:0,0
21      10082804  chr21:10082804:T:C      T    C      .     PASS    AF=2e-05;MAF=2e-05;IMPUTED;R2=0.522543     GT:DS:HDS  0|0:0:0,0
21      10084123  chr21:10084123:G:A      G    A      .     PASS    AF=0.00012;MAF=0.00012;IMPUTED;R2=0.34971  GT:DS:HDS  0|0:0:0,0
21      10098847  chr21:10098847:C:CTGGA  C    CTGGA  .     PASS    AF=5e-05;MAF=5e-05;IMPUTED;R2=0.39214      GT:DS:HDS  0|0:0:0,0
21      10122648  chr21:10122648:C:G      C    G      .     PASS    AF=4e-05;MAF=4e-05;IMPUTED;R2=0.39016      GT:DS:HDS  0|0:0:0,0

ruthchia commented 2 years ago

I figured it out!

FYI for others who may be interested.

For the --condition parameter to work in my case, the conditioning marker should have been written as: --condition 21:31659715_C/T.

I did try this format earlier, but still got the error because I had included the prefix 'chr' in front of the chromosome number.
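For anyone converting several marker IDs, the rewrite described above can be sketched as a small shell helper. This is only an illustration: the function name `to_saige_condition_id` is hypothetical (it is not part of SAIGE), and it assumes the input is in CHR:POS:REF:ALT form, with or without a leading 'chr', and that the target is the 21:31659715_C/T style that worked for me in 0.44.6.1.

```shell
# Hypothetical helper, not part of SAIGE: rewrite CHR:POS:REF:ALT
# (optionally prefixed with "chr") as CHR:POS_REF/ALT.
to_saige_condition_id() {
    # First expression drops a leading "chr"; the second rewrites the
    # last two ":" separators as "_" and "/".
    printf '%s\n' "$1" |
        sed -E 's/^chr//; s/^([^:]+):([0-9]+):([^:]+):([^:]+)$/\1:\2_\3\/\4/'
}

to_saige_condition_id "chr21:31659715:C:T"   # prints 21:31659715_C/T
```

The result can then be passed to --condition. Input that does not match the four-field pattern is printed back unchanged (apart from the 'chr' prefix being dropped), so it is worth checking the output before using it.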

weizhouUMICH commented 2 years ago

Hi @ruthchia,

Sorry for the late reply! We have just released a new version, 1.0.0. It has computational efficiency improvements for both Step 1 and Step 2, for single-variant and set-based tests. We have created a new GitHub page for the program (https://github.com/saigegit/SAIGE), with documentation provided (https://saigegit.github.io/SAIGE-doc/). Note that we have changed the format for conditioning markers to CHR:POS:REF:ALT.

Please feel free to try version 1.0.0 and report any issues there.

Thanks! Wei