weizhouUMICH / SAIGE

GNU Lesser General Public License v3.0
188 stars 73 forks source link

Single variant association error in "step 2" #319

Closed bw-dpm closed 2 years ago

bw-dpm commented 3 years ago

Hi we have been trying to use SAIGE for single variant associations on a binary trait. The fit of a null logistic mixed model ("step 1") runs successfully but the single variant association test ("step 2") keeps failing with the following error:

Error in if (markerInfo >= 0 & markerInfo <= 1) { : argument is of length zero Calls: SPAGMMATtest Execution halted

(See attached "step2_output.txt" for the complete output generated in step 2).

The error seems to originate in the SPAGMMATtest function definition @ line 771 of SAIGE_SPATest.R: Gx = getGenoOfnthVar_vcfDosage(mth). The Gx object returned by this call is NULL.

Attached files:

Additional information:

Thank you very much for your help! Best Franco Giulianini

step1_step2_options.txt step2_output.txt vcf_file.txt

weizhouUMICH commented 3 years ago

Hi Franco,

Thanks so much for providing the files! The issues may be due to the mismatch of the chrom string. Can you specify --chr = "chr21" instead of "21". Thanks!

Wei

Indra1714 commented 3 years ago

@weizhouUMICH Hi I am getting the following error. Please help me out in solving the error.

Warning: missing FMT field (GT) Error in if (markerInfo >= 0 & markerInfo <= 1) { : argument is of length zero Calls: SPAGMMATtest Thanks Indra

weizhouUMICH commented 3 years ago

Hi Indra @kavitashah,

Is the error seen at the begining of the analysis? It seems the VCF is not read appropriately. Have you checked the --chr string?

Thanks, Wei

jenny315 commented 2 years ago

@weizhouUMICH Hello, I get the same problem:

...
[1] "Leave-one-chromosome-out is not applied"
Single variance ratio is provided, so categorical variance ratio won't be used!
variance Ratio is  1 
52  sample IDs are found in sample file
isCondition is  FALSE 
Setting genomic region 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50:1:250000000
WARNING: Open VCF failed
[1] 52  4
[1] "IID"          "IndexInModel" "IndexDose.x"  "IndexDose.y" 
52  samples were used in fitting the NULL glmm model and are found in sample file
sparse kinship matrix is not used
Missing dosages will be mean imputed for the analysis
Analysis started at  1634803831 Seconds
minMAC:  1 
minMAF:  1e-04 
Minimum MAF of markers to be tested is  0.009615385 
It is a binary trait
Analyzing  12  cases and  40  controls 
isCondition is  FALSE 
Analysis started at  1634803831 Seconds
Warning: missing FMT field (GT)
Error in if (markerInfo >= 0 & markerInfo <= 1) { : 
  argument is of length zero
Calls: SPAGMMATtest
Execution halted

My command line is as follows:

#Perform single-variant association tests
#for binary traits
Rscript /data/gpfs02/fpu/xiaojz/software/SAIGE/extdata/step2_SPAtests.R \
        --vcfFile=../filter.pass.SNP.addSNPid.changeCHRid.maf005geno005mind01_sort.vcf.gz \
        --vcfFileIndex=../filter.pass.SNP.addSNPid.changeCHRid.maf005geno005mind01_sort.vcf.gz.tbi \
        --vcfField=GT \
        --chrom=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50 \
        --minMAF=0.0001 \
        --minMAC=1 \
        --sampleFile=../saige_sample_idd.txt \
        --GMMATmodelFile=./step1_output/filter.pass.SNP.addSNPid.changeCHRid.maf005geno005mind01_sort.bed_binary.rda \
        --varianceRatioFile=./step1_output/filter.pass.SNP.addSNPid.changeCHRid.maf005geno005mind01_sort.bed_binary.varianceRatio.txt \
        --SAIGEOutputFile=./step2_output/example_binary.SAIGE.vcf.genotype.txt \
        --numLinesOutput=2 \
        --LOCO=FALSE \
        --IsOutputAFinCaseCtrl=TRUE \
        --IsOutputNinCaseCtrl=TRUE \
        --IsOutputHetHomCountsinCaseCtrl=TRUE

and part of input file like this

##fileformat=VCFv4.0
##Tassel=<ID=GenotypeTable,Version=5,Description="Reference allele is not kn
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the re
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
##FORMAT=<ID=PL,Number=.,Type=Float,Description="Normalized, Phred-scaled li
##Tassel=<ID=GenotypeTable,Version=5,Description="Reference allele is not kn
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the re
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered reads used for calling)">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
##FORMAT=<ID=PL,Number=.,Type=Float,Description="Normalized, Phred-scaled likelihoods for AA,AB,BB genotypes where A=ref 
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  101 
1       62      SNP1000000002   T       A       .       PASS    AC=37;AN=104;PR;DP=0    GT:AD:DP:GQ:PL  0/0:0,0:0:33:0,0,
1       71      SNP1000000003   T       C       .       PASS    AC=31;AN=102;PR;DP=0    GT:AD:DP:GQ:PL  0/0:0,0:0:33:0,0,

example file in this software ./input/genotype_10markers.vcf.gz like this:

##fileformat=VCFv4.2
##fileDate=20171104
##source=PLINKv1.90
##contig=<ID=1,length=100001>
##INFO=<ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##fileDate=20171104
##source=PLINKv1.90
##contig=<ID=1,length=100001>
##INFO=<ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  1a1   
1       4       rs4     A       C       .       .       PR      GT      0/0     
1       9       rs9     A       C       .       .       PR      GT      0/0

Thanks Jenny

michelemarongiu79 commented 2 years ago

Hi Jenny, I have the same error. Were you able to solve?

weizhouUMICH commented 2 years ago

Sorry fo the late reply! We have just released a new version 1.0.0. It has substantial computational efficiency improvements for both Step 1 and Step 2 for single-variant and set-based tests and clearer log output. We have created a new program github page https://github.com/saigegit/SAIGE with the documentation provided https://saigegit.github.io/SAIGE-doc/ The program will be maintained by multiple SAIGE developers there. The docker image has been updated. Please feel free to try the version 1.0.0 and report issues if any.

Thanks! Wei