saigegit / SAIGE

Development for SAIGE and SAIGE-GENE(+)
GNU General Public License v3.0

Error in setVCFobjInCPP #43

Open psnehal opened 2 years ago

psnehal commented 2 years ago

We are trying to run SAIGE 1.1.3 for the Encore web app, and step 2 fails. Step 1 finishes successfully and produces the .rda and variance ratio files, but step 2 stops with the error below:

Setting position of samples in VCF files....
m_N 49699
Error in setVCFobjInCPP(vcfFile, vcfFileIndex, vcfField, t_SampleInModel = sampleInModel) : 
  At least one subject requested is not in VCF file.
Calls: SPAGMMATtest -> setGenoInput -> setVCFobjInCPP
Execution halted

We are using the savvy (.sav) format. The command is below:

Rscript /sw/pkgs/arc/encore/SAIGE/extdata/step2_SPAtests.R \
        --savFile=/savs/chr1.sav \
        --savFileIndex=/savs/chr1.sav.s1r \
        --chrom=chr1 \
        --sampleFile=/samples.txt \
        --GMMATmodelFile=step1.again.rda \
        --varianceRatioFile=step1.again.varianceRatio.txt \
        --SAIGEOutputFile=folder/step2.again.bin.chr1.noStartOrEnd.txt

saigegit commented 2 years ago

Fixed in v1.1.5

Sandman2127 commented 1 year ago

Unfortunately, I'm getting this same error right now with your docker image: wzhou88/saige:1.2.0

I've confirmed that every one of my samples is in both the phenotype files and the PLINK .fam file.

docker run --cpus=$CPU --memory=$MEM_MB_REQ --volume $(pwd):/mnt/input/ --rm $SAIGE_DOCKER_CONTAINER /usr/bin/Rscript --max-ppsize=500000 /usr/local/bin/step1_fitNULLGLMM.R \
            --plinkFile=/mnt/input/${INPUT_PLINKF} \
            --useSparseGRMtoFitNULL=FALSE    \
            --phenoFile=/mnt/input/$INPUT_PHENOTYPE_FILE \
            --phenoCol=$phenotype_col \
            --sampleIDColinphenoFile=IID \
            --traitType=quantitative        \
            --outputPrefix=$DOCKER_OUT_DIR/QUANT_PHENOTYPE_GMMATMODEL \
            --nThreads=$CPU \
            --IsOverwriteVarianceRatioFile=TRUE

docker run --cpus=$CPU --memory=$MEM_MB_REQ --volume $(pwd):/mnt/input/ --rm $SAIGE_DOCKER_CONTAINER /usr/bin/Rscript --max-ppsize=500000 /usr/local/bin/step2_SPAtests.R  \
        --bedFile=/mnt/input/${INPUT_PLINKF}.bed \
        --bimFile=/mnt/input/${INPUT_PLINKF}.bim \
        --famFile=/mnt/input/${INPUT_PLINKF}.bim \
        --SAIGEOutputFile=$DOCKER_OUT_DIR/give_me_anything_output.txt \
        --minMAF=0 \
        --minMAC=20 \
        --GMMATmodelFile=$GMMATMODEL \
        --varianceRatioFile=$VARIANCERATIO  \
        --is_output_moreDetails=TRUE \
        --LOCO=FALSE

I've also tried --LOCO=TRUE, --chrom=1, and every genotype input format the script accepts:

        # --bgenFile=/mnt/input/${INPUT_PLINKF}.bgen \
        # --bgenFileIndex=/mnt/input/${INPUT_PLINKF}.bgen.bgi \
        #--vcfFile=/mnt/input/${INPUT_PLINKF}.vcf.gz \
        #--vcfFileIndex=/mnt/input/${INPUT_PLINKF}.vcf.gz.csi \
        #--vcfField=GT \
        #--bedFile=/mnt/input/${INPUT_PLINKF}.bed \
        #--bimFile=/mnt/input/${INPUT_PLINKF}.bim \
        #--famFile=/mnt/input/${INPUT_PLINKF}.bim \

I also tried the test data in extdata/input/ with both docker images 1.1.6 and 1.2.0:

PLINK files: nfam_100_nindep_0_step1_includeMoreRareVariants_poly_22chr_random1000 (.bed, .fam, .bim)
Phenotype files: pheno_1000samples.txt and pheno_1000samples.txt_withdosages_withBothTraitTypes.txt

Same error:

Setting position of samples in PLINK files....
Error in setPLINKobjInCPP(bimFile, famFile, bedFile, sampleInModel, AlleleOrder) : 
  At least one subject requested is not in Plink file.
Calls: SPAGMMATtest -> setGenoInput -> setPLINKobjInCPP
Execution halted

Any suggestions?
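One detail in the step 2 command as posted: --famFile is given the .bim path rather than the .fam path (the commented-out variant does the same). If the sample IDs were read from the wrong file, that alone could plausibly produce a "not in Plink file" error, although the thread does not confirm this as the cause. Independently of that, the overlap between the phenotype file and the .fam file can be checked before running step 2. Below is a minimal sketch, assuming a whitespace-delimited phenotype file with a header row containing an IID column (as --sampleIDColinphenoFile=IID suggests) and a standard PLINK .fam file whose second column is the individual ID. The function name and the toy file names are hypothetical.

```shell
# Hypothetical helper: print phenotype sample IDs that do not appear in a PLINK .fam file.
# Assumptions: .fam column 2 is the individual ID; the phenotype file is
# whitespace-delimited with a header row naming the ID column (default: IID).
missing_from_fam() {
    local fam="$1" pheno="$2" idcol="${3:-IID}"
    awk -v idcol="$idcol" '
        NR == FNR { in_fam[$2] = 1; next }        # first file: remember .fam IDs
        FNR == 1 {                                 # phenotype header: locate the ID column
            for (i = 1; i <= NF; i++) if ($i == idcol) col = i
            if (!col) { print "no column named " idcol > "/dev/stderr"; exit 2 }
            next
        }
        !($col in in_fam) { print $col }           # phenotype ID not seen in the .fam file
    ' "$fam" "$pheno"
}

# Toy data: s3 has a phenotype but no genotype record.
tmp=$(mktemp -d)
printf 'F1 s1 0 0 1 -9\nF2 s2 0 0 2 -9\n' > "$tmp/toy.fam"
printf 'IID y\ns1 0.5\ns3 1.2\n' > "$tmp/toy_pheno.txt"
missing_from_fam "$tmp/toy.fam" "$tmp/toy_pheno.txt"   # prints: s3
```

An empty result means every phenotype ID was found in the .fam file; any ID printed is a candidate for the "subject requested is not in" error.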

giorkala commented 1 year ago

Hi Sandman2127, I had a similar problem and bypassed it using subSampleFile; see issue-79. Hope this helps, Yiorgos
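The idea behind this workaround is to restrict the analysis to samples that exist in both the model and the genotype file. A rough sketch of building such a list follows. It assumes the subset file is plain text with one sample ID per line, which the thread does not state, so the expected format should be checked against the SAIGE documentation. The function name and the file names are hypothetical.

```shell
# Hypothetical helper: print IDs that appear in both lists (first column of each file),
# one per line, in the order of the second file and without duplicates.
shared_ids() {
    awk 'NR == FNR { seen[$1] = 1; next }      # first file: remember its IDs
         ($1 in seen) && !done[$1]++           # second file: keep IDs seen before, once
    ' "$1" "$2"
}

# Toy data: only s1 and s3 occur in both lists.
tmp=$(mktemp -d)
printf 's1\ns2\ns3\n' > "$tmp/model_ids.txt"
printf 's3\ns1\ns4\n' > "$tmp/geno_ids.txt"
shared_ids "$tmp/model_ids.txt" "$tmp/geno_ids.txt" > "$tmp/shared_ids.txt"
cat "$tmp/shared_ids.txt"   # prints s3, then s1
```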

Sandman2127 commented 1 year ago

Thanks @giorkala. I got this working with docker image 1.2.1 by running the step-0 sparse GRM estimation to define the population used in steps 1 and 2; after that, those steps appear to ignore any missing samples. I still find the behavior above odd, given that SAIGE can be started at step 1 without the sparse GRM (step 0). I'll take my win and move on, and will try your suggestion if I run into this again!

Best regards, Dean