weizhouUMICH / SAIGE

GNU Lesser General Public License v3.0

Error at Step2 of gene-based analysis: getDosage_bgen_withquery_Sparse() : BGenError #385

Closed JingxuanBao closed 2 years ago

JingxuanBao commented 2 years ago

Hi,

I was trying to run a gene-based analysis with SAIGE, but Step 2 failed with the error shown below. Could you help me take a look? Thanks!

R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS 10.16

Matrix products: default
BLAS/LAPACK: /opt/anaconda3/envs/saige4465/lib/libopenblasp-r0.3.18.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] SAIGE_0.44.6.4

loaded via a namespace (and not attached):
[1] compiler_3.6.3 Matrix_1.3-3 Rcpp_1.0.6 grid_3.6.3
[5] RcppParallel_5.1.4 lattice_0.20-44

$vcfFile [1] ""

$vcfFileIndex [1] ""

$vcfField [1] "DS"

$bgenFile [1] "/Users/preprocessed_files/SNPs.qced.bit8.bgen"

$bgenFileIndex [1] "/Users/preprocessed_files/SNPs.qced.bgen.bgi"

$savFile [1] ""

$savFileIndex [1] ""

$idstoExcludeFile [1] ""

$idstoIncludeFile [1] ""

$rangestoExcludeFile [1] ""

$rangestoIncludeFile [1] ""

$chrom [1] "0"

$start [1] 1

$end [1] 2.5e+08

$IsDropMissingDosages [1] FALSE

$minMAF [1] 0

$minMAC [1] 0.5

$maxMAFforGroupTest [1] 0.01

$minInfo [1] 0

$sampleFile [1] "/Users/preprocessed_files/SNPs.qced.sample.modified"

$GMMATmodelFile [1] "/Users/7-out/step1SetAnalysis.rda"

$varianceRatioFile [1] "/Users/7-out/step1SetAnalysis.varianceRatio.txt"

$SAIGEOutputFile [1] "/Users/11-out/SNPs.set0.txt"

$numLinesOutput [1] 1

$IsSparse [1] TRUE

$SPAcutoff [1] 2

$IsOutputAFinCaseCtrl [1] FALSE

$IsOutputNinCaseCtrl [1] FALSE

$IsOutputHetHomCountsinCaseCtrl [1] FALSE

$LOCO [1] FALSE

$condition [1] ""

$sparseSigmaFile [1] "/Users/7-out/step1SetAnalysis.varianceRatio.txt_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseSigma.mtx"

$groupFile [1] "/Users/10-out/Set0.set"

$kernel [1] "linear.weighted"

$method [1] "optimal.adj"

$weights.beta.rare [1] "1,25"

$weights.beta.common [1] "1,25"

$weightMAFcutoff [1] 0.01

$r.corr [1] "0"

$IsSingleVarinGroupTest [1] TRUE

$IsOutputMAFinCaseCtrlinGroupTest [1] FALSE

$cateVarRatioMinMACVecExclude [1] "0.5,159.5,313.5,467.5,621.5,776.5,930.5,1084.5,1238.5"

$cateVarRatioMaxMACVecInclude [1] "159.5,313.5,467.5,621.5,776.5,930.5,1084.5,1238.5"

$dosageZerodCutoff [1] 0.2

$IsOutputPvalueNAinGroupTestforBinary [1] FALSE

$IsAccountforCasecontrolImbalanceinGroupTest [1] TRUE

$weightsIncludeinGroupFile [1] FALSE

$IsOutputBETASEinBurdenTest [1] FALSE

$IsOutputlogPforSingle [1] FALSE

$sampleFile_male [1] ""

$X_PARregion [1] ""

$is_rewrite_XnonPAR_forMales [1] FALSE

$method_to_CollapseUltraRare [1] "absence_or_presence"

$MACCutoff_to_CollapseUltraRare [1] 10

$DosageCutoff_for_UltraRarePresence [1] 0.5

$help [1] FALSE

weights.beta.rare is 1 25
weights.beta.common is 1 25
cateVarRatioMinMACVecExclude is 0.5 159.5 313.5 467.5 621.5 776.5 930.5 1084.5 1238.5
cateVarRatioMaxMACVecInclude is 159.5 313.5 467.5 621.5 776.5 930.5 1084.5 1238.5
group-based test will be performed
Any dosages <= 0.2 for genetic variants with MAC <= 10 are set to be 0 in group tests
Garbage collection 14 = 8+2+4 (level 2) ...
78.1 Mbytes of cons cells used (59%)
20.7 Mbytes of vectors used (32%)
1545 samples have been used to fit the glmm null model
[1] "Leave-one-chromosome-out is not applied"
variance Ratio is 1 1 1 1 1 1 1 1 1
1546 sample IDs are found in sample file
isCondition is FALSE
[1] 1546 4
[1] "IID" "IndexInModel" "IndexDose.x" "IndexDose.y"
1545 samples were used in fitting the NULL glmm model and are found in sample file
sparse kinship matrix is going to be used
sparseSigmaFile: /Users/bao96/Research/ADSP-GWAS/7-out/step1SetAnalysis.varianceRatio.txt_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseSigma.mtx
Missing dosages will be mean imputed for the analysis
Analysis started at 1639249201 Seconds
minMAC: 0.5
minMAF: 0
Minimum MAF of markers to be tested is 0.0001618123
It is a binary trait
Analyzing 1071 cases and 474 controls
isCondition is FALSE
Analysis started at 1639249201 Seconds
It is a binary trait
Case-control imbalance is adjusted for binary traits.
Ultra rare variants with MAC <= 10 will be collpased for set-based tests in the 'absence or presence' way. For the resulted collpased marker, any individual having 0.5 <= dosage < 1.5 for any ultra rare variant has 1 in the genotype vector, having dosage >= 1.5 for any ultra rare variant has 2 in the genotype vector, otherwise 0.
isCondition is FALSE
geneID: 40
[1] "40\t1:758213:A:AT\trs201234755"
genetic variants with 0.0001618123 <= MAF <= 0.01 are included for gene-based tests
[1] "ids_to_include"
[1] "1:758213:A:AT" "rs201234755"
TEST 2 TEST 1 OK TEST2 2
ranges_to_include.nrow() 0
ranges_to_exclude.nrow() 0
ids_to_include.size() 2
ids_to_exclude.size() 0
2 markers will be analyzed
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1484380 79.3 2464164 131.7 2464164 131.7
Vcells 2790983 21.3 8388608 64.0 4171875 31.9
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1484365 79.3 2464164 131.7 2464164 131.7
Vcells 2782653 21.3 8388608 64.0 4171875 31.9
Mtest: 2
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 1484375 79.3 2464164 131.7 2464164 131.7
Vcells 2782653 21.3 8388608 64.0 4171875 31.9
Error in getDosage_bgen_withquery_Sparse() : BGenError
Calls: SPAGMMATtest ... getGenoOfGene_bgen_Sparse -> getDosage_bgen_withquery_Sparse
Execution halted

JingxuanBao commented 2 years ago

It turns out the variants in my group file were not listed in the same order as the variants in the dosage file. Matching the two orders resolved the error.
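
For anyone hitting the same error, a quick pre-flight check along these lines may help. This is only a rough sketch, not part of SAIGE: the function names `parse_group_line` and `check_group_order` are hypothetical, the group file is assumed to be tab-separated with the set ID first (as the line `"40\t1:758213:A:AT\trs201234755"` in the log suggests), and `dosage_order` is assumed to be a list of variant IDs already extracted from the dosage file in file order by whatever tool you prefer.

```python
from typing import List, Tuple


def parse_group_line(line: str) -> Tuple[str, List[str]]:
    """Split one group-file line into (set ID, list of variant IDs).

    Assumes tab-separated fields with the set ID in the first column.
    """
    fields = line.rstrip("\n").split("\t")
    return fields[0], fields[1:]


def check_group_order(
    group_variants: List[str], dosage_order: List[str]
) -> Tuple[bool, List[str], List[str]]:
    """Compare the variant order in one group with the dosage-file order.

    Returns:
      in_order  - True if the variants found in the dosage file appear in
                  strictly increasing dosage-file position
      missing   - group variants that are absent from the dosage file
      reordered - the found variants sorted into dosage-file order
    """
    # Map each dosage-file variant ID to its position in the file.
    position = {vid: i for i, vid in enumerate(dosage_order)}
    missing = [v for v in group_variants if v not in position]
    present = [v for v in group_variants if v in position]
    indices = [position[v] for v in present]
    in_order = all(a < b for a, b in zip(indices, indices[1:]))
    reordered = sorted(present, key=lambda v: position[v])
    return in_order, missing, reordered


# Made-up IDs for illustration only; not taken from the data in this issue.
example_dosage_order = ["1:100:A:G", "1:200:C:T", "1:300:G:A"]
example_set_id, example_variants = parse_group_line(
    "GENE1\t1:300:G:A\t1:100:A:G\t1:999:T:C\n"
)
example_result = check_group_order(example_variants, example_dosage_order)
print(example_set_id, example_result)
```

If `in_order` comes back `False`, rewriting the group line with the `reordered` list is one way to line the two files up; anything in `missing` is worth checking for ID-format mismatches before rerunning Step 2.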