saigegit / SAIGE

Development for SAIGE and SAIGE-GENE(+)
GNU General Public License v3.0

Error: vector::_M_range_check: __n (which is 2538) >= this->size() (which is 2538) #139

Open sboer opened 4 months ago

sboer commented 4 months ago

I have been running four analyses for one phenotype; all samples, females only, males only and females vs. male cases, using SAIGE version 1.3.1.

For one analysis (females only) and two chromosomes (chr 12 and 14), I ran into this error in step 2:

Error: vector::_M_range_check: __n (which is 2538) >= this->size() (which is 2538)
Execution halted

All other analyses seem to have run correctly, so I find it hard to figure out what is wrong with these two chromosomes for this particular phenotype only.

Full logfile chr 12:

Loading required package: RhpcBLASctl
R version 4.3.2 (2023-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS/LAPACK: /mnt/work/miniconda3/envs/saige_1_3_1/lib/libopenblasp-r0.3.26.so; LAPACK version 3.12.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/Oslo
tzcode source: system (glibc)

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] data.table_1.14.10 optparse_1.7.4 RhpcBLASctl_0.23-42
[4] SAIGE_1.3.1

loaded via a namespace (and not attached):
[1] compiler_4.3.2 Matrix_1.6-5 Rcpp_1.0.12 getopt_1.20.4
[5] grid_4.3.2 RcppParallel_5.1.6 lattice_0.22-5
$vcfFile [1] "/mnt/archive/projects/HUNT_Allin/Genotypes/vcf_imputed_filtered/CHR12.HRC_WGS.vcf.gz"

$vcfFileIndex [1] "/mnt/archive/projects/HUNT_Allin/Genotypes/vcf_imputed_filtered/CHR12.HRC_WGS.vcf.gz.csi"

$vcfField [1] "DS"

$savFile [1] ""

$savFileIndex [1] ""

$bgenFile [1] ""

$bgenFileIndex [1] ""

$sampleFile [1] "/mnt/archive/projects/HUNT_Allin/Genotypes/vcf_imputed_filtered/samplelist_HUNT_Allin.txt"

$bedFile [1] ""

$bimFile [1] ""

$famFile [1] ""

$AlleleOrder [1] "alt-first"

$idstoIncludeFile [1] ""

$rangestoIncludeFile [1] ""

$chrom [1] "12"

$is_imputed_data [1] TRUE

$minMAF [1] 0

$minMAC [1] 3

$minGroupMAC_in_BurdenTest [1] 5

$minInfo [1] 0

$maxMissing [1] 0.15

$impute_method [1] "best_guess"

$LOCO [1] TRUE

$GMMATmodelFile [1] "./migraine_females/saige_outputfiles_step1/migraine_females.rda"

$varianceRatioFile [1] "./migraine_females/saige_outputfiles_step1/migraine_females.varianceRatio.txt"

$SAIGEOutputFile [1] "./migraine_females/migraine_females_results_chr12.txt"

$markers_per_chunk [1] 10000

$groups_per_chunk [1] 100

$is_output_moreDetails [1] TRUE

$is_overwrite_output [1] TRUE

$maxMAF_in_groupTest [1] "0.0001,0.001,0.01"

$maxMAC_in_groupTest [1] "0"

$annotation_in_groupTest [1] "lof,missense;lof,missense;lof;synonymous"

$groupFile [1] ""

$sparseGRMFile [1] ""

$sparseGRMSampleIDFile [1] ""

$relatednessCutoff [1] 0

$sampleFile_male [1] ""

$X_PARregion [1] ""

$is_rewrite_XnonPAR_forMales [1] FALSE

$MACCutoff_to_CollapseUltraRare [1] 10

$cateVarRatioMinMACVecExclude [1] "10,20.5"

$cateVarRatioMaxMACVecInclude [1] "20.5"

$weights.beta [1] "1,25"

$r.corr [1] 0

$markers_per_chunk_in_groupTest [1] 100

$condition [1] ""

$SPAcutoff [1] 2

$dosage_zerod_cutoff [1] 0.2

$dosage_zerod_MAC_cutoff [1] 10

$is_single_in_groupTest [1] FALSE

$is_no_weight_in_groupTest [1] FALSE

$is_output_markerList_in_groupTest [1] FALSE

$is_Firth_beta [1] TRUE

$pCutoffforFirth [1] 0.05

$is_fastTest [1] FALSE

$max_MAC_for_ER [1] 4

$subSampleFile [1] ""

$help [1] FALSE

[1] "opt$r.corr"
[1] 0
dosage_zerod_cutoff 0.2
Any dosages <= 0.2 for genetic variants with MAC <= 10 are set to be 0.
single-variant association test will be performed
Leave chromosome 12 out will be applied
chromosome 1 model results are removed to save memory
chromosome 2 model results are removed to save memory
chromosome 3 model results are removed to save memory
chromosome 4 model results are removed to save memory
chromosome 5 model results are removed to save memory
chromosome 6 model results are removed to save memory
chromosome 7 model results are removed to save memory
chromosome 8 model results are removed to save memory
chromosome 9 model results are removed to save memory
chromosome 10 model results are removed to save memory
chromosome 11 model results are removed to save memory
chromosome 13 model results are removed to save memory
chromosome 14 model results are removed to save memory
chromosome 15 model results are removed to save memory
chromosome 16 model results are removed to save memory
chromosome 17 model results are removed to save memory
chromosome 18 model results are removed to save memory
chromosome 19 model results are removed to save memory
chromosome 20 model results are removed to save memory
chromosome 21 model results are removed to save memory
chromosome 22 model results are removed to save memory
P-values of genetic variants with MAC <= 4 will be calculated via effecient resampling.
variance Ratio null is 0.9010369
Please note the argument vcfFileIndex will not be used in future versions because the vcf index file must has the name 'vcfFile'.csi
dosageFile type is vcf
Open VCF done
To read the field DS
Number of meta lines in the vcf file (lines starting with ##): 11
Number of samples in the vcf file: 69716
Setting position of samples in VCF files....
m_N 21228
(2024-04-08 10:57:41.053394) ---- Analyzing Chunk 1 : chrom InitialChunk ----
Error: vector::_M_range_check: __n (which is 2538) >= this->size() (which is 2538)
Execution halted

Analysis script:
model="migraine_females"
CHRout=12
GENOFILE="/mnt/archive/projects/HUNT_Allin/Genotypes/vcf_imputed_filtered/CHR12.HRC_WGS.vcf.gz"
INDEX="csi"
SAMPLEFILE="/mnt/archive/projects/HUNT_Allin/Genotypes/vcf_imputed_filtered/samplelist_HUNT_Allin.txt"
LOCO=TRUE
MINMAC=3
MINMAF=0
MININFO=0
OutputMoreDetails="TRUE"
FirthBeta="TRUE"
pCutOffFirth=0.05
REWRITE_XnonPAR=FALSE

logfile_step2=./"$model"/"$model"_chr"$CHRout"_step2.log
{
timefile=./"$model"/"$model"_chr"$CHRout".time.txt
/usr/bin/time -o ${timefile} -v step2_SPAtests.R \
    --vcfFile="$GENOFILE" \
    --vcfFileIndex="$GENOFILE"."$INDEX" \
    --vcfField=DS \
    --chrom="$CHRout" \
    --minMAC="$MINMAC" \
    --minMAF="$MINMAF" \
    --minInfo="$MININFO" \
    --sampleFile="$SAMPLEFILE" \
    --is_imputed_data=TRUE \
    --GMMATmodelFile=./"$model"/saige_outputfiles_step1/"$model".rda \
    --varianceRatioFile=./"$model"/saige_outputfiles_step1/"$model".varianceRatio.txt \
    --SAIGEOutputFile=./"$model"/"$model"_results_chr"$CHRout".txt \
    --LOCO="$LOCO" \
    --is_output_moreDetails="$OutputMoreDetails" \
    --is_Firth_beta="$FirthBeta" \
    --pCutoffforFirth="$pCutOffFirth" \
    --is_rewrite_XnonPAR_forMales="$REWRITE_XnonPAR"
} 2>&1 | tee "$logfile_step2"

Edit: I ran step 2 with version 0.44.5 and it worked without any problems.

Very thankful for any suggestions.

Sigrid

sboer commented 4 months ago

I got the answer from Laurent: the minimum MAC threshold was too low. Setting --minMAC=20 (MINMAC=20 in the script above) solved the problem. I am leaving the post up in case others run into the same issue.
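
For anyone who wants to retry only the failing chromosomes, here is a minimal sketch of such a rerun. It reuses the flags from the analysis script above and changes only the MAC threshold. The DRY_RUN switch, the build_step2_cmd helper, and the loop are illustrative additions, not part of the original script. It also assumes the chr 14 VCF follows the same file naming as the chr 12 file shown in the log.

```shell
#!/usr/bin/env bash
# Sketch only: rerun step 2 for the two failing chromosomes with minMAC raised to 20.
# Flags are copied from the analysis script above; DRY_RUN, build_step2_cmd and the
# loop are hypothetical helpers added for illustration.

model="migraine_females"
GENODIR="/mnt/archive/projects/HUNT_Allin/Genotypes/vcf_imputed_filtered"
MINMAC=20                  # was 3 in the failing runs
DRY_RUN="${DRY_RUN:-1}"    # 1 = print the command only, 0 = actually run it

# Fill the global array STEP2_CMD with the step 2 command for one chromosome.
build_step2_cmd() {
  local chr="$1"
  # Assumption: every chromosome follows the CHR12 file naming shown in the log.
  local geno="$GENODIR/CHR${chr}.HRC_WGS.vcf.gz"
  STEP2_CMD=(
    step2_SPAtests.R
    --vcfFile="$geno"
    --vcfFileIndex="$geno.csi"
    --vcfField=DS
    --chrom="$chr"
    --minMAC="$MINMAC"
    --minMAF=0
    --minInfo=0
    --sampleFile="$GENODIR/samplelist_HUNT_Allin.txt"
    --is_imputed_data=TRUE
    --GMMATmodelFile="./$model/saige_outputfiles_step1/$model.rda"
    --varianceRatioFile="./$model/saige_outputfiles_step1/$model.varianceRatio.txt"
    --SAIGEOutputFile="./$model/${model}_results_chr${chr}.txt"
    --LOCO=TRUE
    --is_output_moreDetails=TRUE
    --is_Firth_beta=TRUE
    --pCutoffforFirth=0.05
    --is_rewrite_XnonPAR_forMales=FALSE
  )
}

for chr in 12 14; do
  build_step2_cmd "$chr"
  if [ "$DRY_RUN" = "1" ]; then
    # Print the command so it can be inspected before anything is launched.
    printf '%s ' "${STEP2_CMD[@]}"; printf '\n'
  else
    "${STEP2_CMD[@]}" 2>&1 | tee "./$model/${model}_chr${chr}_step2.log"
  fi
done
```

With the default DRY_RUN=1 the script only prints the two commands; running it with DRY_RUN=0 launches them and writes one log file per chromosome.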