weizhouUMICH / SAIGE

GNU Lesser General Public License v3.0

caught segfault #342

Closed aliamin222 closed 2 years ago

aliamin222 commented 3 years ago

Hello, I am using UK Biobank 500k imputed data for a gene-based analysis. For traits with a sample size > 340k (quantitative traits) or > 230k (binary traits), I get the error below:

caught segfault address 0x2aadb5269de4, cause 'memory not mapped' . . An irrecoverable exception occurred. R is aborting now ... /hpc/apps/current/saige/v0.38.app/step1_fitNULLGLMM.R: line 27: 12298 Segmentation fault

I am using SAIGE version 0.38. When I manually decrease the sample size (to less than 340k or 230k), I do not get this error. Can you please let me know why I get this error for large sample sizes? My bash script is below. Thank you, Ali

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=2900G
#SBATCH -e /hpc/scratch/logs/S1_imp_quant-%j.err
#SBATCH -o /hpc/scratch/logs/S1_imp_quant-%j.out

/bin/echo Running on: `hostname`. Job name ${SLURM_JOB_NAME}. Job ID ${SLURM_JOB_ID}. Starting: `date`
i=${SLURM_ARRAY_TASK_ID}
Trait=$(head --lines=$i ${TraitFile} | tail -1 | awk '{print $1}')
echo ${Trait}

impdata=/hpc/scratch/imp_data
resultspath=/hpc/scratch/imp_res
programpath=/hpc/scratch/SAIGE/extdata
mkdir ${resultspath}/${Trait}

/bin/echo Running on: `hostname`
/bin/echo Job name: ${SLURM_JOB_NAME}
/bin/echo Job ID: ${SLURM_JOB_ID}
/bin/echo Task ID: ${SLURM_ARRAY_TASK_ID}

echo ""
echo "Starting Step 1"
echo "Pheno: ${trait}"
echo "Pheno type: ${trait_type}"
echo `date`
echo ""

Rscript ${programpath}/step1_fitNULLGLMM.R \
    --plinkFile=${impdata}/UKBB_chr13 \
    --phenoFile=${impdata}/phenotype.txt \
    --phenoCol=${Trait} \
    --covarColList=ge_pc1,ge_pc2,ge_pc3,ge_pc4,ge_pc5,ge_pc6,ge_pc7,ge_pc8,ge_pc9,ge_pc10 \
    --sampleIDColinphenoFile=IID \
    --traitType=quantitative \
    --invNormalize=FALSE \
    --outputPrefix=${resultspath}/${Trait}/${Trait} \
    --outputPrefix_varRatio=${resultspath}/${Trait}/${Trait}_cat \
    --sparseGRMFile=${resultspath}/sparseGRM_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx \
    --sparseGRMSampleIDFile=${resultspath}/sparseGRM_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx.sampleIDs.txt \
    --nThreads=8 \
    --LOCO=FALSE \
    --skipModelFitting=FALSE \
    --IsSparseKin=TRUE \
    --isCateVarianceRatio=TRUE \
    --IsOverwriteVarianceRatioFile=TRUE

/bin/echo Finished at: `date`

weizhouUMICH commented 3 years ago

Hi @aliamin222,

Could you please share more of the log output before the error? BTW, I've noticed that you specified 2300G, which is much higher than needed.
The memory usage for Step 1 can be estimated as M × N / (32 × 10^9) Gb plus the memory for the phenotype file, where M is the number of markers and N is the number of samples in the plink file. For example, for N = 400000 samples and M = 93000, the memory usage will be slightly higher than 9Gb.

Thanks, Wei

aliamin222 commented 3 years ago

Hi Wei, Thank you for the quick response. I would attach the log file, but I do not think that is possible on this platform, so I have copied it below (sorry, it has 423 lines). I requested the maximum memory available to make sure the error is not related to machine memory. Thank you, Ali

Running on: cpu-415. Job name S1Qimp. Job ID 41261847. Starting: Mon May 10 10:28:59 EDT 2021 f_102_0_f_QUANT_pulse_minimumValue_residualizedRelated Running on: cpu-415 Job name: S1Qimp Job ID: 41261847 Task ID: 1

Starting Step 1 Mon May 10 10:28:59 EDT 2021

R version 3.5.1 (2018-07-02) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 16.04.6 LTS

Matrix products: default BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale: [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] optparse_1.6.4 SAIGE_0.38

loaded via a namespace (and not attached): [1] compiler_3.5.1 Matrix_1.2-14 Rcpp_1.0.4.6 getopt_1.20.3 [5] grid_3.5.1 RcppParallel_5.0.0 lattice_0.20-35 $plinkFile [1] "/hpc/scratch/imp_data/UKBB_chr13"

$phenoFile [1] "/hpc/scratch/imp_data/phenotype.txt"

$phenoCol [1] "f_102_0_f_QUANT_pulse_minimumValue_residualizedRelated"

$traitType [1] "quantitative"

$invNormalize [1] FALSE

$covarColList [1] "ge_pc1,ge_pc2,ge_pc3,ge_pc4,ge_pc5,ge_pc6,ge_pc7,ge_pc8,ge_pc9,ge_pc10"

$sampleIDColinphenoFile [1] "IID"

$tol [1] 0.02

$maxiter [1] 20

$tolPCG [1] 1e-05

$maxiterPCG [1] 500

$nThreads [1] 8

$SPAcutoff [1] 2

$numRandomMarkerforVarianceRatio [1] 30

$skipModelFitting [1] FALSE

$memoryChunk [1] 2

$tauInit [1] "0,0"

$LOCO [1] FALSE

$traceCVcutoff [1] 0.0025

$ratioCVcutoff [1] 0.001

$outputPrefix [1] "/hpc/scratch/imp_res/f_102_0_f_QUANT_pulse_minimumValue_residualizedRelated/f_102_0_f_QUANT_pulse_minimumValue_residualizedRelated"

$outputPrefix_varRatio [1] "/hpc/scratch/imp_res/f_102_0_f_QUANT_pulse_minimumValue_residualizedRelated/f_102_0_f_QUANT_pulse_minimumValue_residualizedRelated_cate"

$IsOverwriteVarianceRatioFile [1] TRUE

$IsSparseKin [1] TRUE

$sparseGRMFile [1] "/hpc/scratch/imp_res/sparseGRM_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx"

$sparseGRMSampleIDFile [1] "/hpc/scratch/imp_res/sparseGRM_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx.sampleIDs.txt"

$numRandomMarkerforSparseKin [1] 2000

$isCateVarianceRatio [1] TRUE

$relatednessCutoff [1] 0.125

$cateVarRatioMinMACVecExclude [1] "0.5,1.5,2.5,3.5,4.5,5.5,10.5,20.5"

$cateVarRatioMaxMACVecInclude [1] "1.5,2.5,3.5,4.5,5.5,10.5,20.5"

$isCovariateTransform [1] TRUE

$isDiagofKinSetAsOne [1] FALSE

$useSparseSigmaConditionerforPCG [1] FALSE

$useSparseSigmaforInitTau [1] FALSE

$minMAFforGRM [1] 0.01

$minCovariateCount [1] -1

$includeNonautoMarkersforVarRatio [1] FALSE

$help [1] FALSE

tauInit is 0 0 cateVarRatioMinMACVecExclude is 0.5 1.5 2.5 3.5 4.5 5.5 10.5 20.5 cateVarRatioMaxMACVecInclude is 1.5 2.5 3.5 4.5 5.5 10.5 20.5 Markers in the Plink file with MAF >= 0.01 will be used to construct GRM 8 threads are set to be used 487279 samples have genotypes formula is f_102_0_f_QUANT_pulse_minimumValue_residualizedRelated~ge_pc1+ge_pc2+ge_pc3+ge_pc4+ge_pc5+ge_pc6+ge_pc7+ge_pc8+ge_pc9+ge_pc10 424632 samples have non-missing phenotypes 62647 samples in geno file do not have phenotypes 424632 samples will be used for analysis qr transformation has been performed on covariates colnames(data.new) is Y minus1 ge_pc1 ge_pc2 ge_pc3 ge_pc4 ge_pc5 ge_pc6 ge_pc7 ge_pc8 ge_pc9 ge_pc10 out.transform$Param.transform$qrr: 11 11 f_102_0_f_QUANT_pulse_minimumValue_residualizedRelated is a quantitative trait glm:

Call: glm(formula = formula.new, family = gaussian(link = "identity"), data = data.new)

Coefficients: minus1 ge_pc1 ge_pc2 ge_pc3 ge_pc4 ge_pc5 -1.186e-05 2.324e-02 -7.390e-03 4.125e-03 -3.306e-03 -5.267e-03 ge_pc6 ge_pc7 ge_pc8 ge_pc9 ge_pc10 1.503e-02 -4.977e-03 -1.608e-03 -3.555e-03 -2.065e-03

Degrees of Freedom: 424632 Total (i.e. Null); 424621 Residual Null Deviance: 424600 Residual Deviance: 424200 AIC: 1205000 Start fitting the NULL GLMM user system elapsed 33.680 32.674 17.996 [1] "Start reading genotype plink file here" nbyte: 121820 nbyte: 106158 reserve: 2831181056

M: 26669, N: 487279 size of genoVecofPointers: 2 setgeno mark1 setgeno mark2 21751 markers with MAF >= 0.01 are used for GRM. setgeno mark5 setgeno mark6 time: 138358 [1] "Genotype reading is done" Fixed-effect coefficients: minus1 ge_pc1 ge_pc2 ge_pc3 ge_pc4 -1.185563e-05 2.323521e-02 -7.390082e-03 4.125497e-03 -3.306110e-03 ge_pc5 ge_pc6 ge_pc7 ge_pc8 ge_pc9 -5.266611e-03 1.502973e-02 -4.976552e-03 -1.607549e-03 -3.555343e-03 ge_pc10 -2.065500e-03 initial tau is 1 0 iGet_Coef: 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 Tau: [1] 1 0 Fixed-effect coefficients: [,1] [1,] -1.185587e-05 [2,] 2.323526e-02 [3,] -7.390084e-03 [4,] 4.125500e-03 [5,] -3.306111e-03 [6,] -5.266611e-03 [7,] 1.502972e-02 [8,] -4.976554e-03 [9,] -1.607554e-03 [10,] -3.555346e-03 [11,] -2.065501e-03 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 
iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 Variance component estimates: [1] 0.999007 0.000000

Iteration 1 : iGet_Coef: 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 Tau: [1] 0.999007 0.000000 Fixed-effect coefficients: [,1] [1,] -1.185624e-05 [2,] 2.323523e-02 [3,] -7.390084e-03 [4,] 4.125500e-03 [5,] -3.306111e-03 [6,] -5.266610e-03 [7,] 1.502973e-02 [8,] -4.976551e-03 [9,] -1.607549e-03 [10,] -3.555344e-03 [11,] -2.065497e-03 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 cov: 2.352639e-06 2.040671e-14 6.756491e-16 6.703705e-16 -1.372413e-16 2.765938e-15 3.07209e-15 -1.330184e-15 
2.760658e-15 -2.227531e-15 6.650916e-16
2.040671e-14 2.352642e-06 -5.772058e-15 9.079048e-16 -4.518413e-15 -8.340054e-16 6.957081e-15 -2.428117e-16 -3.800529e-15 2.090293e-15 -1.765663e-15
6.756491e-16 -5.772058e-15 2.352642e-06 -2.111407e-16 6.450352e-15 4.344219e-15 -2.164191e-15 -4.328383e-16 -1.409033e-15 1.900267e-15 -1.293236e-15
6.703705e-16 9.079048e-16 -2.111407e-16 2.352642e-06 4.565919e-16 1.111127e-15 2.005835e-15 -3.578833e-15 -1.842201e-15 1.514935e-15 4.11856e-15
-1.372413e-16 -4.518413e-15 6.450352e-15 4.565919e-16 2.352644e-06 2.913742e-15 -1.974165e-15 1.055704e-16 -2.487501e-15 1.60467e-15 -1.171831e-15
2.765938e-15 -8.340054e-16 4.344219e-15 1.111127e-15 2.913742e-15 2.352642e-06 -8.313658e-16 6.756498e-16 4.170024e-16 -9.501329e-17 3.694958e-16
3.07209e-15 6.957081e-15 -2.164191e-15 2.005835e-15 -1.974165e-15 -8.313658e-16 2.352641e-06 -3.9061e-16 1.298513e-15 -1.477984e-15 -6.334212e-16
-1.330184e-15 -2.428117e-16 -4.328383e-16 -3.578833e-15 1.055704e-16 6.756498e-16 -3.9061e-16 2.352642e-06 -1.625782e-15 -5.489658e-16 1.500417e-15
2.760658e-15 -3.800529e-15 -1.409033e-15 -1.842201e-15 -2.487501e-15 4.170024e-16 1.298513e-15 -1.625782e-15 2.352641e-06 1.879151e-15 3.061535e-16
-2.227531e-15 2.090293e-15 1.900267e-15 1.514935e-15 1.60467e-15 -9.501329e-17 -1.477984e-15 -5.489658e-16 1.879151e-15 2.352643e-06 -2.058621e-15
6.650916e-16 -1.765663e-15 -1.293236e-15 4.11856e-15 -1.171831e-15 3.694958e-16 -6.334212e-16 1.500417e-15 3.061535e-16 -2.058621e-15 2.352641e-06
Variance component estimates: [1] 0.9951694 0.0000000
Fixed-effect coefficients: [,1] [1,] -1.185624e-05 [2,] 2.323523e-02 [3,] -7.390084e-03 [4,] 4.125500e-03 [5,] -3.306111e-03 [6,] -5.266610e-03 [7,] 1.502973e-02 [8,] -4.976551e-03 [9,] -1.607549e-03 [10,] -3.555344e-03 [11,] -2.065497e-03

Final 0.9951694 0 : iGet_Coef: 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 Tau: [1] 0.9951694 0.0000000 Fixed-effect coefficients: [,1] [1,] -1.185553e-05 [2,] 2.323519e-02 [3,] -7.390079e-03 [4,] 4.125495e-03 [5,] -3.306109e-03 [6,] -5.266611e-03 [7,] 1.502973e-02 [8,] -4.976550e-03 [9,] -1.607553e-03 [10,] -3.555340e-03 [11,] -2.065500e-03 user system elapsed 1757.683 143.154 390.020 t_end - t_begin, fitting the NULL model took user system elapsed 1724.003 110.480 372.024 [1] "step2" Start estimating variance ratios

Family: gaussian Link function: identity

sparse GRM will be used sparse GRM has been specified read in sparse GRM from /hpc/scratch/imp_res/sparseGRM_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx length(sparseGRMSampleID$IndexGRM): 487279 nrow(sparseGRMSampleID): 487279 424632 samples have been used to fit the glmm null model [1] 424632 3 IID IndexInModel IndexGRM 1 1000017 1 1 2 1000025 2 2 3 1000038 3 3 4 1000042 4 4 5 1000056 5 5 6 1000061 6 6 write sparse Sigma to /hpc/scratch/imp_res/f_102_0_f_QUANT_pulse_minimumValue_residualizedRelated/f_102_0_f_QUANT_pulse_minimumValue_residualizedRelated_cate.varianceRatio.txt_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseSigma.mtx Categorical variance ratios will be estimated 0.5 < MAC <= 1.5 1.5 < MAC <= 2.5 2.5 < MAC <= 3.5 3.5 < MAC <= 4.5 4.5 < MAC <= 5.5 5.5 < MAC <= 10.5 10.5 < MAC <= 20.5 20.5 < MAC iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 iter from getPCG1ofSigmaAndVector 1 5147 th marker G0 0 0 0 0 0 0 0 0 0 0 iter from getPCG1ofSigmaAndVector 1 t1 Finished at: Tue May 11 04:23:32 EDT 2021
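For readers who want to sanity-check the memory request, here is a rough sketch of the genotype storage arithmetic. It assumes genotypes are packed at 2 bits each (4 samples per byte), which is an inference from the two `nbyte` values in the log above (121820 for 487279 samples, 106158 for 424632 samples), not something stated in this thread. The function names are made up for illustration. The result for N = 400000 and M = 93000 is in line with the "slightly higher than 9Gb" example, although it follows the packing seen in the log rather than the formula as quoted, so the two may differ by a constant factor.

```shell
#!/bin/bash
# Rough Step 1 genotype memory estimate.
# ASSUMPTION: 2 bits per genotype (4 samples per byte), inferred from the
# nbyte lines in the log; the helper names below are hypothetical.

# Bytes needed to store one marker across n samples: ceil(n / 4)
bytes_per_marker() {
  local n=$1
  echo $(( (n + 3) / 4 ))
}

# Total bytes for m markers and n samples
genotype_bytes() {
  local m=$1 n=$2
  echo $(( m * $(bytes_per_marker "$n") ))
}

# Same estimate in GB (10^9 bytes), one decimal place
genotype_gb() {
  awk -v b="$(genotype_bytes "$1" "$2")" 'BEGIN { printf "%.1f\n", b / 1e9 }'
}

bytes_per_marker 487279    # samples with genotypes in the log
bytes_per_marker 424632    # samples used for analysis in the log
genotype_gb 93000 400000   # M = 93000, N = 400000 from the example above
```

This covers only the packed genotype matrix; the phenotype file and R's own overhead come on top of it.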

aliamin222 commented 3 years ago

Hi Wei, Just an update to let you know that I changed the input plink file for Step 1. Instead of using only one chromosome (chr13), I made a plink file that includes all rare variants with MAF < 1% from all autosomal chromosomes plus the common variants that I used in step 0. This seems to have resolved the issue: I no longer get any error, and all variance ratio files were generated. Thanks, Ali
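The thread does not establish why switching the plink file helped. For anyone preparing a similar Step 1 input, one cheap check is to tally how many markers fall into each MAC category that the log reports for categorical variance ratios (lower bound exclusive, upper bound inclusive, last bin open-ended). The sketch below is a minimal example under assumptions: the input is a hypothetical stream with one minor allele count per line, and `count_mac_bins` is an illustrative name, not part of SAIGE.

```shell
#!/bin/bash
# Count markers per MAC category, using the bin edges printed in the Step 1 log.
# ASSUMPTION: stdin carries one minor allele count per line (hypothetical format).
count_mac_bins() {
  awk '
    BEGIN {
      # Lower bounds of the categories, as listed in the log
      n = split("0.5 1.5 2.5 3.5 4.5 5.5 10.5 20.5", lo, " ")
    }
    {
      # Walk from the highest lower bound down; the first match is the bin.
      # Counts at or below 0.5 match no bin and are skipped.
      for (i = n; i >= 1; i--) {
        if (($1 + 0) > (lo[i] + 0)) { count[i]++; break }
      }
    }
    END {
      for (i = 1; i <= n; i++) {
        if (i < n) printf "%s < MAC <= %s\t%d\n", lo[i], lo[i + 1], count[i]
        else       printf "%s < MAC\t%d\n", lo[i], count[i]
      }
    }
  '
}

# Example with made-up counts
printf '%s\n' 1 1 2 7 15 300 | count_mac_bins
```

A bin that comes back with very few markers is worth a second look before running Step 1 with `--isCateVarianceRatio=TRUE`.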

weizhouUMICH commented 2 years ago

Sorry for the late reply! We have just released a new version, 1.0.0. It has substantial computational efficiency improvements for both Step 1 and Step 2, for single-variant and set-based tests, and clearer log output. We have created a new GitHub page for the program, https://github.com/saigegit/SAIGE, with documentation at https://saigegit.github.io/SAIGE-doc/. The program will be maintained there by multiple SAIGE developers. The docker image has been updated. Please feel free to try version 1.0.0 and report any issues.

Thanks! Wei