rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

Step2 Segmentation fault #467

Closed Lloyd-LiuSiyi closed 10 months ago

Lloyd-LiuSiyi commented 10 months ago

Hi everyone, I encountered this segmentation fault when performing chromosome-specific step2 analysis using UKBB 500k WES data. The log file is presented here:

Start time: Tue Nov  7 00:20:59 2023

              |===========================|
              |      REGENIE v3.3.gz      |
              |===========================|

Copyright (c) 2020-2023 Joelle Mbatchou, Andrey Ziyatdinov and Jonathan Marchini.
Distributed under the MIT License.
Compiled with Boost Iostream library.
Using Intel MKL with Eigen.

Log of output saved in file : regenie_step2_chr15_firth.log

Options in effect:
  --step 2 \
  --bed chr15_420k \
  --covarFile regenie_covar_450k.txt \
  --covarColList age,PC{1:10} \
  --catCovarList sex,ethnicity,genotype_array \
  --phenoFile regenie_pheno_500k.txt \
  --phenoCol mdd,aitd \
  --bsize 200 \
  --bt \
  --firth \
  --approx \
  --pThresh 0.01 \
  --pred regenie_step1_output_pred.list \
  --out regenie_step2_chr15_firth

Association testing mode with fast multithreading using OpenMP
 * bim              : [chr15_420k.bim] n_snps = 877908
 * fam              : [chr15_420k.fam] n_samples = 421046
 * bed              : [chr15_420k.bed]
 * phenotypes       : [regenie_pheno_500k.txt] n_pheno = 2
   -number of phenotyped individuals  = 421046
 * covariates       : [regenie_covar_450k.txt] n_cov = 14
 Segmentation fault

I've tried several --bsize values (200, 400, and 1000), but the error still occurs right after the covariate file is read. Step 1 ran successfully using 500k well-genotyped variants and the exact same 420k samples, and the example run for step 2 went smoothly. Should I extract a smaller subset of variants? This is probably not a memory issue, since I submitted the job to an HPC with 700GB of memory. Any suggestions would be appreciated. Thank you!

Lloyd-LiuSiyi commented 10 months ago

Problem solved. When the sample IDs (FID, IID) don't exactly match across the .fam, phenotype, and covariate files, a memory allocation/segmentation error occurs.
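For anyone hitting the same thing, here is a minimal sketch of a pre-flight check that compares the (FID, IID) pairs across the three inputs before launching step 2. It is not part of regenie: `read_ids` and `compare_ids` are hypothetical helper names, and the sketch assumes whitespace-delimited files in which the .fam file has no header and the phenotype and covariate files start with a `FID IID ...` header line. The demo data is made up.

```python
import os
import tempfile


def read_ids(path, has_header):
    """Collect (FID, IID) pairs from the first two whitespace-separated columns."""
    ids = set()
    with open(path) as handle:
        if has_header:
            next(handle, None)  # skip the "FID IID ..." header line
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                ids.add((fields[0], fields[1]))
    return ids


def compare_ids(fam_path, pheno_path, covar_path):
    """Report samples present in one file but missing from another (hypothetical helper)."""
    fam = read_ids(fam_path, has_header=False)
    pheno = read_ids(pheno_path, has_header=True)
    covar = read_ids(covar_path, has_header=True)
    return {
        "in_fam_not_pheno": fam - pheno,
        "in_fam_not_covar": fam - covar,
        "in_pheno_not_fam": pheno - fam,
        "in_covar_not_fam": covar - fam,
    }


# Tiny made-up demo: sample 3 is missing from the covariates, sample 4 from the .fam.
with tempfile.TemporaryDirectory() as tmp:
    paths = {}
    contents = {
        "demo.fam": "1 1 0 0 1 -9\n2 2 0 0 2 -9\n3 3 0 0 1 -9\n",
        "pheno.txt": "FID IID mdd aitd\n1 1 0 1\n2 2 1 0\n3 3 0 0\n",
        "covar.txt": "FID IID age sex\n1 1 50 1\n2 2 61 2\n4 4 47 1\n",
    }
    for name, text in contents.items():
        paths[name] = os.path.join(tmp, name)
        with open(paths[name], "w") as handle:
            handle.write(text)
    report = compare_ids(paths["demo.fam"], paths["pheno.txt"], paths["covar.txt"])

for label, ids in report.items():
    print(label, sorted(ids))
```

If any of the four sets is non-empty, the ID columns are out of sync, which is the condition that triggered the crash in this thread. The comparison is on exact strings, so differences such as leading zeros or stray prefixes will also show up as mismatches.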