rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

Step 1 level 0: inconsistency bug, number of variants/blocks in file don't match with that in master file #471

Open mariamaitoumelloul opened 12 months ago

mariamaitoumelloul commented 12 months ago

Hello,

I'm using REGENIE v3.2.5.3 to run step 1 level 0 in parallel with 10 job arrays. This worked with another dataset, but with my new dataset (a bigger version of the old one) I get the following error in the log file of job #4: ERROR: number of variants/blocks in file (=62057/39) don't match with that in master file (=37110/63). The error does not occur for the other job arrays.

How can we fix this, please? It also happened when I used 50 job arrays.

joellembatchou commented 11 months ago

Hi,

Can you include the REGENIE log when you run --split-l0, the corresponding master file, as well as the full log for the parallel job where you get the error?

mariamaitoumelloul commented 11 months ago

log when I run --split-l0 and master file : regenie_log_master_file.zip

full log for the parallel job: full_log parallele jobs.zip

Please find the script I'm running below:

#!/bin/sh
## This script will copy data between different directories
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --array=1-10
##SBATCH --cpus-per-task 16
#SBATCH --mem 80G 
#SBATCH --time 72:00:00
#SBATCH --output=log/1_step1A_b1%A_%a.o
#SBATCH --error=log/1_step1A_%A_b1_%a.e
#SBATCH --job-name=log/1b1_step1A_%A_%a

start=`date +%s`
echo "START AT $(date)"

regenie='XXX/tools/regenie/regenie_v3.2.5.3.gz_x86_64_Linux' 
njobs=$SLURM_ARRAY_TASK_COUNT
echo "$njobs"
echo "$SLURM_ARRAY_TASK_ID"

## Step 1 of regenie
base_command="$regenie \
  --step 1 \
  --ref-first \
  --loocv \
  --phenoFile XXX/phenotypes_batch1.txt \
  --covarFile XXX/covariates.txt \
  --bsize 1000 \
  --lowmem \
  --gz"

$base_command \
 --bed  XXX \
 --out ../output_step1_b1/mQTL_shcs_l0 \
 --split-l0 ../output_step1_b1/mQTL_shcs_parallel,$njobs \

$base_command \
 --bed .XXX \
 --out ../output_step1_b1/mQTL_shcs_l0_$SLURM_ARRAY_TASK_ID \
 --run-l0 ../output_step1_b1/mQTL_shcs_parallel.master,$SLURM_ARRAY_TASK_ID \

# print end date and echo total runtime
end=`date +%s`
runtime=$((end-start))
echo Runtime: $runtime seconds

joellembatchou commented 11 months ago

Can you check the number of variants in each of the snplist files, i.e. wc -l mQTL_shcs_parallel_job*.snplist ?

Does it match up with the number of variants specified in the master file?
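
A minimal sketch of the check suggested above, for anyone who wants the per-job counts and their total in one place. It assumes the snplist files are named <prefix>_job<N>.snplist (as in the wc -l command) and contain one variant per line. The function name count_snplist_variants is hypothetical. The layout of the .master file is not assumed here, so the comparison against it is left to be done by eye.

```shell
#!/bin/sh
# Print the number of variants (lines) in each per-job snplist file,
# followed by the total across all jobs.
# Assumption: files are named <prefix>_job<N>.snplist, one variant per line.
count_snplist_variants() {
  prefix=$1
  total=0
  for f in "${prefix}"_job*.snplist; do
    [ -e "$f" ] || continue            # skip if the glob matched nothing
    n=$(wc -l < "$f" | tr -d ' ')      # line count, with padding stripped
    total=$((total + n))
    printf '%s\t%s\n' "$f" "$n"
  done
  printf 'total\t%s\n' "$total"
}

# Hypothetical usage; the prefix mirrors the --split-l0 argument in the script above:
# count_snplist_variants ../output_step1_b1/mQTL_shcs_parallel
# cat ../output_step1_b1/mQTL_shcs_parallel.master
```

Files are listed in glob order (job1, job10, job2, ...), and wc -l will undercount by one if a file's last line has no trailing newline.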