statgen / Minimac4


Error: I/O failed while merging #68

Open lingjoyo opened 7 months ago

lingjoyo commented 7 months ago

Hi everyone

minimac4 runs well for some chromosomes (chr1 to chr10), but reports an error in the merging step starting from chr11:

Writing temp files took 47 seconds
Merging temp files ...
Error: I/O failed while merging
Error: failed merging temp files

Here is my code:

chr=11
minimac4 \
1000g_phase3_v5.chr${chr}.with_parameter_estimates.msav \
1.2_preinputation_check/qc_3rd-updated-chr${chr}.vcf.gz \
--min-ratio 1e-6 \
--threads 10 \
-o c${chr}.imputed.vcf.gz

Has anyone met the same problem?

jonathonl commented 7 months ago

Which version are you running (minimac4 --version)?

Is it possible that you are running out of disk space to store the output files?

lingjoyo commented 7 months ago

Hi Jonathonl, thanks for your reply.

It's minimac v4.1.6. The computing resources are:

storage: 16T free space
CPU: 32
memory: 370G

It shouldn't be a disk space problem. So far, it's running well on chr1 and chr2.

jonathonl commented 7 months ago

Can you provide the full log output?

lingjoyo commented 7 months ago

Here is the log:

minimac v4.1.6

Imputing 11:1-20000000 ...
Loading target haplotypes ...
Loading target haplotypes took 1 seconds
Loading reference haplotypes ...
Loading reference haplotypes took 2 seconds
Typed sites to imputed sites ratio: 0.00066783 (246/368357)
4426 variants are exclusive to target file and will be excluded from output
Running HMM with 1 threads ...
Completed 200 of 1401 samples
Completed 400 of 1401 samples
Completed 600 of 1401 samples
Completed 800 of 1401 samples
Completed 1000 of 1401 samples
Completed 1200 of 1401 samples
Completed 1400 of 1401 samples
Completed 1401 of 1401 samples
Running HMM took 392 seconds

Writing temp files took 49 seconds
Merging temp files ...
Error: I/O failed while merging
Error: failed merging temp files

jonathonl commented 7 months ago

I would try running with --temp-prefix c${chr}.tmp_ so that the temp files are written to the same directory as your output file.
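As a minimal sketch of that suggestion, the temp prefix can be derived from the output path so the two always share a directory. The helper name `temp_prefix_for` and the `.imputed.vcf.gz` suffix handling are assumptions for illustration only; `--temp-prefix` is the only minimac4 option involved, and this thread does not confirm that it resolves the merge error.

```shell
# Hypothetical helper (not part of minimac4): print a temp prefix that sits
# in the same directory as the given output file.
# The body runs in a subshell, so its variables do not leak into the caller.
temp_prefix_for() (
  out_vcf=$1
  out_dir=$(dirname -- "$out_vcf")
  # Strip the ".imputed.vcf.gz" suffix used in this thread (an assumption).
  stem=$(basename -- "$out_vcf" .imputed.vcf.gz)
  printf '%s/%s.tmp_\n' "$out_dir" "$stem"
)
```

The result could then be passed along as `--temp-prefix "$(temp_prefix_for "$out_vcf")"`, with `$out_vcf` being whatever path is given to `-o`.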

lingjoyo commented 7 months ago

It works well if I put everything into one folder:

./minimac4 1000g_phase3_v5.chr22.with_parameter_estimates.msav \
qc_3rd-updated-chr22.vcf.gz \
-o c22.imputed.vcf.gz \
--min-r2 0.3 --min-ratio 1e-6 \
--temp-prefix c22.tmp_ 

But it reports the merging error if I give absolute paths for all inputs and outputs:

${minimac4} \
${g1k_p3}1000g_phase3_v5.chr${chr}.with_parameter_estimates.msav \
${wkdir}/1.2_preinputation_check/qc_3rd-updated-chr${chr}.vcf.gz \
-o ${wkdir}1.3_imputaion_minimac4_g1kp3/c${chr}.imputed.vcf.gz \
--min-r2 0.3 --min-ratio 1e-6 \
--temp-prefix c${chr}.tmp_  

Here is the log:

Imputing 22:1-20000000 ...
Loading target haplotypes ...
Loading target haplotypes took 0 seconds
Loading reference haplotypes ...
Loading reference haplotypes took 1 seconds
Typed sites to imputed sites ratio: 1.53001e-05 (1/65359)
691 variants are exclusive to target file and will be excluded from output
Running HMM with 1 threads ...
Completed 200 of 1401 samples
Completed 400 of 1401 samples
Completed 600 of 1401 samples
Completed 800 of 1401 samples
Completed 1000 of 1401 samples
Completed 1200 of 1401 samples
Completed 1400 of 1401 samples
Completed 1401 of 1401 samples
Running HMM took 32 seconds
Writing temp files took 3 seconds
Merging temp files ...
Error: I/O failed while merging
Error: failed merging temp files

So the problem is that the code couldn't find the temp files. When I set --temp-prefix ${wkdir}/1.2_preinputation_check/c${chr}.tmp_, it reported:

minimac v4.1.6

Imputing 22:1-20000000 ...
Loading target haplotypes ...
Loading target haplotypes took 0 seconds
Loading reference haplotypes ...
Loading reference haplotypes took 1 seconds
Typed sites to imputed sites ratio: 1.53001e-05 (1/65359)
691 variants are exclusive to target file and will be excluded from output
Running HMM with 1 threads ...
Error: could not open temp file (/full-path-to/1.3_imputaion_minimac4_g1kp3/c22.tmp_0_XXXXXX)

lingjoyo commented 7 months ago

I guess the problem is in how the temp files are set up. What's the right way to set --temp-prefix if I want to submit the job using SBATCH?

jonathonl commented 7 months ago

Relative vs absolute paths shouldn't matter. I'm guessing that the output paths are invalid or unreachable from the compute node. Are you creating the full directory paths before running minimac4 (i.e., does the /full-path-to/1.3_imputaion_minimac4_g1kp3/ directory already exist)? I would add tests to your batch script before the minimac4 command to check that you can create new files in the directory where you are writing output files. This would look something like:

set -e
out_vcf=${wkdir}/1.3_imputaion_minimac4_g1kp3/c${chr}.imputed.vcf.gz
# With set -e, the script stops here if the output file cannot be created
touch "$out_vcf"
minimac4 ${g1k_p3}1000g_phase3_v5.chr${chr}.with_parameter_estimates.msav \
${wkdir}/1.2_preinputation_check/qc_3rd-updated-chr${chr}.vcf.gz \
--min-r2 0.3 --min-ratio 1e-6 \
-o "$out_vcf"

Note: you don't need to use absolute paths in Slurm as long as the directory you call sbatch from is accessible from the compute node.
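The directory check described above can also be written as a small reusable function. This is only a sketch: the name `ensure_writable_dir` is hypothetical, and it assumes that creating the directory with `mkdir -p` is acceptable in your setup. It creates the directory if needed, then proves that a new file can be written there by making and deleting a probe file.

```shell
# Hypothetical helper (not part of minimac4): create the directory if it is
# missing and confirm that new files can be written into it.
# The body runs in a subshell, so "exit" only sets the function's status.
ensure_writable_dir() (
  dir=$1
  mkdir -p -- "$dir" || exit 1
  # mktemp fails if the directory is not writable from this node.
  probe=$(mktemp "$dir/.write_test_XXXXXX") || exit 1
  rm -f -- "$probe"
)
```

In a batch script it would be called once per directory that receives output or temp files, before the minimac4 command, for example `ensure_writable_dir "$(dirname "$out_vcf")" || exit 1`.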