rhysnewell / aviary

A hybrid assembly and MAG recovery pipeline (and more!)
GNU General Public License v3.0
82 stars 12 forks source link

Error in prepare_binning_files #149

Closed s-junguy closed 1 year ago

s-junguy commented 1 year ago

Hi, I've got an error in rule prepare_binning_files. I use Illumina paired end libraries (10 GB per library) and this is my submission script:

!/bin/bash

SBATCH --cpus-per-task=16

SBATCH --time=72:00:00

SBATCH --partition=pinky

SBATCH --ntasks=1

SBATCH -o aviary.%j.%N.out

SBATCH -e aviary.%j.%N.err

source /home/usr/miniconda3/bin/activate /home/usr/miniconda3/envs/aviary cd testfastq aviary complete -1 P19752_101_S33_L002_R1_001.fastq.gz -2 P19752_101_S33_L002_R2_001.fastq.gz --max_memory 200

This is the error message: rule prepare_binning_files: input: assembly/final_contigs.fasta output: data/maxbin.cov.list, data/coverm.cov log: logs/coverm_prepare.log jobid: 1 reason: Missing output files: data/coverm.cov; Input files updated by another job: assembly/final_contigs.fasta threads: 8 resources: tmpdir=/tmp, mem_mb=204800, runtime=2880

Activating conda environment: ../miniconda3/59d4b88ea6ac8d234b7c362bcf43b44f Activating conda environment: ../miniconda3/59d4b88ea6ac8d234b7c362bcf43b44f Traceback (most recent call last): File "/home/usr/testfastq/.snakemake/scripts/tmpvjqkjkum.get_coverage.py", line 187, in get_coverage( File "/home/usr/testfastq/.snakemake/scripts/tmpvjqkjkum.get_coverage.py", line 166, in get_coverage for i in range(len(cov_list[0])): IndexError: list index out of range [Fri Sep 15 14:44:23 2023] Error in rule prepare_binning_files: jobid: 1 output: data/maxbin.cov.list, data/coverm.cov log: logs/covermprepare.log (check log file(s) for error message) conda-env: /home/usr/miniconda3/59d4b88ea6ac8d234b7c362bcf43b44f

RuleException: CalledProcessError in line 66 of /home/usr/aviary/aviary/modules/binning/binning.smk: Command 'source /home/usr/miniconda3/envs/aviary/bin/activate '/home/usr/miniconda3/59d4b88ea6ac8d234b7c362bcf43b44f_'; set -euo pipefail; python /home/usr/testfastq/.snakemake/scripts/tmpvjqkjkum.get_coverage.py' returned non-zero exit status 1. File "/home/usr/aviary/aviary/modules/binning/binning.smk", line 66, in __rule_prepare_binning_files File "/home/usr/miniconda3/envs/aviary/lib/python3.10/concurrent/futures/thread.py", line 58, in run Removing output files of failed job prepare_binning_files since they might be corrupted: data/maxbin.cov.list, data/coverm.cov Exiting because a job execution failed. Look above for error message Complete log: .snakemake/log/2023-09-14T145107.916827.snakemake.log 09/15/2023 02:44:23 PM CRITICAL: Command '['snakemake', '--snakefile', '/home/usr/aviary/aviary/modules/Snakefile', '-- directory', '/home/usr/testfastq', '--cores', '16', '--rerun-incomplete', '--keep-going', '--rerun-triggers', 'mtime', '--configfile', '/home/usr/testfastq/config.yaml', '--nolock', '--retries', '0', '--conda-frontend', 'mamba', '--resour ces', 'mem_mb=204800', '--use-conda', '--conda-prefix', '~/miniconda3', 'get_bam_indices']' returned non-zero exit status 1.

wwood commented 1 year ago

Hi,

Are the log files informative? That's where the actual error that occurred will be described hopefully.

s-junguy commented 1 year ago

This is the given log file and I do not think that it states anything further: [Fri Sep 15 14:13:29 2023] rule prepare_binning_files: input: assembly/final_contigs.fasta output: data/maxbin.cov.list, data/coverm.cov log: logs/coverm_prepare.log jobid: 1 reason: Missing output files: data/coverm.cov; Input files updated by another job: assembly/final_contigs.fasta threads: 8 resources: tmpdir=/tmp, mem_mb=204800, runtime=2880

Activating conda environment: ../miniconda3/59d4b88ea6ac8d234b7c362bcf43b44f Activating conda environment: ../miniconda3/59d4b88ea6ac8d234b7c362bcf43b44f [Fri Sep 15 14:44:23 2023] Error in rule prepare_binning_files: jobid: 1 output: data/maxbin.cov.list, data/coverm.cov log: logs/covermprepare.log (check log file(s) for error message) conda-env: /home/s-junguy/miniconda3/59d4b88ea6ac8d234b7c362bcf43b44f

RuleException: CalledProcessError in line 66 of /home/s-junguy/aviary/aviary/modules/binning/binning.smk: Command 'source /home/s-junguy/miniconda3/envs/aviary/bin/activate '/home/s-junguy/miniconda3/59d4b88ea6ac8d234b7c362bcf43b44f_'; set -euo pipefail; python /home/s-junguy/testfastq/.snakemake/scripts/tmpvjqkjkum.get_coverage.py' returned non-zero exit status 1. File "/home/s-junguy/aviary/aviary/modules/binning/binning.smk", line 66, in __rule_prepare_binning_files File "/home/s-junguy/miniconda3/envs/aviary/lib/python3.10/concurrent/futures/thread.py", line 58, in run Removing output files of failed job prepare_binning_files since they might be corrupted: data/maxbin.cov.list, data/coverm.cov Exiting because a job execution failed. Look above for error message Complete log: .snakemake/log/2023-09-14T145107.916827.snakemake.log

rhysnewell commented 1 year ago

Log files are written to individual files for each rule within the output folder in the logs/ folder as stated in the first error message you posted in the rule descriptor. It should be in log: logs/coverm_prepare.log

s-junguy commented 1 year ago

This is the logs/coverm_prepare.log: [2023-09-15T14:13:38Z INFO coverm] CoverM version 0.6.1 [2023-09-15T14:13:38Z INFO coverm] Setting single read percent identity threshold at 0.97 for MetaBAT adjusted coverage, and not filtering out supplementary, secondary and improper pair alignments [2023-09-15T14:13:38Z INFO coverm] Writing output to file: data/short_cov.tsv [2023-09-15T14:13:38Z INFO coverm] Using min-covered-fraction 0% [2023-09-15T14:13:38Z INFO bird_tool_utils::external_command_checker] Found minimap2 version 2.26-r1175 [2023-09-15T14:13:38Z INFO bird_tool_utils::external_command_checker] Found samtools version 1.17 [2023-09-15T14:13:38Z INFO coverm] Creating cache directory data/binning_bams/ [2023-09-15T14:13:38Z INFO coverm] Not pre-generating minimap2 index [2023-09-15T14:13:38Z INFO coverm] Caching BAM file to data/binning_bams//final_contigs.fasta.P19752_101_S33_L002_R1_001.fastq.gz.bam [W::bgzf_read_block] EOF marker is absent. The input is probably truncated [2023-09-15T14:44:23Z INFO coverm::contig] In sample 'final_contigs.fasta/P19752_101_S33_L002_R1_001.fastq.gz', found 0 reads mapped out of 0 total (NaN%) [2023-09-15T14:44:23Z WARN coverm::contig] No primary alignments were observed for sample final_contigs.fasta/P19752_101_S33_L002_R1_001.fastq.gz - perhaps something went wrong in the mapping? [2023-09-15T14:44:23Z ERROR coverm::bam_generator] Error when running mapping process. Exitstatus was ExitStatus(unix_wait_status(256)). Command run was: ["bash -c \"set -e -o pipefail; minimap2 --split-prefix /tmp/.tmp3lZqVs -a -x sr -t 8 'assembly/final_contigs.fasta' '/home/s-junguy/testfastq/P19752_101_S33_L002_R1_001.fastq.gz' '/home/s-junguy/testfastq/P19752_101_S33_L002_R2_001.fastq.gz' 2>/tmp/.tmpKIrC2g | remove_minimap2_duplicated_headers 2>/tmp/.tmpBMgMHr| samtools sort -T '/tmp/coverm_fifo.DdiheaYAbxvm/coverm-make-samtools-sortTXhO3t' -l0 -@ 7 2>/tmp/.tmp9VmyIp |tee \"/tmp/coverm_fifo.DdiheaYAbxvm/foo.pipe\" |samtools view -F4 -@ 7 -b -o 'data/binning_bams//final_contigs.fasta.P19752_101_S33_L002_R1_001.fastq.gz.bam' 2>/tmp/.tmpCTDZBe\""] [2023-09-15T14:44:23Z ERROR coverm::bam_generator] The STDERR for the MINIMAP2_SR part was: [M::mm_idx_gen::26.9281.46] collected minimizers [M::mm_idx_gen::29.3731.99] sorted minimizers [M::main::29.3731.99] loaded/built the index for 6451200 target sequence(s) [M::mm_mapopt_update::29.3731.99] mid_occ = 1000 [M::mm_idx_stat] kmer size: 21; skip: 11; is_hpc: 0; #seq: 6451200 [M::mm_idx_stat::32.188*1.90] distinct minimizers: 250241357 (97.56% are singletons); average occurrences: 1.113; average spacing: 6.589; total length: 1834387885

[M::main] Version: 2.26-r1175
[M::main] CMD: minimap2 --split-prefix /tmp/.tmp3lZqVs -a -x sr -t 8 assembly/final_contigs.fasta /home/s-junguy/testfastq/P19752_101_S33_L002_R1_001.fastq.gz /home/s-junguy/testfastq/P19752_101_S33_L002_R2_001.fastq.gz
[M::main] Real time: 1835.864 sec; CPU: 12683.138 sec; Peak RSS: 15.169 GB

[2023-09-15T14:44:23Z ERROR coverm::bam_generator] The STDERR for the samtools sort part was: [bam_sort_core] merging from 9 files and 7 in-memory blocks...

[2023-09-15T14:44:23Z ERROR coverm::bam_generator] The STDERR for the samtools view for cache part was: [E::hts_open_format] Failed to open file "data/binning_bams//final_contigs.fasta.P19752_101_S33_L002_R1_001.fastq.gz.bam" : No such file or directory samtools view: failed to open "data/binning_bams//final_contigs.fasta.P19752_101_S33_L002_R1_001.fastq.gz.bam" for writing: No such file or directory

[2023-09-15T14:44:23Z ERROR coverm::bam_generator] The STDERR for the remove_minimap2_duplicated_headers part was: [2023-09-15T14:44:23Z ERROR coverm::bam_generator] Cannot continue since mapping failed.

rhysnewell commented 1 year ago

CoverM generally fails like this when it runs out of space when creating the bam file within the specified temporary directory. Try setting the --tmpdir parameter to a folder that has more space if possible

s-junguy commented 1 year ago

Thank you for your help! It worked.