rotary-genomics / rotary

Assembly/annotation workflow for Nanopore-based microbial genome data containing circular DNA elements
BSD 3-Clause "New" or "Revised" License
2 stars 1 forks source link

Repair.py fails to write repair_info.tsv when no circular contigs are identified by Flye. #116

Closed LeeBergstrand closed 5 months ago

LeeBergstrand commented 5 months ago

Problem Description

Error:

mv en_leafy/assembly/end_repair/repaired_info.tsv en_leafy/assembly/end_repair/en_leafy_repaired_info.tsv
mv: cannot stat 'en_leafy/assembly/end_repair/repaired_info.tsv': No such file or directory

rule assembly_end_repair:
    input: en_leafy/qc/en_leafy_qc_long.fastq.gz, en_leafy/assembly/flye/en_leafy_assembly.fasta, en_leafy/assembly/flye/en_leafy_assembly_info.txt
    output: en_leafy/assembly/end_repair/en_leafy_repaired.fasta, en_leafy/assembly/end_repair/en_leafy_repaired_info.tsv, en_leafy/assembly/end_repair
    log: en_leafy/logs/assembly/end_repair.log
    jobid: 18
    benchmark: en_leafy/benchmarks/assembly/end_repair.txt
    reason: Forced execution
    wildcards: sample=en_leafy
    threads: 4
    resources: tmpdir=/tmp, mem=11

        rotary-repair --long_read_filepath en_leafy/qc/en_leafy_qc_long.fastq.gz           --assembly_fasta_filepath en_leafy/assembly/flye/en_leafy_assembly.fasta           --assembly_info_filepath en_leafy/assembly/flye/en_leafy_assembly_info.txt           --output_dir en_leafy/assembly/end_repair           --flye_read_mode nano-hq           --flye_read_error 0           --circlator_min_id 99           --circlator_min_length 10000           --circlator_ref_end 100           --circlator_reassemble_end 100           --threads 4           --threads_mem 11           --overwrite                      > en_leafy/logs/assembly/end_repair.log 2>&1
        mv en_leafy/assembly/end_repair/repaired.fasta en_leafy/assembly/end_repair/en_leafy_repaired.fasta 
        mv en_leafy/assembly/end_repair/repaired_info.tsv en_leafy/assembly/end_repair/en_leafy_repaired_info.tsv

mv: cannot stat 'en_leafy/assembly/end_repair/repaired_info.tsv': No such file or directory
[Tue Feb  6 13:42:36 2024]
Error in rule assembly_end_repair:
    jobid: 18
    input: en_leafy/qc/en_leafy_qc_long.fastq.gz, en_leafy/assembly/flye/en_leafy_assembly.fasta, en_leafy/assembly/flye/en_leafy_assembly_info.txt
    output: en_leafy/assembly/end_repair/en_leafy_repaired.fasta, en_leafy/assembly/end_repair/en_leafy_repaired_info.tsv, en_leafy/assembly/end_repair
    log: en_leafy/logs/assembly/end_repair.log (check log file(s) for error details)
    shell:

        rotary-repair --long_read_filepath en_leafy/qc/en_leafy_qc_long.fastq.gz           --assembly_fasta_filepath en_leafy/assembly/flye/en_leafy_assembly.fasta           --assembly_info_filepath en_leafy/assembly/flye/en_leafy_assembly_info.txt           --output_dir en_leafy/assembly/end_repair           --flye_read_mode nano-hq           --flye_read_error 0           --circlator_min_id 99           --circlator_min_length 10000           --circlator_ref_end 100           --circlator_reassemble_end 100           --threads 4           --threads_mem 11           --overwrite                      > en_leafy/logs/assembly/end_repair.log 2>&1
        mv en_leafy/assembly/end_repair/repaired.fasta en_leafy/assembly/end_repair/en_leafy_repaired.fasta 
        mv en_leafy/assembly/end_repair/repaired_info.tsv en_leafy/assembly/end_repair/en_leafy_repaired_info.tsv

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Exiting because a job execution failed. Look above for error message

Log File:

[ 2024-02-06 13:42:36 ]: INFO: main: Running rotary-repair
[ 2024-02-06 13:42:36 ]: WARNING: main: Output directory already exists: "en_leafy/assembly/end_repair". Files may be overwritten.
[ 2024-02-06 13:42:36 ]: INFO: run_end_repair: No circular contigs. Will copy the input file and finish early.
[ 2024-02-06 13:42:36 ]: INFO: run_end_repair: Pipeline finished.

It appears that repair.py fails to write a repaired_info.tsv if exits early due to having no circular contigs.

Specifically:

# Get lists of circular contigs
    circular_contig_names = parse_assembly_info_file(assembly_info_filepath, assembly_info_type, return_type='circular')

    # No need to run the pipeline if there are no circular contigs
    if len(circular_contig_names) == 0:
        logger.info('No circular contigs. Will copy the input file and finish early.')
        shutil.copyfile(assembly_fasta_filepath, end_repaired_contigs_filepath)
        logger.info('Pipeline finished.')
        sys.exit(0)

    # Subset circular contigs from the full file and save to disk
    os.makedirs(linking_outdir_base, exist_ok=True)
    with open(circular_contig_tmp_fasta, 'w') as output_handle:

Problem Solution

Modify repair.py so it writes the linear contigs to a repair_info.tsv file.

LeeBergstrand commented 5 months ago

Being addressed in https://github.com/rotary-genomics/rotary/tree/fix_repair_corner_case

LeeBergstrand commented 5 months ago

Merged in.

jmtsuji commented 5 months ago

@LeeBergstrand Thanks for fixing this!