metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

Rule find counts per region - error #75

Closed ghost closed 6 years ago

ghost commented 6 years ago

I encountered an error while running the assemble workflow in Atlas (version 1.0.19). According to counts_perregion.log, the error was:

Warning: Unknown annotation format: gtf. GTF format is used.
ERROR: invalid parameter: '−−minOverlap'

The issue seems to be the encoding of the two dashes in the minimum overlap parameter: they are non-ASCII characters rather than the ASCII hyphens ('--') a command-line flag needs.

import chardet
s_encoding = chardet.detect('−−minOverlap')['encoding']
print s_encoding
# prints: utf-8

A possible solution is to retype the dashes in the "−−minOverlap" parameter as plain ASCII hyphens:

s_retyped = chardet.detect('--minOverlap')['encoding']
print s_retyped
# prints: ascii
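The same check can be sketched in Python 3 without chardet by looking at the code points directly. This is a minimal illustration, assuming the offending characters are U+2212 MINUS SIGN, as they appear to be in the error message above. The helper names `describe` and `normalize_dashes` and the set of look-alike characters are hypothetical and not part of Atlas.

```python
import unicodedata

# Dash look-alikes that a command-line parser does not treat as '-'.
# This set is an assumption for illustration; extend it as needed.
DASH_LOOKALIKES = {
    "\u2010",  # HYPHEN
    "\u2011",  # NON-BREAKING HYPHEN
    "\u2012",  # FIGURE DASH
    "\u2013",  # EN DASH
    "\u2014",  # EM DASH
    "\u2212",  # MINUS SIGN
}


def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


def normalize_dashes(text):
    """Replace dash look-alikes with the ASCII hyphen-minus (U+002D)."""
    return "".join("-" if ch in DASH_LOOKALIKES else ch for ch in text)


if __name__ == "__main__":
    bad_flag = "\u2212\u2212minOverlap"  # assumed: two MINUS SIGN characters, not '--'
    for row in describe(bad_flag)[:2]:
        print(row)
    fixed = normalize_dashes(bad_flag)
    print(fixed, fixed.isascii())
```

Listing the Unicode names makes the problem visible even when the two kinds of dash look identical in a terminal or editor.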

I also added a space before the backslash on lines 684, 685, and 689 of assemble.snakefile. After both changes, the assemble workflow completed successfully.
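To find such characters before a run, one could scan the workflow file for anything outside ASCII. The sketch below is a hypothetical helper, not an Atlas utility. It takes the file contents as a string and reports 1-based line and column numbers; the sample text is made up and is not the actual content of assemble.snakefile.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line number, column, Unicode name) for each non-ASCII character.

    Line and column numbers are 1-based.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, unicodedata.name(ch, "UNKNOWN")))
    return hits


if __name__ == "__main__":
    # Made-up shell fragment; `some_tool` is a placeholder command name.
    sample = "some_tool \\\n    \u2212\u2212minOverlap 1 \\\n    -t CDS\n"
    for hit in find_non_ascii(sample):
        print(hit)
```

For a real file, the contents could be read with `open(path, encoding="utf-8").read()` and passed to `find_non_ascii`; the path depends on the local install.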

Below is the complete log:

Building DAG of jobs...
Creating conda environment /home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/envs/optional_genome_binning.yaml...
Environment for ../../../../../home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/envs/optional_genome_binning.yaml created (location: .snakemake/conda/9596cb25)
Creating conda environment /home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/envs/required_packages.yaml...
Environment for ../../../../../home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/envs/required_packages.yaml created (location: .snakemake/conda/212a2e89)
Provided cores: 24
Rules claiming more threads will be scaled down.
Unlimited resources: mem
Job counts:
    count   jobs
    1       QC_report
    1       add_contig_metadata
    1       align_reads_to_final_contigs
    1       all
    1       build_decontamination_db
    2       calculate_contigs_stats
    1       calculate_insert_size
    1       calculate_prefiltered_contig_coverage_stats
    1       combine_insert_stats
    1       combine_read_counts
    1       combine_read_length_stats
    1       convert_gff_to_gtf
    1       convert_sam_to_bam
    1       decontamination
    1       deduplicate
    1       error_correction
    1       filter_by_coverage
    1       finalize_QC
    1       finalize_contigs
    1       find_counts_per_region
    1       init_QC
    1       initialize_checkm
    1       make_maxbin_abundance_file
    1       merge_pairs
    1       merge_sample_tables
    1       normalize_coverage_across_kmers
    1       parse_blastp
    1       pileup
    1       postprocess_after_decontamination
    1       quality_filter
    5       read_stats
    1       rename_contigs
    1       rename_megahit_output
    1       run_checkm_lineage_wf
    1       run_checkm_tree_qa
    1       run_diamond_blastp
    1       run_maxbin
    1       run_megahit
    1       run_prokka_annotation
    1       sort_munged_blast_hits
    1       update_prokka_tsv
    46

rule init_QC: input: /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/data/PHH12_O-8024.3.89990.GGTAGC.fastq.gz output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R2.fastq.gz log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_init.log jobid: 32 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC priority: 80 threads: 24 resources: mem=40

reformat.sh in=/media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/data/PHH12_O-8024.3.89990.GGTAGC.fastq.gz interleaved=t out1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R1.fastq.gz out2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R2.fastq.gz qout=33 overwrite=true verifypaired=t addslash=t trimreaddescription=t threads=24 -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_init.log

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89. Finished job 32. 1 of 46 steps (2%) done

rule read_stats: input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R2.fastq.gz output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/raw.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/raw_read_counts.tsv log: PHH12-O-8024.3.89990.GGTAGC/logs/read_stats.log jobid: 13 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, step=raw priority: 30 threads: 24 resources: mem=40

Finished job 13. 2 of 46 steps (4%) done

rule deduplicate: input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R2.fastq.gz output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R2.fastq.gz log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_deduplicate.log jobid: 33 benchmark: logs/benchmarks/deduplicate/PHH12-O-8024.3.89990.GGTAGC.txt wildcards: sample=PHH12-O-8024.3.89990.GGTAGC threads: 24 resources: mem=40

        clumpify.sh                 in=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R1.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R2.fastq.gz                 out1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R1.fastq.gz out2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R2.fastq.gz                 overwrite=true                dedupe=t                 dupesubs=2                 optical=f                threads=24                 -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_deduplicate.log

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R1.fastq.gz. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_raw_R2.fastq.gz. Finished job 33. 3 of 46 steps (7%) done

rule read_stats: input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R2.fastq.gz output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/deduplicated.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/deduplicated_read_counts.tsv log: PHH12-O-8024.3.89990.GGTAGC/logs/read_stats.log jobid: 17 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, step=deduplicated priority: 30 threads: 24 resources: mem=40

Finished job 17. 4 of 46 steps (9%) done

rule quality_filter: input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R2.fastq.gz output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_quality_filtering_stats.txt log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_quality_filter.log jobid: 19 benchmark: logs/benchmarks/quality_filter/PHH12-O-8024.3.89990.GGTAGC.txt wildcards: sample=PHH12-O-8024.3.89990.GGTAGC threads: 24 resources: mem=40

    bbduk2.sh in1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R1.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R2.fastq.gz             out1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R1.fastq.gz out2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R2.fastq.gz outs=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_se.fastq.gz             rref=/home/william/dbs/atlas_db.v2/adapters.fa lref=/home/william/dbs/atlas_db.v2/adapters.fa             mink=8 qout=33 stats=PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_quality_filtering_stats.txt             hdist=1 k=27 trimq=10             qtrim=rl threads=24             minlength=51 trd=t             minbasefrequency=0.05             interleaved=t            overwrite=true             ecco=t             -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_quality_filter.log

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R1.fastq.gz. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_deduplicated_R2.fastq.gz. Finished job 19. 5 of 46 steps (11%) done

rule read_stats: input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R2.fastq.gz output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/filtered.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/filtered_read_counts.tsv log: PHH12-O-8024.3.89990.GGTAGC/logs/read_stats.log jobid: 15 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, step=filtered priority: 30 threads: 24 resources: mem=40

Finished job 15. 6 of 46 steps (13%) done

rule build_decontamination_db: output: ref/genome/1/summary.txt log: logs/build_decontamination_db.log jobid: 31 threads: 24 resources: mem=40

bbsplit.sh -Xmx40G ref_PhiX=/home/william/dbs/atlas_db.v2/phiX174_virus.fa ref_rRNA=/home/william/dbs/atlas_db.v2/silva_rfam_all_rRNAs.fa threads=24 k=13 local=t 2> logs/build_decontamination_db.log

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89. Finished job 31. 7 of 46 steps (15%) done

rule decontamination: input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_se.fastq.gz, ref/genome/1/summary.txt output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/PhiX_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/PhiX_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/PhiX_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/rRNA_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/rRNA_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/rRNA_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_decontamination_reference_stats.txt log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_decontamination.log jobid: 12 benchmark: logs/benchmarks/decontamination/PHH12-O-8024.3.89990.GGTAGC.txt wildcards: sample=PHH12-O-8024.3.89990.GGTAGC threads: 24 resources: mem=40

        if [ "true" = true ] ; then
            bbsplit.sh in1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R1.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R2.fastq.gz                     outu1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R1.fastq.gz outu2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R2.fastq.gz                     basename="PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/%_R#.fastq.gz"                     maxindel=20 minratio=0.65                     minhits=1 ambiguous=best refstats=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_decontamination_reference_stats.txt                    threads=24 k=13 local=t                     -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_decontamination.log
        fi

        bbsplit.sh in=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_se.fastq.gz                  outu=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_se.fastq.gz                 basename="PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/contaminants/%_se.fastq.gz"                 maxindel=20 minratio=0.65                 minhits=1 ambiguous=best refstats=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_decontamination_reference_stats.txt append                 interleaved=f threads=24 k=13 local=t                 -Xmx40G 2>> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_decontamination.log

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R2.fastq.gz. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_se.fastq.gz. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_filtered_R1.fastq.gz. Finished job 12. 8 of 46 steps (17%) done

rule read_stats: input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R2.fastq.gz output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/clean.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/clean_read_counts.tsv log: PHH12-O-8024.3.89990.GGTAGC/logs/read_stats.log jobid: 14 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, step=clean priority: 30 threads: 24 resources: mem=40

Finished job 14. 9 of 46 steps (20%) done

localrule postprocess_after_decontamination: input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_se.fastq.gz output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz jobid: 11 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC

localrule initialize_checkm: output: logs/checkm_init.txt log: logs/initialize_checkm.log jobid: 29

python /home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/rules/initialize_checkm.py /home/william/dbs/atlas_db.v2/checkm logs/checkm_init.txt logs/initialize_checkm.log

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/9596cb25. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R1.fastq.gz. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_se.fastq.gz. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_clean_R2.fastq.gz. Finished job 11. 10 of 46 steps (22%) done. Finished job 29. 11 of 46 steps (24%) done

rule read_stats: input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_counts.tsv log: PHH12-O-8024.3.89990.GGTAGC/logs/read_stats.log jobid: 16 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, step=QC priority: 30 threads: 24 resources: mem=40

Finished job 16. 12 of 46 steps (26%) done

rule normalize_coverage_across_kmers: input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz output: PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_se.fastq.gz log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_normalization.log jobid: 45 benchmark: logs/benchmarks/normalization/PHH12-O-8024.3.89990.GGTAGC.txt wildcards: sample=PHH12-O-8024.3.89990.GGTAGC threads: 24 resources: mem=40

    if [ in=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz != "null" ];
    then
        bbnorm.sh in=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz                 extra=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz,PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz                 out=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_se.fastq.gz                 k=21 t=100                 interleaved=f minkmers=15 prefilter=t                 threads=24                 -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_normalization.log
    fi

    if [ t = "t" ];
    then
        bbnorm.sh in=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz                 extra=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz                 out=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R1.fastq.gz out2=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R2.fastq.gz                 k=21 t=100                 interleaved=f minkmers=15 prefilter=t                 threads=24                 -Xmx40G 2>> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_normalization.log
    fi

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89. Finished job 45. 13 of 46 steps (28%) done

rule merge_pairs: input: PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_se.fastq.gz output: PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_se.fastq.gz log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_merge_pairs.log jobid: 44 benchmark: logs/benchmarks/merge_pairs/PHH12-O-8024.3.89990.GGTAGC.txt wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, previous_steps=normalized threads: 24 resources: mem=40

Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R2.fastq.gz. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_se.fastq.gz. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized_R1.fastq.gz. Finished job 44. 14 of 46 steps (30%) done

rule error_correction: input: PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_se.fastq.gz output: PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_se.fastq.gz log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_error_correction.log jobid: 43 benchmark: logs/benchmarks/error_correction/PHH12-O-8024.3.89990.GGTAGC.txt wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, previous_steps=normalized.merged threads: 24 resources: mem=40

    tadpole.sh -Xmx40G             prealloc=1             in1=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R1.fastq.gz,PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_se.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R2.fastq.gz             out1=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R1.fastq.gz,PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_se.fastq.gz out2=PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R2.fastq.gz             mode=correct             threads=24             ecc=t ecco=t 2>> PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_error_correction.log

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R2.fastq.gz. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_R1.fastq.gz. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged_se.fastq.gz. Finished job 43. 15 of 46 steps (33%) done

rule run_megahit: input: PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_se.fastq.gz output: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter.contigs.fa log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_megahit.log jobid: 40 benchmark: logs/benchmarks/assembly/PHH12-O-8024.3.89990.GGTAGC.txt wildcards: sample=PHH12-O-8024.3.89990.GGTAGC threads: 8 resources: mem=50

Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R2.fastq.gz. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_se.fastq.gz. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/reads/normalized.merged.errorcorr_R1.fastq.gz. Finished job 40. 16 of 46 steps (35%) done

localrule rename_megahit_output: input: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter.contigs.fa output: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_raw_contigs.fasta jobid: 34 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC

cp PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter.contigs.fa PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_raw_contigs.fasta

Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter.contigs.fa. Finished job 34. 17 of 46 steps (37%) done

rule rename_contigs: input: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_raw_contigs.fasta output: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta jobid: 23 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC

rename.sh in=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_raw_contigs.fasta out=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta ow=t prefix=PHH12-O-8024.3.89990.GGTAGC

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_raw_contigs.fasta. Finished job 23. 18 of 46 steps (39%) done

rule calculate_prefiltered_contig_coverage_stats: input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta output: PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/prefilter_coverage_stats.txt, PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/alignement_to_prefilter_contigs.sam log: PHH12-O-8024.3.89990.GGTAGC/assembly/logs/prefiltered_contig_coverage_stats.log jobid: 35 benchmark: logs/benchmarks/calculate_prefiltered_contig_coverage_stats/PHH12-O-8024.3.89990.GGTAGC.txt wildcards: sample=PHH12-O-8024.3.89990.GGTAGC threads: 24 resources: mem=40

bbwrap.sh nodisk=t ref=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta in1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz,PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz,null fast=t interleaved=auto threads=24 -Xmx40G append out=PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/alignement_to_prefilter_contigs.sam 2> PHH12-O-8024.3.89990.GGTAGC/assembly/logs/prefiltered_contig_coverage_stats.log

        pileup.sh ref=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta in=PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/alignement_to_prefilter_contigs.sam threads=24             -Xmx40G covstats=PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/prefilter_coverage_stats.txt physcov 2>> PHH12-O-8024.3.89990.GGTAGC/assembly/logs/prefiltered_contig_coverage_stats.log

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/alignement_to_prefilter_contigs.sam. Finished job 35. 19 of 46 steps (41%) done

rule filter_by_coverage: input: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta, PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/prefilter_coverage_stats.txt output: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta, PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_discarded_contigs.fasta log: PHH12-O-8024.3.89990.GGTAGC/assembly/logs/filter_by_coverage.log jobid: 24 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC resources: mem=40

filterbycoverage.sh in=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta cov=PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/prefilter_coverage_stats.txt out=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta outd=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_discarded_contigs.fasta minc=5 minp=40 minr=0 minl=2200 trim=100 -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/assembly/logs/filter_by_coverage.log

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.

rule calculate_contigs_stats: input: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta output: PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/prefilter_contig_stats.txt jobid: 4 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, assembly_step=prefilter resources: mem=40

stats.sh in=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_prefilter_contigs.fasta format=3 -Xmx40G > PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/prefilter_contig_stats.txt

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89. Finished job 4. 20 of 46 steps (43%) done. Finished job 24. 21 of 46 steps (46%) done

rule calculate_contigs_stats: input: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta output: PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/final_contig_stats.txt jobid: 5 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC, assembly_step=final resources: mem=40

stats.sh in=PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta format=3 -Xmx40G > PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/final_contig_stats.txt

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.

localrule finalize_contigs: input: PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta output: PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta jobid: 36 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC

cp PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta

Finished job 36. 22 of 46 steps (48%) done. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/PHH12-O-8024.3.89990.GGTAGC_final_contigs.fasta. Finished job 5. 23 of 46 steps (50%) done

rule align_reads_to_final_contigs: input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta output: PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam, PHH12-O-8024.3.89990.GGTAGC/assembly/unmapped_post_filter/PHH12-O-8024.3.89990.GGTAGC_unmapped_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/unmapped_post_filter/PHH12-O-8024.3.89990.GGTAGC_unmapped_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/unmapped_post_filter/PHH12-O-8024.3.89990.GGTAGC_unmapped_se.fastq.gz log: PHH12-O-8024.3.89990.GGTAGC/assembly/logs/contig_coverage_stats.log jobid: 38 benchmark: logs/benchmarks/align_reads_to_filtered_contigs/PHH12-O-8024.3.89990.GGTAGC.txt wildcards: sample=PHH12-O-8024.3.89990.GGTAGC threads: 24 resources: mem=40

bbwrap.sh nodisk=t ref=PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta in1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz,PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz,null trimreaddescriptions=t outm=PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam outu1=PHH12-O-8024.3.89990.GGTAGC/assembly/unmapped_post_filter/PHH12-O-8024.3.89990.GGTAGC_unmapped_R1.fastq.gz,PHH12-O-8024.3.89990.GGTAGC/assembly/unmapped_post_filter/PHH12-O-8024.3.89990.GGTAGC_unmapped_se.fastq.gz outu2=PHH12-O-8024.3.89990.GGTAGC/assembly/unmapped_post_filter/PHH12-O-8024.3.89990.GGTAGC_unmapped_R2.fastq.gz,null threads=24 pairlen=1000 pairedonly=t mdtag=t xstag=fs nmtag=t sam=1.3 local=t ambiguous=best secondary=t ssao=t maxsites=10 -Xmx40G 2> PHH12-O-8024.3.89990.GGTAGC/assembly/logs/contig_coverage_stats.log

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89. Finished job 38. 24 of 46 steps (52%) done

rule pileup: input: PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta, PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam output: PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_base_coverage.txt.gz, PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_histogram.txt, PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_stats.txt, PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_binned.txt log: PHH12-O-8024.3.89990.GGTAGC/assembly/logs/contig_coverage_stats.log jobid: 42 benchmark: logs/benchmarks/align_reads_to_filtered_contigs/PHH12-O-8024.3.89990.GGTAGC_pileup.txt wildcards: sample=PHH12-O-8024.3.89990.GGTAGC threads: 24 resources: mem=40

pileup.sh ref=PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta in=PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam threads=24 -Xmx40G covstats=PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_stats.txt hist=PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_histogram.txt basecov=PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_base_coverage.txt.gz concise=t physcov=t secondary=f bincov=PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_binned.txt 2>> PHH12-O-8024.3.89990.GGTAGC/assembly/logs/contig_coverage_stats.log Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_base_coverage.txt.gz. Finished job 42. 25 of 46 steps (54%) done

rule convert_sam_to_bam: input: PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam output: PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.bam jobid: 27 wildcards: file=PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC threads: 24

samtools view -@ 24 -bSh1 PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam | samtools sort -m 1536M -@ 24 -T /tmp/PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC_tmp -o PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.bam -O bam - Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.sam. Finished job 27. 26 of 46 steps (57%) done

rule run_prokka_annotation: input: PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta output: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.err, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.faa, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.ffn, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.fna, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.fsa, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gbk, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gff, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.log, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.sqn, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.tbl, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.tsv, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.txt jobid: 25 benchmark: logs/benchmarks/prokka/PHH12-O-8024.3.89990.GGTAGC.txt wildcards: sample=PHH12-O-8024.3.89990.GGTAGC threads: 24

prokka --outdir PHH12-O-8024.3.89990.GGTAGC/annotation/prokka --force --prefix PHH12-O-8024.3.89990.GGTAGC --locustag PHH12-O-8024.3.89990.GGTAGC --kingdom Bacteria --metagenome --cpus 24 PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89. Finished job 25. 27 of 46 steps (59%) done

rule calculate_insert_size: input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_insert_size_hist.txt, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_length_hist.txt log: PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_calculate_insert_size.log jobid: 18 benchmark: logs/benchmarks/merge_pairs/PHH12-O-8024.3.89990.GGTAGC_insert_size.txt wildcards: sample=PHH12-O-8024.3.89990.GGTAGC threads: 24 resources: mem=40

        bbmerge.sh -Xmx40G threads=24                 in1=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz                 loose ecct k=62                 extend2=50                 ihist=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_insert_size_hist.txt merge=f                 mininsert0=35 minoverlap0=8                 prealloc=t prefilter=t                 minprob=0.8 2> >(tee PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_calculate_insert_size.log)

        readlength.sh in=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz in2=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz out=PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_length_hist.txt 2> >(tee PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_calculate_insert_size.log)

Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89. Finished job 18. 28 of 46 steps (61%) done

rule convert_gff_to_gtf: input: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gff output: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gtf jobid: 28 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC

localrule combine_insert_stats: input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_insert_size_hist.txt output: stats/insert_stats.tsv jobid: 21

localrule combine_read_length_stats: input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_length_hist.txt output: stats/read_length_stats.tsv jobid: 22

localrule finalize_QC: input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R1.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_R2.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_QC_se.fastq.gz, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/PHH12-O-8024.3.89990.GGTAGC_decontamination_reference_stats.txt, PHH12-O-8024.3.89990.GGTAGC/logs/PHH12-O-8024.3.89990.GGTAGC_quality_filtering_stats.txt, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/raw.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/deduplicated.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/filtered.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/clean.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC.zip, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/raw_read_counts.tsv, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/deduplicated_read_counts.tsv, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/filtered_read_counts.tsv, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/clean_read_counts.tsv, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_counts.tsv, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_length_hist.txt output: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/finished_QC, PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/read_counts.tsv jobid: 2 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC

rule update_prokka_tsv: input: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gff output: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC_plus.tsv jobid: 6 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC

rule make_maxbin_abundance_file: input: PHH12-O-8024.3.89990.GGTAGC/assembly/contig_stats/postfilter_coverage_stats.txt output: PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC_contig_coverage.tsv jobid: 39 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC atlas gff2tsv PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gff PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC_plus.tsv

Finished job 28. 29 of 46 steps (63%) done Finished job 39. 30 of 46 steps (65%) done Finished job 21. 31 of 46 steps (67%) done Finished job 22. 32 of 46 steps (70%) done Touching output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/finished_QC. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/raw_read_counts.tsv. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/clean_read_counts.tsv. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/filtered_read_counts.tsv. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/QC_read_counts.tsv. Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/deduplicated_read_counts.tsv. Finished job 2. 33 of 46 steps (72%) done

localrule combine_read_counts: input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/read_stats/read_counts.tsv output: stats/read_counts.tsv jobid: 20

Finished job 6. 34 of 46 steps (74%) done Finished job 20. 35 of 46 steps (76%) done

rule run_diamond_blastp: input: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.faa, /home/william/dbs/atlas_db.v2/refseq.dmnd output: PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits.tsv jobid: 41 benchmark: logs/benchmarks/run_diamond_blastp/PHH12-O-8024.3.89990.GGTAGC.txt wildcards: sample=PHH12-O-8024.3.89990.GGTAGC threads: 24

diamond blastp --threads 24 --outfmt 6 --out PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits.tsv --query PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.faa --db /home/william/dbs/atlas_db.v2/refseq.dmnd --top 2 --evalue 1e-06 --id 50 --query-cover 50 --gapopen 11 --gapextend 1 --tmpdir /tmp --block-size 2 --index-chunks 4 Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89. Finished job 41. 36 of 46 steps (78%) done

rule add_contig_metadata: input: PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits.tsv, PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gff output: PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus.tsv jobid: 37 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC

localrule QC_report: input: PHH12-O-8024.3.89990.GGTAGC/sequence_quality_control/finished_QC, stats/read_counts.tsv, stats/insert_stats.tsv, stats/read_length_stats.tsv output: finished_QC jobid: 3 atlas munge-blast PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits.tsv PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gff PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus.tsv

    if [ -d ref ]; then
        rm -r ref
    fi

Touching output file finished_QC. Finished job 3. 37 of 46 steps (80%) done Finished job 37. 38 of 46 steps (83%) done

rule sort_munged_blast_hits: input: PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus.tsv output: PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus_sorted.tsv jobid: 26 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC

sort -k1,1 -k2,2 -k13,13rn PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus.tsv > PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus_sorted.tsv Removing temporary output file PHH12-O-8024.3.89990.GGTAGC/annotation/refseq/PHH12-O-8024.3.89990.GGTAGC_hits_plus.tsv. Finished job 26. 39 of 46 steps (85%) done

rule run_maxbin: input: PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta, PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC_contig_coverage.tsv output: PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC.summary, PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC.marker log: PHH12-O-8024.3.89990.GGTAGC/logs/maxbin2.log jobid: 30 benchmark: logs/benchmarks/maxbin2/PHH12-O-8024.3.89990.GGTAGC.txt wildcards: sample=PHH12-O-8024.3.89990.GGTAGC threads: 24

run_MaxBin.pl -contig PHH12-O-8024.3.89990.GGTAGC/PHH12-O-8024.3.89990.GGTAGC_contigs.fasta -abund PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC_contig_coverage.tsv -out PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC -min_contig_length 200 -thread 24 -prob_threshold 0.9 -max_iteration 50 > PHH12-O-8024.3.89990.GGTAGC/logs/maxbin2.log Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/9596cb25. Finished job 30. 40 of 46 steps (87%) done

rule run_checkm_lineage_wf: input: logs/checkm_init.txt, PHH12-O-8024.3.89990.GGTAGC/genomic_bins/PHH12-O-8024.3.89990.GGTAGC.marker output: PHH12-O-8024.3.89990.GGTAGC/genomic_bins/checkm/completeness.tsv jobid: 10 wildcards: sample=PHH12-O-8024.3.89990.GGTAGC threads: 24

rm -r PHH12-O-8024.3.89990.GGTAGC/genomic_bins/checkm && checkm lineage_wf --file PHH12-O-8024.3.89990.GGTAGC/genomic_bins/checkm/completeness.tsv --tab_table --quiet --extension fasta --threads 24 PHH12-O-8024.3.89990.GGTAGC/genomic_bins PHH12-O-8024.3.89990.GGTAGC/genomic_bins/checkm Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/9596cb25. Finished job 10. 41 of 46 steps (89%) done

rule find_counts_per_region:
    input: PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gtf, PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.bam
    output: PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt.summary, PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt
    log: PHH12-O-8024.3.89990.GGTAGC/logs/counts_per_region.log
    jobid: 9
    wildcards: sample=PHH12-O-8024.3.89990.GGTAGC
    threads: 24

featureCounts -p −−minOverlap 1 -B -F gtf -T 24 --primary -O --fraction -t CDS -g ID -a PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gtf -o PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.bam 2> PHH12-O-8024.3.89990.GGTAGC/logs/counts_per_region.log
Activating conda environment /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89.
Error in rule find_counts_per_region:
    jobid: 9
    output: PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt.summary, PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt
    log: PHH12-O-8024.3.89990.GGTAGC/logs/counts_per_region.log

RuleException:
CalledProcessError in line 683 of /home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/rules/assemble.snakefile:
Command 'source activate /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/conda/212a2e89; set -euo pipefail; featureCounts -p −−minOverlap 1 -B -F gtf -T 24 --primary -O --fraction -t CDS -g ID -a PHH12-O-8024.3.89990.GGTAGC/annotation/prokka/PHH12-O-8024.3.89990.GGTAGC.gtf -o PHH12-O-8024.3.89990.GGTAGC/annotation/feature_counts/PHH12-O-8024.3.89990.GGTAGC_counts.txt PHH12-O-8024.3.89990.GGTAGC/sequence_alignment/PHH12-O-8024.3.89990.GGTAGC.bam 2> PHH12-O-8024.3.89990.GGTAGC/logs/counts_per_region.log ' returned non-zero exit status 255.
File "/home/william/miniconda3/envs/atlas.72/lib/python3.6/site-packages/pnnl_atlas-1.0.19-py3.6.egg/atlas/rules/assemble.snakefile", line 683, in __rule_find_counts_per_region
File "/home/william/miniconda3/envs/atlas.72/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /media/ExtHD3/HD3/CSP-mg-organic-oct2011-analysis/assembly_atlas/.snakemake/log/2018-01-10T094504.256226.snakemake.log
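For anyone hitting the same error: the failing command above contains two non-ASCII dash characters in front of `minOverlap`, which featureCounts does not recognize as an option prefix. Below is a minimal Python 3 sketch for spotting and replacing such look-alike characters in a command string or a snakefile line. The helper names (`find_dash_lookalikes`, `normalize_dashes`) and the set of characters checked are my own assumptions for illustration, and the example assumes the offending character is U+2212 (MINUS SIGN). This is not the change made in the ATLAS repository.

```python
import unicodedata

# Hypothetical helper set: characters that render like the ASCII hyphen-minus
# (U+002D) but are different code points. This list is an assumption and is
# not exhaustive.
DASH_LOOKALIKES = {
    "\u2212",  # MINUS SIGN
    "\u2010",  # HYPHEN
    "\u2011",  # NON-BREAKING HYPHEN
    "\u2012",  # FIGURE DASH
    "\u2013",  # EN DASH
}


def find_dash_lookalikes(text):
    """Return (index, Unicode name) for each look-alike dash found in text."""
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in DASH_LOOKALIKES
    ]


def normalize_dashes(text):
    """Replace every look-alike dash in text with the ASCII hyphen-minus."""
    return "".join("-" if ch in DASH_LOOKALIKES else ch for ch in text)


# Shortened version of the failing command, with U+2212 written as an escape
# so the difference is visible in the source.
command = "featureCounts -p \u2212\u2212minOverlap 1 -B -F gtf"

for index, name in find_dash_lookalikes(command):
    print("position", index, "is", name)

print(normalize_dashes(command))
```

Running `find_dash_lookalikes` over each line of a snakefile before executing it would flag this class of problem early; retyping the dashes by hand, as described above, has the same effect.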

SilasK commented 6 years ago

This seems to be solved in the last commit to the master branch (3759d86). Thank you @wrodriguezz for testing.

brwnj commented 6 years ago

Thanks for the complete bug report! You can update to the latest version via PyPI (pip install -U pnnl-atlas).

ghost commented 6 years ago

I totally missed the latest commits. Thanks @SilasK and @brwnj