shandley / hecatomb

hecatomb is a virome analysis pipeline for analysis of Illumina sequence data
MIT License

MANY missing output files; Input files updated by another job #65

Closed mihinduk closed 2 years ago

mihinduk commented 2 years ago

Hi Mike, after Hecatomb crashed, I ran this:

    hecatomb run --reads RC2_freeze_2_samples_C.tsv --profile slurm --configfile hecatomb.config.yaml --snake=-n --snake=--reason


[Thu Feb 17 08:52:52 2022]
rule secondary_nt_lca_table:
    input: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/all.m8
    output: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/all.lin
    log: hecatomb_out/STDERR/secondary_nt_lca_table.log
    jobid: 2034
    benchmark: hecatomb_out/BENCHMARKS/secondary_nt_lca_table.txt
    reason: Missing output files: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/all.lin
    resources: mem_mb=16000, disk_mb=893298, tmpdir=/tmp, time=1440, jobs=100

[Thu Feb 17 08:52:52 2022]
rule secondary_nt_calc_lca:
    input: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/all.lin, /opt/apps/labs/sahlab/software/miniconda3/envs/hecatomb/snakemake/workflow/../../databases/tax/taxonomy
    output: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/lca.lineage, hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/secondary_nt_lca.tsv
    log: hecatomb_out/STDERR/secondary_nt_calc_lca.log
    jobid: 2033
    benchmark: hecatomb_out/BENCHMARKS/secondary_nt_calc_lca.txt
    reason: Missing output files: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/secondary_nt_lca.tsv; Input files updated by another job: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/all.lin
    threads: 24
    resources: mem_mb=64000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

    {
    # calculate lca and lineage
    taxonkit lca -i 2 -s ';' --data-dir /opt/apps/labs/sahlab/software/miniconda3/envs/hecatomb/snakemake/workflow/../../databases/tax/taxonomy hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/all.lin |
        taxonkit lineage -i 3 --data-dir /opt/apps/labs/sahlab/software/miniconda3/envs/hecatomb/snakemake/workflow/../../databases/tax/taxonomy |
        cut --complement -f 2 > hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/lca.lineage 2> hecatomb_out/STDERR/secondary_nt_calc_lca.log

    # Reformat lineages
    awk -F '\t' '$2 != 0' hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/lca.lineage |
        taxonkit reformat --data-dir /opt/apps/labs/sahlab/software/miniconda3/envs/hecatomb/snakemake/workflow/../../databases/tax/taxonomy -i 3 -f "{k}\t{p}\t{c}\t{o}\t{f}\t{g}\t{s}" -F --fill-miss-rank 2>> hecatomb_out/STDERR/secondary_nt_calc_lca.log |
        cut --complement -f3 > hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/secondary_nt_lca.tsv
    } &> hecatomb_out/STDERR/secondary_nt_calc_lca.log
    rm hecatomb_out/STDERR/secondary_nt_calc_lca.log

[Thu Feb 17 08:52:52 2022]
rule SECONDARY_NT_generate_output_table:
    input: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/tophit.m8, hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/SECONDARY_nt.tsv, hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/secondary_nt_lca.tsv, hecatomb_out/RESULTS/sampleSeqCounts.tsv, /opt/apps/labs/sahlab/software/miniconda3/envs/hecatomb/snakemake/workflow/../../databases/tables/2020_07_27_Viral_classification_table_ICTV2019.txt
    output: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/NT_bigtable.tsv
    log: hecatomb_out/STDERR/SECONDARY_NT_generate_output_table.log
    jobid: 2026
    benchmark: hecatomb_out/BENCHMARKS/SECONDARY_NT_generate_output_table.txt
    reason: Missing output files: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/NT_bigtable.tsv; Input files updated by another job: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/secondary_nt_lca.tsv
    resources: mem_mb=2000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

[Thu Feb 17 08:52:52 2022]
rule combine_AA_NT:
    input: hecatomb_out/RESULTS/MMSEQS_AA_SECONDARY/AA_bigtable.tsv, hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/NT_bigtable.tsv
    output: hecatomb_out/RESULTS/bigtable.tsv
    log: hecatomb_out/STDERR/combine_AA_NT.log
    jobid: 2036
    benchmark: hecatomb_out/BENCHMARKS/combine_AA_NT.txt
    reason: Missing output files: hecatomb_out/RESULTS/bigtable.tsv; Input files updated by another job: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/NT_bigtable.tsv
    resources: mem_mb=2000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

    { cat hecatomb_out/RESULTS/MMSEQS_AA_SECONDARY/AA_bigtable.tsv > hecatomb_out/RESULTS/bigtable.tsv;
    tail -n+2 hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/NT_bigtable.tsv >> hecatomb_out/RESULTS/bigtable.tsv; } &> hecatomb_out/STDERR/combine_AA_NT.log
    rm hecatomb_out/STDERR/combine_AA_NT.log

[Thu Feb 17 08:52:52 2022]
rule tax_level_counts:
    input: hecatomb_out/RESULTS/bigtable.tsv
    output: hecatomb_report/taxonLevelCounts.tsv
    log: hecatomb_out/STDERR/tax_level_counts.log
    jobid: 2045
    reason: Missing output files: hecatomb_report/taxonLevelCounts.tsv; Input files updated by another job: hecatomb_out/RESULTS/bigtable.tsv
    threads: 2
    resources: mem_mb=16000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

[Thu Feb 17 08:52:52 2022]
rule contig_read_taxonomy:
    input: hecatomb_out/PROCESSING/MAPPING/assembly.seqtable.bam, hecatomb_out/PROCESSING/MAPPING/assembly.seqtable.bam.bai, hecatomb_out/RESULTS/bigtable.tsv
    output: hecatomb_out/RESULTS/contigSeqTable.tsv
    log: hecatomb_out/STDERR/contig_read_taxonomy.log
    jobid: 2041
    benchmark: hecatomb_out/BENCHMARKS/contig_read_taxonomy.txt
    reason: Missing output files: hecatomb_out/RESULTS/contigSeqTable.tsv; Input files updated by another job: hecatomb_out/RESULTS/bigtable.tsv
    threads: 2
    resources: mem_mb=16000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

[Thu Feb 17 08:52:52 2022]
rule krona_text_format:
    input: hecatomb_out/RESULTS/bigtable.tsv
    output: hecatomb_report/krona.txt
    log: hecatomb_out/STDERR/krona_text_format.log
    jobid: 2047
    benchmark: hecatomb_out/BENCHMARKS/krona_text_format.txt
    reason: Missing output files: hecatomb_report/krona.txt; Input files updated by another job: hecatomb_out/RESULTS/bigtable.tsv
    resources: mem_mb=2000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

[Thu Feb 17 08:52:52 2022]
rule contig_krona_text_format:
    input: hecatomb_out/RESULTS/contigSeqTable.tsv
    output: hecatomb_report/contigKrona.txt
    log: hecatomb_out/STDERR/contig_krona_text_format.log
    jobid: 2043
    reason: Missing output files: hecatomb_report/contigKrona.txt; Input files updated by another job: hecatomb_out/RESULTS/contigSeqTable.tsv
    resources: mem_mb=2000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

[Thu Feb 17 08:52:52 2022]
rule krona_plot:
    input: hecatomb_report/krona.txt
    output: hecatomb_report/krona.html
    log: hecatomb_out/STDERR/krona_plot.log
    jobid: 2046
    benchmark: hecatomb_out/BENCHMARKS/krona_plot.txt
    reason: Missing output files: hecatomb_report/krona.html; Input files updated by another job: hecatomb_report/krona.txt
    resources: mem_mb=2000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

    ktImportText hecatomb_report/krona.txt -o hecatomb_report/krona.html &> hecatomb_out/STDERR/krona_plot.log
    rm hecatomb_out/STDERR/krona_plot.log

[Thu Feb 17 08:52:52 2022]
rule contig_krona_plot:
    input: hecatomb_report/contigKrona.txt
    output: hecatomb_report/contigKrona.html
    log: hecatomb_out/STDERR/contig_krona_plot.log
    jobid: 2042
    reason: Missing output files: hecatomb_report/contigKrona.html; Input files updated by another job: hecatomb_report/contigKrona.txt
    resources: mem_mb=2000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

    ktImportText hecatomb_report/contigKrona.txt -o hecatomb_report/contigKrona.html &> hecatomb_out/STDERR/contig_krona_plot.log
    rm hecatomb_out/STDERR/contig_krona_plot.log

[Thu Feb 17 08:52:52 2022]
localrule all:
    input: hecatomb_out/RESULTS/seqtable.fasta, hecatomb_out/RESULTS/sampleSeqCounts.tsv, hecatomb_out/RESULTS/seqtable.properties.tsv, hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/assembly.fasta, hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/MAPPING/contig_count_table.tsv, hecatomb_out/RESULTS/assembly.properties.tsv, hecatomb_out/RESULTS/MMSEQS_AA_SECONDARY/AA_bigtable.tsv, hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/NT_bigtable.tsv, hecatomb_out/RESULTS/bigtable.tsv, hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/SECONDARY_nt.tsv, hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/SECONDARY_nt_phylum_summary.tsv, hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/SECONDARY_nt_class_summary.tsv, hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/SECONDARY_nt_order_summary.tsv, hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/SECONDARY_nt_family_summary.tsv, hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/SECONDARY_nt_genus_summary.tsv, hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/SECONDARY_nt_species_summary.tsv, hecatomb_out/PROCESSING/MAPPING/assembly.seqtable.bam, hecatomb_out/PROCESSING/MAPPING/assembly.seqtable.bam.bai, hecatomb_out/RESULTS/contigSeqTable.tsv, hecatomb_report/contigKrona.html, hecatomb_report/Step00_counts.tsv, hecatomb_report/Step01_counts.tsv, hecatomb_report/Step02_counts.tsv, hecatomb_report/Step03_counts.tsv, hecatomb_report/Step04_counts.tsv, hecatomb_report/Step05_counts.tsv, hecatomb_report/Step06_counts.tsv, hecatomb_report/Step07_counts.tsv, hecatomb_report/Step08_counts.tsv, hecatomb_report/Step09_counts.tsv, hecatomb_report/Step10_counts.tsv, hecatomb_report/Step11_counts.tsv, hecatomb_report/Step12_counts.tsv, hecatomb_report/Step13_counts.tsv, hecatomb_report/Sankey.svg, hecatomb_report/hecatomb.samples.tsv, hecatomb_report/taxonLevelCounts.tsv, hecatomb_report/krona.html
    jobid: 0
    reason: Input files updated by another job: hecatomb_out/RESULTS/bigtable.tsv, hecatomb_report/contigKrona.html, hecatomb_report/krona.html, hecatomb_report/taxonLevelCounts.tsv, hecatomb_out/RESULTS/contigSeqTable.tsv, hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/NT_bigtable.tsv
    resources: mem_mb=2000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

Job stats:
job                                   count    min threads    max threads
----------------------------------  -------  -------------  -------------
SECONDARY_NT_generate_output_table        1              1              1
all                                       1              1              1
combine_AA_NT                             1              1              1
contig_krona_plot                         1              1              1
contig_krona_text_format                  1              1              1
contig_read_taxonomy                      1              2              2
krona_plot                                1              1              1
krona_text_format                         1              1              1
secondary_nt_calc_lca                     1             24             24
secondary_nt_lca_table                    1              1              1
tax_level_counts                          1              2              2
total                                    11              1             24

This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

What the hecatomb?

beardymcjohnface commented 2 years ago

I'm not sure what the problem is. When you run Snakemake with --reason, it prints an explanation of why it's running each rule. 'Missing output files' and 'Input files updated by another job' are the reasons Snakemake is planning to run those rules: the first means the rule's output doesn't exist yet, and the second means another rule in the plan will regenerate one of its inputs.
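To illustrate how one missing file can cascade into 11 planned jobs, here is a rough toy model of that planning logic in Python. It is only a sketch and not Snakemake's actual implementation. The rule and file names are taken from the dry-run log above with the directories dropped, the `plan` helper is hypothetical, and the `EXISTING` set is an assumption about which files survived the crash.

```python
# Toy model of Snakemake's job planning; NOT Snakemake's real code.
# Rule and file names come from the dry-run log, with directories dropped.
# Each rule maps to (inputs, outputs).
RULES = {
    "secondary_nt_lca_table": (["all.m8"], ["all.lin"]),
    "secondary_nt_calc_lca": (["all.lin"], ["lca.lineage", "secondary_nt_lca.tsv"]),
    "SECONDARY_NT_generate_output_table": (
        ["tophit.m8", "SECONDARY_nt.tsv", "secondary_nt_lca.tsv"],
        ["NT_bigtable.tsv"],
    ),
    "combine_AA_NT": (["AA_bigtable.tsv", "NT_bigtable.tsv"], ["bigtable.tsv"]),
    "tax_level_counts": (["bigtable.tsv"], ["taxonLevelCounts.tsv"]),
    "contig_read_taxonomy": (["assembly.seqtable.bam", "bigtable.tsv"], ["contigSeqTable.tsv"]),
    "krona_text_format": (["bigtable.tsv"], ["krona.txt"]),
    "contig_krona_text_format": (["contigSeqTable.tsv"], ["contigKrona.txt"]),
    "krona_plot": (["krona.txt"], ["krona.html"]),
    "contig_krona_plot": (["contigKrona.txt"], ["contigKrona.html"]),
    # The target rule only collects final files; it produces nothing itself.
    "all": (
        ["AA_bigtable.tsv", "NT_bigtable.tsv", "bigtable.tsv", "contigSeqTable.tsv",
         "contigKrona.html", "taxonLevelCounts.tsv", "krona.html"],
        [],
    ),
}

# ASSUMPTION: files that were already on disk when the dry run was made.
EXISTING = {"all.m8", "tophit.m8", "SECONDARY_nt.tsv", "lca.lineage",
            "AA_bigtable.tsv", "assembly.seqtable.bam"}


def plan(rules, existing):
    """Return {rule: [reasons]} for every rule that has to run."""
    reasons = {}
    updated = set()  # files that some planned job will (re)write
    changed = True
    while changed:  # repeat until no rule picks up a new reason
        changed = False
        for name, (inputs, outputs) in rules.items():
            missing = [f for f in outputs if f not in existing]
            stale = [f for f in inputs if f in updated]
            why = []
            if missing:
                why.append("Missing output files: " + ", ".join(missing))
            if stale:
                why.append("Input files updated by another job: " + ", ".join(stale))
            if why and reasons.get(name) != why:
                reasons[name] = why
                updated.update(outputs)
                changed = True
    return reasons


reasons = plan(RULES, EXISTING)
for name, why in reasons.items():
    print(f"{name}: {'; '.join(why)}")
print("total jobs:", len(reasons))
```

Under these assumptions, only `all.lin` is missing at the top of the chain, and every rule downstream of it gets scheduled because its input is about to be rewritten. That matches the `total 11` line in the job stats.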