pengouy opened 9 months ago
Hi, I'll be releasing 1.3.1 soon, which should have all the kinks worked out. Unfortunately, snakemake v8 broke some things, and Python 3.12 broke f-strings in snakemake. The cluster commands for snakemake 8+ have also changed, so I've pinned all my tools to snakemake<8 for now and will migrate them all together at a later date.
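A quick way to confirm that an environment respects the snakemake<8 pin is to look at the major version reported by snakemake --version. The sketch below is a hypothetical helper (the name snakemake_is_pre_v8 is made up here), assuming snakemake --version prints a bare version string such as 7.32.4.

```bash
# Hypothetical helper: succeeds (exit 0) if the version string in $1 has a
# major version below 8, i.e. it satisfies the "snakemake<8" pin.
snakemake_is_pre_v8() {
    local major="${1%%.*}"   # text before the first dot, e.g. "7" from "7.32.4"
    [[ "$major" =~ ^[0-9]+$ ]] && (( major < 8 ))
}

# Check the snakemake on PATH in the active environment, if there is one.
if command -v snakemake > /dev/null 2>&1; then
    installed="$(snakemake --version)"
    if snakemake_is_pre_v8 "$installed"; then
        echo "snakemake $installed satisfies snakemake<8"
    else
        echo "snakemake $installed does not satisfy snakemake<8" >&2
    fi
fi
```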
The unit tests for Hecatomb don't cover everything yet, so some bugs slipped through the cracks. The next version is waiting on the bioconda review of koverage 0.1.10 (https://github.com/bioconda/bioconda-recipes/pull/45597), and I'll push the release as soon as that is done.
If you need it today, pull and install hecatomb from source:
conda create -n hecatombDev python=3.11
conda activate hecatombDev
git clone https://github.com/shandley/hecatomb.git
cd hecatomb
git checkout dev
pip install -e .
Modify the koverage yaml to use koverage 0.1.9 and snakemake<8:
nano hecatomb/snakemake/workflow/envs/koverage.yaml
name: koverage
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- koverage=0.1.9
- snakemake<8
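If you would rather not edit the file interactively in nano, the same pinned file can be written in one step with a heredoc. A minimal sketch, assuming it is run from the repository root (after cd hecatomb); note that it overwrites the whole file with the contents shown above (indented here, which YAML treats the same way).

```bash
# Non-interactive alternative to nano: overwrite the koverage environment
# file with koverage pinned to 0.1.9 and snakemake pinned below v8.
env_file="hecatomb/snakemake/workflow/envs/koverage.yaml"
mkdir -p "$(dirname "$env_file")"   # no-op in a fresh clone, where it already exists
cat > "$env_file" <<'EOF'
name: koverage
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - koverage=0.1.9
  - snakemake<8
EOF
```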
Install the databases and environments:
hecatomb install
hecatomb test build_envs
It should then work:
hecatomb test
Thank you so much for the quick response; it helps a lot. I will try the hecatombDev environment. Many thanks for the effort.
All good, let me know how it goes.
Hi, the job with the newly released version 1.3.1 has not finished yet, but it has worked well so far. I'm running Hecatomb on a supercomputer with multiple nodes. I submitted the job to two nodes, but only one node has been used. Since the mmseqs alignment step takes a lot of time, I'm wondering whether mmseqs can run across two or more nodes to speed it up. It might be a useful option for large datasets in a future update.
Hecatomb's HPC support is via snakemake profiles. You can submit the main hecatomb job with 1 thread and pass your profile to the hecatomb command. The main job will submit individual jobs to the queue for you. You could also just submit to one node with lots of resources and run as a local job.
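As a rough illustration of the profile approach, here is a sketch that writes a minimal snakemake 7 profile for a SLURM cluster (snakemake 7 because Hecatomb is pinned to snakemake<8). Everything specific is an assumption: the scheduler (SLURM's sbatch), the profile name slurm, the job limit, and that every Hecatomb rule defines the mem_mb and time resources (the combine_aa_nt rule in the log further down this thread does). Snakemake 7 looks up a profile given by name to --profile in ~/.config/snakemake/<name>/, and the keys in its config.yaml are long command-line options.

```bash
# Hypothetical profile location: snakemake 7 searches ~/.config/snakemake/<name>/
# when given --profile <name>. Set PROFILE_DIR to write it somewhere else.
profile_dir="${PROFILE_DIR:-$HOME/.config/snakemake/slurm}"
mkdir -p "$profile_dir"

# Each key is a snakemake 7 option without the leading "--".
# Assumes SLURM, and that every rule sets the mem_mb and time resources.
cat > "$profile_dir/config.yaml" <<'EOF'
cluster: "sbatch --cpus-per-task={threads} --mem={resources.mem_mb} --time={resources.time}"
jobs: 50
latency-wait: 60
EOF
```

The main hecatomb command would then be submitted as a single-thread job with --profile slurm, so that snakemake submits each rule to the queue as its own job.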
I just found a new bug when using --profile so I'll push another version soon.
Thanks for the explanation. I noticed that Hecatomb schedules multiple jobs by itself. I checked the results file "contigAnnotations.tsv" and found an error when the "target" column is split into taxonomic classification names, like this:
The "\
" had not been correctly replaced.
I have another question. When I BLAST (on NCBI) the contig sequences from "_mergedassembly.fasta" whose contig IDs are classified as viruses in "contigAnnotations.tsv", the BLAST results almost never match the annotations in "contigAnnotations.tsv". Shouldn't these two files correspond? About 10 days ago I ran into the same problem when running the same data with version 1.2.0. I thought it was a one-off, but it happened again, and it really confuses me.
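To compare the two files in bulk rather than contig by contig, one approach is to collect the IDs of the virus-annotated contigs from contigAnnotations.tsv into a text file (one ID per line; which column holds the contig ID is not stated in this thread, so check the file's header) and extract just those records from the assembly FASTA for a batch BLAST. A minimal awk-based sketch; the function name extract_contigs and the file names in the usage line are hypothetical, and it assumes the first word of each FASTA header is exactly the contig ID used in the TSV.

```bash
# Hypothetical helper: extract_contigs IDS_FILE FASTA_FILE
# Prints every FASTA record whose ID (first word of the header, without ">")
# is listed in IDS_FILE. Multi-line sequences are kept whole.
# IDS_FILE must be non-empty, because NR == FNR is used to detect the first file.
extract_contigs() {
    awk 'NR == FNR { want[$1] = 1; next }          # first file: remember the IDs
         /^>/      { id = substr($1, 2)            # header: strip the leading >
                     keep = (id in want) }         # keep this record only if listed
         keep                                      # print lines of kept records
        ' "$1" "$2"
}

# Usage (hypothetical file names):
# extract_contigs viral_contig_ids.txt assembly.fasta > viral_contigs.fasta
```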
Thanks, I'll look into it.
Hi, sorry to bother you again. The job just finished without any error report, but it produced a "bigtable.tsv" of only 1 KB. I checked the log directory and found that "_secondary_nt_calclca.log" is more than 3 GB, so it looks like the results were not merged successfully. Could you please check whether there is a bug?
Here is the relevant part of the log:
[Tue Feb 6 18:56:40 2024]
rule combine_aa_nt:
input: hecatomb.out/processing/mmseqs_aa_secondary/AA_bigtable.tsv, hecatomb.out/processing/mmseqs_nt_secondary/NT_bigtable.tsv
output: hecatomb.out/results/bigtable.tsv
log: hecatomb.out/logs/combine_AA_NT.log
jobid: 77
benchmark: hecatomb.out/benchmarks/combine_AA_NT.txt
reason: Missing output files: hecatomb.out/results/bigtable.tsv; Input files updated by another job: hecatomb.out/processing/mmseqs_nt_secondary/NT_bigtable.tsv, hecatomb.out/processing/mmseqs_aa_secondary/AA_bigtable.tsv
resources: tmpdir=/tmp, time=01:00:00, mem_mb=16000, mem_mib=15259, mem=16000MB
{ cat hecatomb.out/processing/mmseqs_aa_secondary/AA_bigtable.tsv > hecatomb.out/results/bigtable.tsv; tail -n+2 hecatomb.out/processing/mmseqs_nt_secondary/NT_bigtable.tsv >> hecatomb.out/results/bigtable.tsv; } &> hecatomb.out/logs/combine_AA_NT.log;
[Tue Feb 6 18:56:40 2024]
Finished job 77.
81 of 89 steps (91%) done
Select jobs to execute...
Also, when I load "contigSeqTable.tsv", all taxonomic classifications of the contigs are NA.
If your bigtable is empty, then the contigSeqTable will be all NA, because it joins the sequence annotations with the contigs. I think I've fixed it; it was caused by formatting issues with the taxonkit command. Can you confirm that both hecatomb.out/processing/mmseqs_aa_secondary/AA_bigtable.tsv and hecatomb.out/results/bigtable.tsv are tiny files?
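One quick way to answer this is to count the data rows (lines after the header) in each table. As the combine_aa_nt command in the log above shows, results/bigtable.tsv is the AA table followed by the NT table's rows without its header, so header-only AA and NT tables would leave bigtable.tsv with nothing but a header. A minimal sketch; the helper name data_rows is hypothetical, and it assumes each table has exactly one header line and newline-terminated lines.

```bash
# Hypothetical helper: print the number of lines after the header line
# (0 for a missing, empty, or header-only file).
data_rows() {
    local n=0
    if [[ -f "$1" ]]; then
        n="$(wc -l < "$1")"
    fi
    if (( n > 0 )); then
        echo $(( n - 1 ))
    else
        echo 0
    fi
}

# Report the row count of each table involved in the combine_aa_nt step.
for table in \
    hecatomb.out/processing/mmseqs_aa_secondary/AA_bigtable.tsv \
    hecatomb.out/processing/mmseqs_nt_secondary/NT_bigtable.tsv \
    hecatomb.out/results/bigtable.tsv
do
    printf '%s\t%s\n' "$table" "$(data_rows "$table")"
done
```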
Oh no! I deleted the whole hecatomb.out directory yesterday, but I am sure hecatomb.out/results/bigtable.tsv was a tiny file.
Oh, that's fine, I'm pretty sure I've worked out the issues. I'm just waiting on new releases of koverage and trimnami before I can push the next version of hecatomb.
Appreciate your efforts, looking forward to it.
Hi, I am a little confused about the results in merged_assembly.fasta: the NCBI BLAST results for contigs in this file do not always match the taxonomic classifications in contigAnnotations.tsv. I have also aligned the contigs against the sequences fetched by the NCBI accession numbers in the 'target' column of contigAnnotations.tsv, and they do not match either.
Hi, I updated Hecatomb to the newest version, 1.3.0, the day you released it. Unfortunately, there seem to be some bugs when I run the command hecatomb test, and I have noticed that you are working on it day and night. I really need this extraordinary tool now, but I can't install version 1.2.0. Could I ask when the bug-fixed version 1.3.1 will be released? Looking forward to your response, and thanks for your time. The following is the log:
Activating conda environment: anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/conda/82f1c97d51f13e73842c70c6a19c5768
/usr/bin/bash: -c: line 0: syntax error near unexpected token `;'
/usr/bin/bash: -c: line 0: `source /public3/home/sc30177/anaconda3/bin/activate '/public3/home/sc30177/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/conda/82f1c97d51f13e73842c70c6a19c5768'; set -euo pipefail; if [[ -d hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG ]]; then; rm -rf hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG; fi; megahit -1 hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_R1.host_rm.fastq.gz -2 hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_R2.host_rm.fastq.gz -r hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_RS.host_rm.fastq.gz -o hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG --out-prefix A13-256-115-06_GTTTCG -t 16 --presets meta-large&> hecatomb.out/logs/megahit_sample_paired.A13-256-115-06_GTTTCG.log; sed 's/>/>A13-256-115-06_GTTTCG/' hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG/A13-256-115-06_GTTTCG.contigs.fa > hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG/A13-256-115-06_GTTTCG.rename.contigs.fa; tar cf - hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG | zstd -T16 -9 > hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG.tar.zst 2> hecatomb.out/logs/megahit_sample_paired.A13-256-115-06_GTTTCG.log;'
[Sun Feb 4 09:22:40 2024]
Error in rule megahit_sample_paired:
jobid: 13
input: hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_R1.host_rm.fastq.gz, hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_R2.host_rm.fastq.gz, hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_RS.host_rm.fastq.gz
output: hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG/A13-256-115-06_GTTTCG.contigs.fa, hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG/A13-256-115-06_GTTTCG.rename.contigs.fa, hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG.tar.zst
log: hecatomb.out/logs/megahit_sample_paired.A13-256-115-06_GTTTCG.log (check log file(s) for error details)
conda-env: /public3/home/sc30177/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/conda/82f1c97d51f13e73842c70c6a19c5768
shell: if [[ -d hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG ]]; then; rm -rf hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG; fi; megahit -1 hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_R1.host_rm.fastq.gz -2 hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_R2.host_rm.fastq.gz -r hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_RS.host_rm.fastq.gz -o hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG --out-prefix A13-256-115-06_GTTTCG -t 16 --presets meta-large&> hecatomb.out/logs/megahit_sample_paired.A13-256-115-06_GTTTCG.log; sed 's/>/>A13-256-115-06_GTTTCG/' hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG/A13-256-115-06_GTTTCG.contigs.fa > hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG/A13-256-115-06_GTTTCG.rename.contigs.fa; tar cf - hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG | zstd -T16 -9 > hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG.tar.zst 2> hecatomb.out/logs/megahit_sample_paired.A13-256-115-06_GTTTCG.log;
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Logfile hecatomb.out/logs/megahit_sample_paired.A13-256-115-06_GTTTCG.log not found.
Config file /public3/home/sc30177/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../config/config.yaml is extended by additional config specified via the command line.
Config file /public3/home/sc30177/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../config/dbFiles.yaml is extended by additional config specified via the command line.
Config file /public3/home/sc30177/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../config/immutable.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 32
Rules claiming more threads will be scaled down.
Select jobs to execute...
[Sun Feb 4 09:22:43 2024]
Finished job 51.
2 of 99 steps (2%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
cat .snakemake/log/2024-02-04T072114.582588.snakemake.log >> hecatomb.out/hecatomb.log
FATAL: Hecatomb encountered an error.
Check the Hecatomb logs directory for command-related errors: hecatomb.out/logs
Complete log: .snakemake/log/2024-02-04T072114.582588.snakemake.log
[2024:02:04 09:22:43] ERROR: Snakemake failed
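The bash error in this log comes from the megahit_sample_paired shell command, which contains if [[ -d ... ]]; then; rm -rf ...; fi. In bash, then must be followed directly by a command or a newline, so the ; right after then is a syntax error, which is the "syntax error near unexpected token" message above. A minimal sketch of the same guard written so that bash accepts it; the function name remove_stale_dir is hypothetical, and this is not necessarily how the fix was made in Hecatomb itself.

```bash
# Hypothetical helper mirroring the guard in the megahit_sample_paired command.
# The original one-liner "if [[ -d DIR ]]; then; rm -rf DIR; fi" does not parse,
# because bash rejects a ";" directly after "then".
remove_stale_dir() {
    local dir="$1"
    if [[ -d "$dir" ]]; then
        rm -rf -- "$dir"
    fi
}

# One-line form that bash does accept (no ";" right after "then"):
# if [[ -d "$dir" ]]; then rm -rf -- "$dir"; fi
```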