shandley / hecatomb

hecatomb is a virome analysis pipeline for analysis of Illumina sequence data
MIT License

Errors in newest version 1.3.0 #104

Open pengouy opened 9 months ago

pengouy commented 9 months ago

Hi, I updated Hecatomb to the newest version, 1.3.0, on the day you released it. Unfortunately, there seem to be some bugs when I run the command hecatomb test, and I have noticed that you are working on it day and night. I really need this extraordinary tool now, but I can't install version 1.2.0. Could I ask when the bug-fix version 1.3.1 will be released? Looking forward to your response, and thanks for your time. The log follows:


Activating conda environment: anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/conda/82f1c97d51f13e73842c70c6a19c5768
/usr/bin/bash: -c: line 0: syntax error near unexpected token `;'
/usr/bin/bash: -c: line 0: `source /public3/home/sc30177/anaconda3/bin/activate '/public3/home/sc30177/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/conda/82f1c97d51f13e73842c70c6a19c5768'; set -euo pipefail; if [[ -d hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG ]]; then; rm -rf hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG; fi; megahit -1 hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_R1.host_rm.fastq.gz -2 hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_R2.host_rm.fastq.gz -r hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_RS.host_rm.fastq.gz -o hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG --out-prefix A13-256-115-06_GTTTCG -t 16 --presets meta-large&> hecatomb.out/logs/megahit_sample_paired.A13-256-115-06_GTTTCG.log; sed 's/>/>A13-256-115-06_GTTTCG/' hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG/A13-256-115-06_GTTTCG.contigs.fa > hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG/A13-256-115-06_GTTTCG.rename.contigs.fa; tar cf - hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG | zstd -T16 -9 > hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG.tar.zst 2> hecatomb.out/logs/megahit_sample_paired.A13-256-115-06_GTTTCG.log;'
[Sun Feb 4 09:22:40 2024]
Error in rule megahit_sample_paired:
    jobid: 13
    input: hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_R1.host_rm.fastq.gz, hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_R2.host_rm.fastq.gz, hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_RS.host_rm.fastq.gz
    output: hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG/A13-256-115-06_GTTTCG.contigs.fa, hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG/A13-256-115-06_GTTTCG.rename.contigs.fa, hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG.tar.zst
    log: hecatomb.out/logs/megahit_sample_paired.A13-256-115-06_GTTTCG.log (check log file(s) for error details)
    conda-env: /public3/home/sc30177/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/conda/82f1c97d51f13e73842c70c6a19c5768
    shell:
        if [[ -d hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG ]]; then; rm -rf hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG; fi; megahit -1 hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_R1.host_rm.fastq.gz -2 hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_R2.host_rm.fastq.gz -r hecatomb.out/trimnami/results/fastp/A13-256-115-06_GTTTCG_RS.host_rm.fastq.gz -o hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG --out-prefix A13-256-115-06_GTTTCG -t 16 --presets meta-large&> hecatomb.out/logs/megahit_sample_paired.A13-256-115-06_GTTTCG.log; sed 's/>/>A13-256-115-06_GTTTCG/' hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG/A13-256-115-06_GTTTCG.contigs.fa > hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG/A13-256-115-06_GTTTCG.rename.contigs.fa; tar cf - hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG | zstd -T16 -9 > hecatomb.out/processing/assembly/A13-256-115-06_GTTTCG.tar.zst 2> hecatomb.out/logs/megahit_sample_paired.A13-256-115-06_GTTTCG.log;
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Logfile hecatomb.out/logs/megahit_sample_paired.A13-256-115-06_GTTTCG.log not found.

Config file /public3/home/sc30177/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../config/config.yaml is extended by additional config specified via the command line.
Config file /public3/home/sc30177/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../config/dbFiles.yaml is extended by additional config specified via the command line.
Config file /public3/home/sc30177/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../config/immutable.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 32
Rules claiming more threads will be scaled down.
Select jobs to execute...
[Sun Feb 4 09:22:43 2024]
Finished job 51.
2 of 99 steps (2%) done

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
cat .snakemake/log/2024-02-04T072114.582588.snakemake.log >> hecatomb.out/hecatomb.log
FATAL: Hecatomb encountered an error.
Check the Hecatomb logs directory for command-related errors: hecatomb.out/logs
Complete log: .snakemake/log/2024-02-04T072114.582588.snakemake.log
[2024:02:04 09:22:43] ERROR: Snakemake failed
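
For readers following the log: the bash message points at the ";" that directly follows "then" in the generated command. Bash does not accept an empty command between "then" and the first statement. The sketch below reproduces this with a throwaway snippet. The directory name demo_dir and the helper check_syntax are made up for illustration, and bash -n only parses the snippet without executing it.

```shell
# Return 0 if bash can parse the snippet, non-zero otherwise.
# bash -n reads the commands and checks syntax without running them.
check_syntax() {
    printf '%s\n' "$1" | bash -n 2>/dev/null
}

# Same shape as the failing command in the log: a ';' directly after 'then'.
broken='if [[ -d demo_dir ]]; then; rm -rf demo_dir; fi'
# Parseable form: no ';' between 'then' and the first command.
fixed='if [[ -d demo_dir ]]; then rm -rf demo_dir; fi'

for snippet in "$broken" "$fixed"; do
    if check_syntax "$snippet"; then
        echo "parses:       $snippet"
    else
        echo "syntax error: $snippet"
    fi
done
```

Because the whole command string fails to parse, none of its steps (including megahit) ever start, which is consistent with the "Logfile ... not found" line at the end of the error.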

beardymcjohnface commented 9 months ago

Hi, I'll be releasing 1.3.1 soon, which should have all the kinks worked out. Unfortunately, snakemake v8 broke some things, and Python 3.12 broke f-strings in snakemake. The cluster commands for snakemake 8+ have also changed, so I've pinned all my tools to snakemake <8 for now and will migrate them all together at a later date.

The unit tests for Hecatomb don't quite cover everything yet so some bugs slipped through the cracks. The next version is waiting on review for koverage 0.1.10 in bioconda https://github.com/bioconda/bioconda-recipes/pull/45597 and I'll push the release as soon as that is done.

If you need it today, pull and install hecatomb from source:

conda create -n hecatombDev python=3.11
conda activate hecatombDev
git clone https://github.com/shandley/hecatomb.git
cd hecatomb
git checkout dev
pip install -e .

Modify the koverage yaml to use koverage 0.1.9 and snakemake<8:

nano hecatomb/snakemake/workflow/envs/koverage.yaml

name: koverage
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - koverage=0.1.9
  - snakemake<8

Install the databases and environments:

hecatomb install
hecatomb test build_envs

It should then work:

hecatomb test

pengouy commented 9 months ago

Thank you so much for the quick response, it helps a lot. I will try the hecatombDev. Many thanks for the effort.

beardymcjohnface commented 9 months ago

All good, let me know how it goes.

pengouy commented 9 months ago

Hi, the job has not finished yet with the newly released version 1.3.1, but it has worked well until now. I'm using Hecatomb on a supercomputer platform with multiple nodes. I submitted the job to two nodes, but only one node was used. Since the mmseqs alignment step takes a lot of time, I'm wondering whether it can run on two or more nodes to speed it up. That could be a useful option for big-data jobs in a future update.

beardymcjohnface commented 9 months ago

Hecatomb's HPC support is via snakemake profiles. You can submit the main hecatomb job with 1 thread and pass your profile to the hecatomb command. The main job will submit individual jobs to the queue for you. You could also just submit to one node with lots of resources and run as a local job.

I just found a new bug when using --profile so I'll push another version soon.
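
As a rough sketch of what such a profile can look like for snakemake <8: the scheduler (SLURM), the profile directory name, the job limit and the sbatch options below are all assumptions to adapt to your cluster, not Hecatomb defaults. The time and mem_mb resources are the ones that show up in the rule logs in this thread, and the sketch assumes every rule defines them.

```shell
# Hypothetical profile directory, created in the current working directory.
profile_dir="./slurm_profile"
mkdir -p "$profile_dir"

# Keys mirror snakemake <8 command-line options (--cluster, --jobs, --latency-wait).
# The sbatch flags are an assumed SLURM setup; {threads} and {resources.*}
# are placeholders that snakemake fills in for each submitted job.
cat > "$profile_dir/config.yaml" <<'EOF'
cluster: "sbatch --parsable --cpus-per-task={threads} --mem={resources.mem_mb} --time={resources.time}"
jobs: 50
latency-wait: 60
EOF

echo "Wrote $profile_dir/config.yaml"
# The main job is then started with a single thread and pointed at the profile
# through the --profile option mentioned above, for example:
#   hecatomb test --profile <profile name or path>
```

With a setup like this, the main process only schedules work, and each rule runs as its own queue job, so jobs can land on different nodes even though no single job spans more than one node.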

pengouy commented 9 months ago

Thanks for the explanation. I have noticed that Hecatomb decides by itself to run multiple jobs. I checked the result file "contigAnnotations.tsv" and found an error in how the "target" column is separated into classification names, like this:

  1. kingdom phylum class order family genus species
  2. Uroviricota\ Caudoviricetes\ Caudoviricetes order\ Caudoviricetes family\ Punavirus\ Punavirus P1

The "\" was not replaced correctly.
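
Until a fixed release is available, an affected value can be cleaned up by hand for inspection. This is only a stopgap sketch, not the pipeline's own fix. It assumes the stray separator is a backslash followed by whitespace, which is how the row above appears here, so check your own file first.

```shell
# Value copied from the affected row above.
# Assumption: each stray separator is a backslash followed by whitespace.
lineage='Uroviricota\ Caudoviricetes\ Caudoviricetes order\ Caudoviricetes family\ Punavirus\ Punavirus P1'

tab=$(printf '\t')
# Replace each backslash-plus-whitespace run with a single tab.
cleaned=$(printf '%s\n' "$lineage" | sed -E "s/\\\\[[:space:]]+/${tab}/g")

# Count the tab-separated fields that result.
n_fields=$(printf '%s\n' "$cleaned" | awk -F'\t' '{ print NF }')
echo "$cleaned"
echo "fields: $n_fields"
```

Spaces inside a rank name, such as "Caudoviricetes order", are left alone because only whitespace that follows a backslash is replaced.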

I also have another question. When I take contig sequences from "_mergedassembly.fasta" whose contig IDs are classified as viruses in "contigAnnotations.tsv" and BLAST them at NCBI, the BLAST results almost never match "contigAnnotations.tsv". Shouldn't these two files correspond? About 10 days ago I ran into this when running the same data with version 1.2.0. I thought it was an accident, but now I have hit the same problem again, and it really confuses me.

beardymcjohnface commented 9 months ago

Thanks, I'll look into it.

pengouy commented 9 months ago

Hi, sorry to bother you again. The job ended just now without any error report, but it yielded a "bigtable.tsv" of only 1 KB. I checked the log directory and found that the "_secondary_nt_calclca.log" file is more than 3 GB, so it looks like the results were not merged successfully. Could you please check whether there is a bug?

pengouy commented 9 months ago

Here is the relevant log detail:

[Tue Feb  6 18:56:40 2024]
rule combine_aa_nt:
    input: hecatomb.out/processing/mmseqs_aa_secondary/AA_bigtable.tsv, hecatomb.out/processing/mmseqs_nt_secondary/NT_bigtable.tsv
    output: hecatomb.out/results/bigtable.tsv
    log: hecatomb.out/logs/combine_AA_NT.log
    jobid: 77
    benchmark: hecatomb.out/benchmarks/combine_AA_NT.txt
    reason: Missing output files: hecatomb.out/results/bigtable.tsv; Input files updated by another job: hecatomb.out/processing/mmseqs_nt_secondary/NT_bigtable.tsv, hecatomb.out/processing/mmseqs_aa_secondary/AA_bigtable.tsv
    resources: tmpdir=/tmp, time=01:00:00, mem_mb=16000, mem_mib=15259, mem=16000MB

{ cat hecatomb.out/processing/mmseqs_aa_secondary/AA_bigtable.tsv > hecatomb.out/results/bigtable.tsv; tail -n+2 hecatomb.out/processing/mmseqs_nt_secondary/NT_bigtable.tsv >> hecatomb.out/results/bigtable.tsv; } &> hecatomb.out/logs/combine_AA_NT.log; 
[Tue Feb  6 18:56:40 2024]
Finished job 77.
81 of 89 steps (91%) done
Select jobs to execute...

pengouy commented 9 months ago

And when I load the "contigSeqTable.tsv" file, I find that the classification of every contig is NA at all taxon levels.

beardymcjohnface commented 9 months ago

If your bigtable is empty then the contigSeqTable will be all NA, as it joins the seq annotations with the contigs. I think I've fixed it; it was caused by formatting issues with the taxonkit command. Can you confirm that both hecatomb.out/processing/mmseqs_aa_secondary/AA_bigtable.tsv and hecatomb.out/results/bigtable.tsv are tiny files?
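
One way to check this, as a sketch: the combine_aa_nt rule in the log above only concatenates the AA table with the NT table minus its header line, so a tiny bigtable.tsv implies that both inputs were close to empty as well. The helper name report_table is made up for illustration, and the paths are the ones named in this thread.

```shell
# Print the size in bytes and the number of data rows (lines after the header).
report_table() {
    local path=$1 bytes rows
    if [[ ! -f "$path" ]]; then
        echo "missing: $path"
        return 1
    fi
    bytes=$(wc -c < "$path")
    rows=$(tail -n +2 "$path" | wc -l)
    # $((...)) strips the padding that some wc implementations add.
    echo "$path: $((bytes)) bytes, $((rows)) data rows"
}

# Run from the directory that contains hecatomb.out.
for table in \
    hecatomb.out/processing/mmseqs_aa_secondary/AA_bigtable.tsv \
    hecatomb.out/processing/mmseqs_nt_secondary/NT_bigtable.tsv \
    hecatomb.out/results/bigtable.tsv
do
    report_table "$table" || true
done
```

A table that reports 0 data rows contains nothing but its header, which would also explain NA values in everything joined against it.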

pengouy commented 9 months ago

Oh no! I deleted the whole hecatomb.out directory yesterday, but I am sure hecatomb.out/results/bigtable.tsv was a tiny file.

beardymcjohnface commented 9 months ago

Oh, that's fine, I'm pretty sure I've worked out the issues. I'm just waiting on new releases of koverage and trimnami before I can push the next version of hecatomb.

pengouy commented 9 months ago

Appreciate your efforts, looking forward to it.

pengouy commented 9 months ago

Hi, I am a little confused about the file merged_assembly.fasta: the NCBI BLAST results for contigs in this file do not always match the taxon classification in contigAnnotations.tsv. I have also aligned the contigs against sequences fetched using the NCBI accession numbers in the 'target' column of contigAnnotations.tsv, and they do not match either.