shandley / hecatomb

hecatomb is a virome analysis pipeline for Illumina sequence data
MIT License

Failed job in processing secondary_nt_calc_lca step #114

Closed pengouy closed 4 months ago

pengouy commented 5 months ago

Hi, I recently ran into a fatal error when using hecatomb v1.3.2. The report is as follows:

```
02:53:41.387 [WARN] taxid 11103 was merged into 3052230
02:53:41.387 [WARN] taxid 2686064 was merged into 2844220
02:53:41.387 [WARN] taxid 754060 was merged into 1555208
02:53:41.388 [ERRO] bufio.Scanner: token too long
02:53:41.580 [ERRO] lineage-field (4) out of range (3):DPLH240302R9:1:5.393e-05:449009 1655021;1574182;2664395;260024;(long number)
================================================================================
Removing output files of failed job secondary_nt_calc_lca since they might be corrupted:
hecatomb.out/processing/mmseqs_nt_secondary/lca_lineage.tsv
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
cat .snakemake/log/2024-07-04T013623.118997.snakemake.log >> hecatomb.out/hecatomb.log

FATAL: Hecatomb encountered an error.

Check the Hecatomb logs directory for command-related errors:

hecatomb.out/logs

Complete log: .snakemake/log/2024-07-04T013623.118997.snakemake.log
[2024:07:04 02:54:16] ERROR: Snakemake failed
```
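The `bufio.Scanner: token too long` and `lineage-field (4) out of range (3)` messages appear to come from taxonkit, the Go tool hecatomb uses for the LCA lineage step. The first fires when a single input line exceeds Go's default 64 KiB scan buffer; the second when a row has only 3 tab-separated fields where field 4 was requested. Both point at a malformed input row rather than a pipeline bug. A minimal diagnostic sketch (not part of hecatomb; the path is a guess based on the removed output file, so point it at whichever TSV the failing rule actually consumes):

```python
import sys

# Minimal sketch (not hecatomb code): scan a TSV for rows that could
# trigger the two taxonkit errors above. Go's bufio.Scanner rejects
# lines over its default 64 KiB token buffer, and "lineage-field (4)
# out of range (3)" means a row had only 3 fields where 4 were expected.
# Path and field count are assumptions taken from the log.
TSV = "hecatomb.out/processing/mmseqs_nt_secondary/lca_lineage.tsv"
EXPECTED_FIELDS = 4
MAX_LINE = 64 * 1024  # bufio.Scanner's default MaxScanTokenSize

with open(TSV, encoding="utf-8", errors="replace") as fh:
    for lineno, line in enumerate(fh, 1):
        if len(line) > MAX_LINE:
            print(f"line {lineno}: {len(line)} chars, exceeds the 64 KiB scanner buffer")
        if len(line.rstrip("\n").split("\t")) < EXPECTED_FIELDS:
            print(f"line {lineno}: fewer than {EXPECTED_FIELDS} fields", file=sys.stderr)
```

Any hit from a check like this would confirm the input itself is corrupt, which narrows the problem to the files the rule reads rather than the rule's logic.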

It is a little strange that I ran the same version successfully several months ago, but it failed on new data in the past few days. Could you please help me look into this error?

Kind regards, Ouyang

pengouy commented 5 months ago

Detailed log:

```
Loading NCBI taxonomy
Loading nodes file ... Done, got 2321680 nodes
Loading merged file ... Done, added 62428 merged nodes.
Loading names file ... Done
Making matrix ... Done
Init RMQ ... Done
[================================================================] =44.91M 7s 852ms
Taxonomy for 14 entries not found and 0 are deleted
Time for merging to result_top1_swapped_sum_tax: 0h 0m 0s 40ms
Time for processing: 0h 0m 18s 54ms
createtsv /public3/home/sc30177/.conda/envs/hecatomb/lib/python3.11/site-packages/hecatomb/snakemake/workflow/../databases/aa/virus_secondary_aa/sequenceDB hecatomb.out/processing/mmseqs_aa_secondary/tmp/9299886082380050088/result_top1_swapped_sum_tax hecatomb.out/processing/mmseqs_aa_secondary/mmseqs.aa.secondary_tophit_report --first-seq-as-repr 0 --target-column 1 --full-header 0 --idx-seq-src 0 --db-output 0 --threads 24 --compressed 0 -v 3

Invalid database read for database data file=/public3/home/sc30177/.conda/envs/hecatomb/lib/python3.11/site-packages/hecatomb/snakemake/workflow/../databases/aa/virus_secondary_aa/sequenceDB_h, database index=/public3/home/sc30177/.conda/envs/hecatomb/lib/python3.11/site-packages/hecatomb/snakemake/workflow/../databases/aa/virus_secondary_aa/sequenceDB_h.index
Size of data: 173494738
Requested offset: 2711359106
Invalid database read for database data file=/public3/home/sc30177/.conda/envs/hecatomb/lib/python3.11/site-packages/hecatomb/snakemake/workflow/../databases/aa/virus_secondary_aa/sequenceDB_h, database index=/public3/home/sc30177/.conda/envs/hecatomb/lib/python3.11/site-packages/hecatomb/snakemake/workflow/../databases/aa/virus_secondary_aa/sequenceDB_h.index
Size of data: 173494738
Requested offset: 3627801441
Invalid database read for database data file=/public3/home/sc30177/.conda/envs/hecatomb/lib/python3.11/site-packages/hecatomb/snakemake/workflow/../databases/aa/virus_secondary_aa/sequenceDB_h, database index=/public3/home/sc30177/.conda/envs/hecatomb/lib/python3.11/site-packages/hecatomb/snakemake/workflow/../databases/aa/virus_secondary_aa/sequenceDB_h.index
Size of data: 173494738
Requested offset: 2098946418
Error: filterdb died
```
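The `Invalid database read` lines are the telling part: the requested offsets (e.g. 2711359106) lie far beyond the data file's actual size (173494738 bytes), so the `.index` file references bytes the `sequenceDB_h` data file does not contain. That is the classic signature of a truncated database. MMseqs2 index files are plain tab-separated `id`, `offset`, `length` triples, so the same consistency check is easy to reproduce; a minimal sketch, using the paths from the log (adjust for your install):

```python
import os

# Minimal sketch: check that an MMseqs2 .index never points past the end
# of its data file. Index rows are "id<TAB>offset<TAB>length"; any
# offset+length beyond the data size means the database is truncated,
# which is what "Requested offset: 2711359106" vs. "Size of data:
# 173494738" indicates above. Paths copied from the log.
data = ("/public3/home/sc30177/.conda/envs/hecatomb/lib/python3.11/"
        "site-packages/hecatomb/snakemake/workflow/../databases/aa/"
        "virus_secondary_aa/sequenceDB_h")
index = data + ".index"

size = os.path.getsize(data)
bad = 0
with open(index) as fh:
    for row in fh:
        entry_id, offset, length = row.rstrip("\n").split("\t")
        if int(offset) + int(length) > size:
            bad += 1
            print(f"entry {entry_id}: {offset}+{length} exceeds data size {size}")
print(f"{bad} out-of-range entries" if bad else "index is consistent with data file")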

pengouy commented 4 months ago

I have solved this problem by re-downloading the database. The package is so large that the server had failed to download it intact the first time.
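For anyone hitting the same thing: a truncated archive can be detected before re-running the whole pipeline by comparing the local file size against the size the server advertises. A minimal sketch with a placeholder URL, since the thread does not name the actual hecatomb database mirror:

```python
import os
import urllib.request

# Minimal sketch: detect a truncated download by comparing the local
# file size with the server's Content-Length header. The URL and
# filename are hypothetical placeholders, not hecatomb's real database
# locations.
URL = "https://example.com/hecatomb/databases.tar.gz"  # placeholder
LOCAL = "databases.tar.gz"                             # placeholder

req = urllib.request.Request(URL, method="HEAD")
with urllib.request.urlopen(req) as resp:
    remote_size = int(resp.headers["Content-Length"])

local_size = os.path.getsize(LOCAL)
if local_size == remote_size:
    print(f"download complete: {local_size} bytes")
else:
    print(f"incomplete download: {local_size} of {remote_size} bytes")
```

Resumable downloaders such as `wget -c` also help when the archive is too large to fetch reliably in one pass.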