shandley / hecatomb

hecatomb is a virome analysis pipeline for analysis of Illumina sequence data
MIT License

Process secondary_nt_calc_lca failed #117

Open pengouy opened 1 month ago

pengouy commented 1 month ago

Hi, I encountered a new error while running hecatomb v1.3.2. Here is the relevant log output:

15:23:41.923 [WARN] taxid 1965493 was merged into 3070820
15:23:41.923 [WARN] taxid 45219 was merged into 3052307
15:23:41.924 [ERRO] bufio.Scanner: token too long
15:23:42.228 [ERRO] lineage-field (4) out of range (3):ZS32:1:1.654e-04:204011 

The "time" resource for this step was set to "sml". When I first hit this error, I assumed the time limit was too short, because this step takes more than 1 hour to finish on my data. So I changed "sml" to "ram" in the read_annotation.smk file, but the error persisted. Please look into this issue.

pengouy commented 1 month ago

It seems this error occurs while taxonkit is executed in the mmseqs2 environment, and I found related reports of this taxonkit error at https://github.com/shenwei356/taxonkit/issues/75. The author fixed this part of the code in a later version. Because reinstalling hecatomb is time-consuming and a newer taxonkit might conflict with other dependencies, I did not try a taxonkit version > 0.8.0. I have now solved the problem with the following steps:

  1. Modified the taxonkit v0.8.0 source by adding a "buffer-size" flag to lca.go to increase the buffer size, following the latest version;
  2. Compiled the code and replaced the existing taxonkit binary in the directory anaconda3/envs/hecatomb/lib/python3.11/site-packages/hecatomb/snakemake/conda/1d30c962f1392466f06c4a5792ffe366_/bin with the new build;
  3. Re-ran the job; the bufio.Scanner [ERRO] no longer occurred.