mtisza1 / Cenote-Taker3

Discover and annotate the virome
MIT License
32 stars, 1 fork

Issues with run_title, hhsuite databases, number of files #2

Closed Sapsanas closed 9 months ago

Sapsanas commented 10 months ago

Hi Mike,

Thank you very much for launching Cenote-Taker3, it's a great tool!

Here are a few issues I ran into while running it on a subset of my data, along with suggestions for resolving them:

  1. Since there are restrictions on the length of run_title and on how the output directory is created and used, a flag that lets the user define the working directory precisely might be more convenient. My current workaround is to change into the desired directory and set run_title as usual.
  2. The script get_ct3_databases.py does not create the directory hhsearch_DBs, so although it downloads the hhsuite databases, it does not place them there. Meanwhile, cenote_main.sh looks for files under that path, e.g. ${C_DBS}/hhsearch_DBs/NCBI_CD/NCBI_CD_a3m.ffdata. The workaround is to create the needed folder and move the hhsuite databases there manually:

         mkdir ct3_DBs/hhsearch_DBs
         mv ct3_DBs/NCBI_CD ct3_DBs/hhsearch_DBs/NCBI_CD
         mv ct3_DBs/pfam_32_db ct3_DBs/hhsearch_DBs/pfam_32_db
         mv ct3_DBs/pdb70 ct3_DBs/hhsearch_DBs/pdb70
  3. (Not really an issue.) Please consider adding a flag to turn off generation of sequin_and_genome_maps and its contents. It creates many files per metagenomic sample and, on HPC systems that cap the number of files per user or allocation, this can exceed the file-count limit.
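
For anyone applying the workaround from item 2 on several machines, it can be wrapped in a small shell function that is safe to run more than once. This is only a sketch: the function name relocate_hhsuite_dbs is hypothetical and not part of Cenote-Taker 3, and it assumes the three databases were downloaded directly under the database root, as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cenote-Taker 3): move the hhsuite
# databases into the hhsearch_DBs subdirectory that cenote_main.sh expects.
# Usage: relocate_hhsuite_dbs ct3_DBs
relocate_hhsuite_dbs() {
    local db_root="$1"
    local target="${db_root}/hhsearch_DBs"
    local db

    # Create the expected directory; -p makes this a no-op if it exists.
    mkdir -p "${target}"

    # Database directory names taken from the workaround in this issue.
    for db in NCBI_CD pfam_32_db pdb70; do
        # Move only if the source exists and the destination does not,
        # so that re-running the function does nothing harmful.
        if [ -d "${db_root}/${db}" ] && [ ! -e "${target}/${db}" ]; then
            mv "${db_root}/${db}" "${target}/${db}"
        fi
    done
}
```

After sourcing the function, `relocate_hhsuite_dbs ct3_DBs` would perform the same steps as the manual mkdir/mv commands. According to the maintainer's reply below, the underlying problem is addressed in v3.2.1, so this should only matter for the initial release.
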
mtisza1 commented 10 months ago

Hi,

Thanks for your kind words and for opening this issue. Sorry especially for the hhsuite DB problem. That should have been fixed before the initial release.

Fortunately, each of these things should be a relatively simple update.

I'll aim to include these fixes in a new release before the end of next week.

Please let me know about any additional bugs or seemingly missing features in the future.

Mike

mtisza1 commented 9 months ago

Hi,

I've made a new release of Cenote-Taker 3, v3.2.1, which is now available on Bioconda.

https://github.com/mtisza1/Cenote-Taker3/releases/tag/v3.2.1

It should address all of your issues and comments.

One more thing: if you are having issues with file limits, I would guess that the hhsuite files (individual files for each AA seq and hhsuite output) are a bigger issue than the GenBank files. I currently don't know how to change the hhsuite file structure, as it's my understanding that hhsuite tools only accept a single AA seq as a query. Let me know if you can think of a workaround.
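
To check which part of a run directory actually dominates the file count, something like the following could help. It is a minimal sketch using only find, wc, and sort: the function name count_files_per_subdir is hypothetical, and the argument is whatever output directory the run produced.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of regular files under each
# immediate subdirectory of a run directory, largest count first.
# Usage: count_files_per_subdir my_run_dir
count_files_per_subdir() {
    local run_dir="$1"
    local sub

    for sub in "${run_dir}"/*/; do
        # Skip the unexpanded glob when there are no subdirectories.
        [ -d "${sub}" ] || continue
        # Count files recursively; tr strips the padding some wc versions add.
        printf '%s\t%s\n' \
            "$(find "${sub}" -type f | wc -l | tr -d ' ')" \
            "${sub%/}"
    done | sort -rn
}
```

Each output line is a count followed by a tab and the subdirectory path, so the top of the list shows where most of the files are.
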

I'll close this, but please reopen if there are any unresolved issues.

Best,

Mike