Closed sralchemab closed 2 months ago
Hi @sralchemab,
Thanks for this issue and such detailed info.
The variable `ref_query` is made by concatenating `assembly_accession` and `asm_name`. However, `top_reference` only contains the `assembly_accession`. The issue is easily fixable by updating line 125 to: `if assembly_accession == top_reference:`
Regarding this, there is an in-progress PR, here, addressing this error. However, it hasn't been reviewed yet. Since you have been testing KmerFinder, it would be great if you could join the review process. :)
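To illustrate the mismatch described above, here is a minimal sketch. It is not the code from `download_reference.py`: the function names, the row layout, the `_` separator used to build `ref_query`, and the sample accessions are all assumptions made for illustration. It only shows why comparing a concatenated `ref_query` against a bare accession never matches, while comparing `assembly_accession` directly does.

```python
# Hypothetical metadata rows: (assembly_accession, asm_name). Values are made up.
ROWS = [
    ("GCF_000000001.1", "ASM1v1"),
    ("GCF_000000002.1", "ASM2v1"),
]


def find_by_ref_query(rows, top_reference):
    """Mismatching variant: compares accession + name against a bare accession."""
    for assembly_accession, asm_name in rows:
        # The "_" separator is an assumption; any concatenation has the same problem.
        ref_query = f"{assembly_accession}_{asm_name}"
        if ref_query == top_reference:
            return assembly_accession, asm_name
    return None


def find_by_accession(rows, top_reference):
    """Suggested variant: compares the accession only."""
    for assembly_accession, asm_name in rows:
        if assembly_accession == top_reference:
            return assembly_accession, asm_name
    return None


# top_reference holds only the accession, as described in the issue.
top_reference = "GCF_000000002.1"
print(find_by_ref_query(ROWS, top_reference))
print(find_by_accession(ROWS, top_reference))
```

With a bare accession as `top_reference`, the first function finds nothing and the second returns the matching row.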
For the `--kmerfinderdb`, I tried using the Zenodo link on the option documentation, but the file looks like it's corrupted (the MD5 result matches but it throws an error when trying to unpack). Because of this, I followed the link to the Kmerfinder Databases from the same docs and downloaded the link from the top.
Oh... strange, it was working in previous releases. I'll take a look at it.
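A matching MD5 only shows that the download is identical to what was uploaded; it does not show that the uploaded archive itself can be unpacked. The sketch below separates the two checks. It assumes the database is distributed as a tar archive (possibly compressed), which is not confirmed in this thread, and the helper names `md5sum` and `archive_is_readable` are hypothetical.

```python
import hashlib
import tarfile
import zlib

CHUNK = 1024 * 1024  # read files in 1 MiB pieces


def md5sum(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def archive_is_readable(path):
    """Return True if every regular file in the tar archive can be read in full."""
    try:
        # "r:*" lets tarfile detect the compression (none, gzip, bz2, xz).
        with tarfile.open(path, "r:*") as archive:
            for member in archive:
                if member.isfile():
                    stream = archive.extractfile(member)
                    while stream.read(CHUNK):
                        pass
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        return False
    return True
```

Comparing `md5sum(path)` with the published checksum and then calling `archive_is_readable(path)` would distinguish a bad download from an archive that was already broken at the source.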
Additionally, I would recommend maybe increasing a bit the memory requirements for the `KMERFINDER` task, because when running the full pipeline it was failing without specifying any error message. After debugging it, I found that it was a memory issue.
Absolutely, I have also faced a few memory issues with this step.
Issues:
`download_reference.py` (in progress #154)
Hi @Daniel-VM! Thanks for looking into this so fast. I reviewed the PR and approved it. Would you like to add this as part of the documentation as well? Also, for some reason, it looks like my approval is not enough?
Oops, yes, you should request to join the nf-core community by sharing the URL to your GitHub repo in their Slack channel #github-invitations.
Would you like to add this as part of the documentation as well? Also, for some reason, it looks like my approval is not enough?
I think it would be better to merge PR #154 and then create a new one to include all these fixes.
Now that I've realised what the issue was, I approved it with my personal GitHub account, which belongs to nf-core.
Awesome, thanks. I’m currently looking into the corrupted Zenodo file, but feel free to contribute to the develop branch by addressing the issues mentioned above. You can find the contributing guidelines here.
Issues:
- [x] Fix kmerfinder script `download_reference.py` (in progress Fix kmerfinder scripts #154)
- [x] Fix corrupted kmerfinder database available in Zenodo and update params documentation.
- [x] Update kmerfinder db to latest version.
- [x] Increase base memory for Kmerfinder modules
Hi @sralchemab , I have made a PR with the fixes above. I hope this will solve the bug.
Thanks, @Daniel-VM ! I'll give it a go tonight.
Thanks, @Daniel-VM, all fixed! However, I found two other unrelated things that I'll report as different issues.
Description of the bug
The script `download_reference.py` from the `FIND_DOWNLOAD_REFERENCE` task fails to find the top reference within the provided refseq file.
After confirming that the refseq ID exists in the reference file being searched, I checked the script and found an issue in the following lines:
https://github.com/nf-core/bacass/blob/c81202b7702c67d0e829085685083847b9e59435/bin/download_reference.py#L119-L125
The variable `ref_query` is made by concatenating `assembly_accession` and `asm_name`. However, `top_reference` only contains the `assembly_accession`. The issue is easily fixable by updating line 125 to: `if assembly_accession == top_reference:`
For the `--ncbi_assembly_metadata` option, I used assembly_summary_refseq.txt, which I obtained from the README mentioned in the documentation for the option (`--ncbi_assembly_metadata`).
For the `--kmerfinderdb`, I tried using the Zenodo link on the option documentation, but the file looks like it's corrupted (the MD5 result matches but it throws an error when trying to unpack). Because of this, I followed the link to the Kmerfinder Databases from the same docs and downloaded the link from the top.
One extra issue to bear in mind is that the `KMERFINDER` task looks for the file `${kmerfinder_db}/bacteria.name`, which in this version of the database does not exist as such. So in order to make it work, in that same folder, the following link has to be created:
This is something that could be addressed as well. However, it's easy to work around without having to touch the pipeline.
Additionally, I would recommend maybe increasing a bit the memory requirements for the `KMERFINDER` task, because when running the full pipeline it was failing without specifying any error message. After debugging it, I found that it was a memory issue.
Command used and terminal output
Relevant files
nextflow.log
System information