oschwengers / bakta

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids
GNU General Public License v3.0

update AMRFinderPlus database fails #203

Closed: shigdon closed this issue 1 year ago

shigdon commented 1 year ago

Hello,

I am trying out this tool and am starting with the database installation on my institution's HPC. I am running Bakta from a Conda environment.

(bakta) [shigdon@c1hitachi10 bakta]$ bakta_db download --output . --type full
Bakta software version: 1.7.0
Required database schema version: 5

fetch DB versions...
        ... compatible DB versions: 1
download database: v5.0, type=full, 2023-02-20, DOI: 10.5281/zenodo.7669534, URL: https://zenodo.org/record/7669534/files/db.tar.gz...
|████████████████████████████████████████| 33.1G/33.1G [100%] in 2:56:16.9 (3.13M/s)
        ... done
check MD5 sum...
        ...database file OK: 3200136a0a32b3c33d1cb348ab6b87de
extract DB tarball: file=/ceph/db/bakta/db/db.tar.gz, output=/ceph/db/bakta/db
successfully downloaded Bakta database!
        version: 5.0
        Type: full
        DOI: 10.5281/zenodo.7669534
        path: /ceph/db/bakta/db/db
update AMRFinderPlus database...
AMRFinderPlus failed! amrfinder-error-code=1
ERROR: AMRFinderPlus failed! command: 'amrfinder_update --force_update --database /ceph/db/bakta/db/db/amrfinderplus-db', error code: 1
(bakta) [shigdon@c1hitachi10 bakta]$
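The "check MD5 sum" step in the log above compares a checksum of the downloaded tarball against an expected value, and it passed here, so the download itself is not the problem. For anyone who wants to re-check a tarball by hand, here is a minimal Python sketch of that kind of check. It is not Bakta's own code; the function names are hypothetical and the expected checksum is the one printed in the log.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks.

    Chunked reading keeps memory use flat even for a 33 GB tarball.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_tarball(path: Path, expected_md5: str) -> bool:
    """Compare the file's MD5 with the expected checksum (case-insensitive)."""
    return md5sum(path) == expected_md5.lower()


if __name__ == "__main__":
    # Path and checksum as printed by bakta_db in the log above; the tarball
    # may no longer exist after extraction, hence the existence check.
    tarball = Path("/ceph/db/bakta/db/db.tar.gz")
    if tarball.exists():
        print(verify_tarball(tarball, "3200136a0a32b3c33d1cb348ab6b87de"))
```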

I searched for a related issue that might help me solve this, but did not find anything.

Here are the contents of db/db/:

(bakta) [shigdon@c1hitachi10 bakta]$ ls db/db/
antifam.h3f  antifam.h3i  antifam.h3m  antifam.h3p  bakta.db  expert-protein-sequences.dmnd  ncRNA-genes.i1f  ncRNA-genes.i1i  ncRNA-genes.i1m  ncRNA-genes.i1p  ncRNA-regions.i1f  ncRNA-regions.i1i  ncRNA-regions.i1m  ncRNA-regions.i1p  oric.fna  orit.fna  pfam.h3f  pfam.h3i  pfam.h3m  pfam.h3p  psc.dmnd  rfam-go.tsv  rRNA.i1f  rRNA.i1i  rRNA.i1m  rRNA.i1p  sorf.dmnd  version.json
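Note that this listing has no amrfinderplus-db entry, which is the directory the failed `amrfinder_update --force_update --database /ceph/db/bakta/db/db/amrfinderplus-db` step in the first log was supposed to populate. A small sketch that makes the gap explicit, using the file names copied from the listing above; the set of required entries is an assumption inferred from the paths in this thread, not Bakta's documented list.

```python
# File names copied verbatim from the `ls db/db/` output above.
LISTING = """
antifam.h3f antifam.h3i antifam.h3m antifam.h3p bakta.db
expert-protein-sequences.dmnd ncRNA-genes.i1f ncRNA-genes.i1i ncRNA-genes.i1m
ncRNA-genes.i1p ncRNA-regions.i1f ncRNA-regions.i1i ncRNA-regions.i1m
ncRNA-regions.i1p oric.fna orit.fna pfam.h3f pfam.h3i pfam.h3m pfam.h3p
psc.dmnd rfam-go.tsv rRNA.i1f rRNA.i1i rRNA.i1m rRNA.i1p sorf.dmnd version.json
"""

# Assumption: entries a working Bakta DB directory should contain, inferred
# from the paths in this thread (not an official list).
REQUIRED = ("bakta.db", "version.json", "amrfinderplus-db")


def missing_entries(names, required=REQUIRED):
    """Return the required entries that are absent from a directory listing."""
    present = set(names)
    return [entry for entry in required if entry not in present]


print(missing_entries(LISTING.split()))  # → ['amrfinderplus-db']
```

On a live system, `missing_entries(os.listdir(path))` would run the same check against a real directory.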

I also tried running bakta_proteins, since that is the command I am primarily interested in, but encountered this error:

(bakta) [shigdon@c1hitachi20 bakta_test]$ bakta_proteins --db /ceph/db/bakta/db --prefix 1617 --output bakta_1617 --threads 24 ../etec_refs/1617.faa
        imported: 4685
annotate protein sequences...
        detected IPSs: 4294
        found PSCs: 361
        found PSCCs: 17
        lookup annotations...
        conduct expert systems...
Traceback (most recent call last):
  File "/ceph/home/shigdon/.conda/envs/bakta/bin/bakta_proteins", line 10, in <module>
    sys.exit(main())
  File "/ceph/home/shigdon/.conda/envs/bakta/lib/python3.10/site-packages/bakta/proteins.py", line 128, in main
    annotate_aa(aas)
  File "/ceph/home/shigdon/.conda/envs/bakta/lib/python3.10/site-packages/bakta/proteins.py", line 195, in annotate_aa
    expert_amr_found = exp_amr.search(aas, aa_path)
  File "/ceph/home/shigdon/.conda/envs/bakta/lib/python3.10/site-packages/bakta/expert/amrfinder.py", line 47, in search
    raise Exception(f"amrfinder error! error code: {proc.returncode}. Please, try 'amrfinder_update --force_update --database {amrfinderplus_db_path}' to update AMRFinderPlus's internal database.")
Exception: amrfinder error! error code: 1. Please, try 'amrfinder_update --force_update --database /ceph/db/bakta/db/amrfinderplus-db' to update AMRFinderPlus's internal database.
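The path suggested in this exception, /ceph/db/bakta/db/amrfinderplus-db, sits one level above the one the download step used (/ceph/db/bakta/db/db/amrfinderplus-db): bakta_proteins was called with `--db /ceph/db/bakta/db`, while bakta_db reported the database path as /ceph/db/bakta/db/db. A hypothetical helper that resolves which of the two directories is the actual database root, assuming (as the listing above suggests) that the root is the directory holding version.json:

```python
from pathlib import Path
from typing import Optional


def find_bakta_db(candidate: Path) -> Optional[Path]:
    """Return the directory that holds version.json.

    Checks the candidate itself first, then its db/ child, which is where
    `bakta_db download --output <dir>` placed the files in this thread.
    Assumption: version.json marks the root of a Bakta database.
    """
    for path in (candidate, candidate / "db"):
        if (path / "version.json").is_file():
            return path
    return None
```

With the layout shown above, `find_bakta_db(Path("/ceph/db/bakta/db"))` would be expected to return /ceph/db/bakta/db/db, the directory to pass to `--db`.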

I then tried following the suggestion in the error message:

(bakta) [shigdon@c1hitachi20 bakta_test]$ amrfinder_update --force_update --database /ceph/db/bakta/db
Running: amrfinder_update --force_update --database /ceph/db/bakta/db
Looking up the published databases at https://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/AMRFinderPlus/database/

*** ERROR ***
CURL: Cannot read from https://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/AMRFinderPlus/database/

HOSTNAME: c1hitachi20.ki.se
SHELL: /bin/bash
PWD: /projects/AJMR_rnaseq/bakta_test
PATH: /ceph/hpome/shigdon/bin:/ceph/hpome/shigdon/bin:/ceph/hpome/shigdon/bin:/ceph/home/shigdon/.conda/envs/bakta/bin:/apps/miniconda3/condabin:/ceph/home/shigdon/.local/bin:/ceph/home/shigdon/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
Progam name:  amrfinder_update
Command line: amrfinder_update --force_update --database /ceph/db/bakta/db
(bakta) [shigdon@c1hitachi20 bakta_test]$

There seems to be a problem reading from the NCBI FTP server.
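One way to tell a server-side outage from a node without outbound HTTPS access is to request the same URL from the same machine outside of amrfinder_update. A minimal Python sketch, roughly equivalent to the curl test later in this thread; `is_reachable` is a hypothetical name, and this is not how amrfinder_update performs its lookup internally.

```python
import urllib.request

# URL taken from the amrfinder_update error message above.
NCBI_AMR_DB_URL = (
    "https://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/"
    "AMRFinderPlus/database/"
)


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET request on url succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False
```

Calling `is_reachable(NCBI_AMR_DB_URL)` on the node that runs amrfinder_update returns False if that node cannot reach the listing at all, in which case the problem is local network access rather than Bakta or the database path.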

Is there a known fix, or a workaround I could try? Any help is much appreciated. Thank you!

oschwengers commented 1 year ago

Hi @shigdon, that's weird. I just tested the entire round trip without issues. Maybe this was indeed a temporary issue with the NCBI server?

However, there's one issue with your amrfinder_update command: the --database path should be /ceph/db/bakta/db/amrfinderplus-db so that Bakta can find the AMRFinderPlus DB inside the Bakta DB.
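A sketch of building the corrected update command from the Bakta DB directory, so the amrfinderplus-db sub-directory is always appended to the same path that is passed to Bakta's `--db`. The flags are the ones shown in the logs above; the function name is hypothetical.

```python
import shlex
from pathlib import Path


def amrfinder_update_command(bakta_db: Path, force: bool = True) -> list[str]:
    """Build an amrfinder_update call targeting <bakta_db>/amrfinderplus-db."""
    command = ["amrfinder_update"]
    if force:
        command.append("--force_update")
    command += ["--database", str(bakta_db / "amrfinderplus-db")]
    return command


cmd = amrfinder_update_command(Path("/ceph/db/bakta/db"))
print(shlex.join(cmd))
# → amrfinder_update --force_update --database /ceph/db/bakta/db/amrfinderplus-db
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it without shell quoting issues.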

Is this error still occurring?

oschwengers commented 1 year ago

I'll close this for now. Please do not hesitate to re-open it in any case. Best regards!

geboro commented 1 year ago

Hello, I'm having the same error. I downloaded the AMR database manually and placed it in '/home/bioinf/bioinf_archive/98_mökiDatabases/bakta/db/amrfinderplus-db', but the update still fails:

amrfinder_update  --database /home/bioinf/bioinf_archive/98_mökiDatabases/bakta/db/amrfinderplus-db
Running: amrfinder_update --database /home/bioinf/bioinf_archive/98_mökiDatabases/bakta/db/amrfinderplus-db
Looking up the published databases at https://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/AMRFinderPlus/database/

*** ERROR ***
CURL: Cannot read from https://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/AMRFinderPlus/database/

HOSTNAME: ?
SHELL: /bin/bash
PWD: /home/bioinf
PATH: /home/bioinf/mambaforge/envs/bakta/bin:/home/bioinf/mambaforge/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
Progam name:  amrfinder_update
Command line: amrfinder_update --database /home/bioinf/bioinf_archive/98_mökiDatabases/bakta/db/amrfinderplus-db

I suspected a certificate issue with curl, but curl itself works:

curl https://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/AMRFinderPlus/database/
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
 <head>
  <title>Index of /pathogen/Antimicrobial_resistance/AMRFinderPlus/database</title>
 </head>
 <body>
<h1>Index of /pathogen/Antimicrobial_resistance/AMRFinderPlus/database</h1>
<pre>Name                                              Last modified      Size  <hr><a href="/pathogen/Antimicrobial_resistance/AMRFinderPlus/">Parent Directory</a>                                                       -   
<a href="3.10/">3.10/</a>                                             2021-12-22 13:07    -   
<a href="3.2/">3.2/</a>                                              2019-10-31 11:24    -   
<a href="3.6/">3.6/</a>                                              2020-03-25 08:36    -   
<a href="3.8/">3.8/</a>                                              2020-10-02 15:43    -   
<a href="3.9/">3.9/</a>                                              2020-12-21 10:09    -   
<a href="3.10/">3.10/</a>                                              2022-10-12 10:00    -
<a href="3.11/">3.11/</a>                                              2022-10-12 10:00    -
<a href="latest/">latest/</a>                                             2021-12-22 13:19    -   
<hr></pre>
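The listing returned by curl is a plain HTML directory index, so it is easy to check which version directories it actually advertises. A minimal sketch that extracts them; it is only an inspection aid, not a reimplementation of amrfinder_update's lookup, and the sample string is an excerpt of the output above (including its duplicated 3.10/ entry, which the set removes).

```python
import re
from html.parser import HTMLParser

# Matches hrefs such as "3.10/" and captures "3.10"; skips "latest/".
VERSION_DIR = re.compile(r"^(\d+(?:\.\d+)*)/$")


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def published_versions(html: str) -> list[str]:
    """Return the unique numeric version directories, oldest first."""
    parser = HrefCollector()
    parser.feed(html)
    found = {m.group(1) for h in parser.hrefs if (m := VERSION_DIR.match(h))}
    # Sort numerically so that 3.10 comes after 3.9.
    return sorted(found, key=lambda v: tuple(int(p) for p in v.split(".")))


SAMPLE = """
<a href="3.10/">3.10/</a> <a href="3.2/">3.2/</a> <a href="3.9/">3.9/</a>
<a href="3.10/">3.10/</a> <a href="3.11/">3.11/</a> <a href="latest/">latest/</a>
"""
print(published_versions(SAMPLE))  # → ['3.2', '3.9', '3.10', '3.11']
```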

Why does Bakta need to run an AMRFinderPlus update if the database was downloaded manually? Would it be possible to prepare these databases manually and simply point Bakta to the directory where they are installed?

Thanks in advance!