phac-nml / staramr

Scans genome contigs against the ResFinder, PlasmidFinder, and PointFinder databases.
Apache License 2.0

Duplicated resfinder hits with update to database #180

Closed kimyoungw closed 1 year ago

kimyoungw commented 1 year ago

Hi,

We're getting duplicated resfinder hits on the same contig, one with unknown under Predicted Phenotype and then another with the predicted phenotype. This appears to only happen when we update the resfinder database to one of the most recent commits. Older/default versions of the database don't have this issue.
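
One way to confirm the symptom described above is to group the rows of the ResFinder results table by location and list any location reported more than once. The following is only a minimal sketch: the function name is hypothetical, and the column names (`Isolate ID`, `Contig`, `Start`, `End`) are assumptions about the header of the staramr `resfinder.tsv` output, so adjust `key_columns` to match the header of your file.

```python
import csv
from collections import defaultdict


def find_duplicated_hits(tsv_path, key_columns=("Isolate ID", "Contig", "Start", "End")):
    """Return rows that share the same values in key_columns.

    The default column names are assumptions about the results table header;
    pass different names if your file uses other headings.
    """
    groups = defaultdict(list)
    with open(tsv_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            key = tuple(row[column] for column in key_columns)
            groups[key].append(row)
    # Keep only locations that were reported more than once.
    return {key: rows for key, rows in groups.items() if len(rows) > 1}
```

Each returned group can then be inspected to see whether its rows differ only in the Predicted Phenotype column.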


We're running staramr in a Singularity container built into Nextflow: docker://staphb/staramr:0.7.1
Singularity params: -B $PWD:/data --cleanenv --no-home

Commands run in container:

staramr db update --resfinder-commit ${params.staramr_resfinder_commit} --pointfinder-commit ${params.staramr_pointfinder_commit} --plasmidfinder-commit ${params.staramr_plasmidfinder_commit} databases

staramr search --database databases --pid-threshold 90 --percent-length-overlap-resfinder 50 --no-exclude-genes -o "staramr" ${shovill_contigs}

Any help is much appreciated.

apetkau commented 1 year ago

Which resfinder database commit are you using when the issue occurs? It might be that some commits of the database contain duplicated entries.

kimyoungw commented 1 year ago
staramr_resfinder_commit = '1a53e55081ce35dfc05ecd995e6326eb8c428b19' //Date of commit: Wed, 31 May 2023 08:34
staramr_pointfinder_commit = 'cb7806fd9bfeec1497ff9f6b6c278284ba560ec6' //Date of commit: Wed, 31 May 2023 08:37
staramr_plasmidfinder_commit = '314d85f43e4e018baf35a2b093d9adc1246bc88d' //Date of commit: Fri, 17 Mar 2023 08:05

These are the commits.

kimyoungw commented 1 year ago

oops didn't mean to close

apetkau commented 1 year ago

Thanks so much @kimyoungw

It looks like, as of this commit in the resfinder database (https://bitbucket.org/genomicepidemiology/resfinder_db/commits/d34e556f8a917d7ac833141d87c0894253c56173, 2023-02-27), an all.fsa file was added, which contains duplicates of all the AMR sequences found in the other *.fsa files.
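
To check whether a given database checkout is affected, one option is to compare the FASTA headers in all.fsa against those in the other *.fsa files. This is a rough sketch, not part of StarAMR: the function names are hypothetical, and it assumes duplicated records carry identical header lines, so it compares headers only, not the sequences themselves.

```python
from pathlib import Path


def fasta_headers(path):
    """Collect the header lines (without the leading '>') from a FASTA file."""
    headers = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                headers.add(line[1:].strip())
    return headers


def headers_duplicated_in_all(resfinder_dir, combined_name="all.fsa"):
    """Map each per-class *.fsa file name to the headers it shares with all.fsa.

    Returns an empty dict when the combined file is absent or shares nothing.
    """
    resfinder_dir = Path(resfinder_dir)
    combined = resfinder_dir / combined_name
    if not combined.is_file():
        return {}
    combined_headers = fasta_headers(combined)
    duplicated = {}
    for fsa in sorted(resfinder_dir.glob("*.fsa")):
        if fsa.name == combined_name:
            continue
        shared = fasta_headers(fsa) & combined_headers
        if shared:
            duplicated[fsa.name] = shared
    return duplicated
```

A non-empty result for the resfinder directory suggests the combined file is the source of the duplicated hits.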

The developers of StarAMR are not affiliated with the maintainers of the ResFinder database, and so we cannot guarantee that any particular commit will work properly with StarAMR. We only make an effort to make sure that the default database files distributed with StarAMR work properly.

My recommendation would be to either downgrade to a commit prior to the one above, or remove the all.fsa file from the resfinder database directory used by StarAMR.
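
For the second option, a small helper can move the file out of the database directory so that it can be restored later if needed. This is only a sketch: the function name and the backup directory are hypothetical, and the location of the resfinder directory inside your StarAMR database folder is something you need to supply yourself.

```python
import shutil
from pathlib import Path


def move_combined_fasta(resfinder_dir, backup_dir, combined_name="all.fsa"):
    """Move all.fsa out of the resfinder directory instead of deleting it.

    Returns the new location, or None if the file was not present.
    Both directory arguments are supplied by the caller; nothing here
    assumes a particular StarAMR database layout.
    """
    combined = Path(resfinder_dir) / combined_name
    if not combined.is_file():
        return None
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / combined_name
    shutil.move(str(combined), str(target))
    return target
```

Moving the file aside keeps a copy, so the change is easy to undo if a later StarAMR version handles the combined file itself.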

I also noticed that you are using StarAMR 0.7.1, but version 0.9.1 is available. Version 0.9.1 uses the specific database commits linked below, if you wish to downgrade to them (though version 0.7.1 may lack fixes that are needed to use these database commits properly).

https://github.com/phac-nml/staramr/blob/3f32f49fb685c7ab1dd0c46ae7753423a86698cb/staramr/databases/AMRDatabasesManager.py#L16-L20

kimyoungw commented 1 year ago

Thank you for the prompt reply and thorough explanation. Yea we're using 0.7.1 because it's the only containerized version that works for us at the moment :l But yes, will definitely try to see if the database commits you suggested resolve our issues. Thanks again! Closing this issue.