ncbi / amr

AMRFinderPlus - Identify AMR genes and point mutations, and virulence and stress resistance genes in assembled bacterial nucleotide and protein sequence.
https://www.ncbi.nlm.nih.gov/pathogens/antimicrobial-resistance/AMRFinder/

Incorrect Gene Symbol in Output? #97

Closed cizydorczyk closed 1 year ago

cizydorczyk commented 1 year ago

When I run AMRFinderPlus on an S. aureus genome, one of the output lines reads:

JMB00937 fig|6666666.952341.peg.4515 abc-f ABC-F type ribosomal protection protein core AMR AMR MACROLIDE MACROLIDE HMM 642 655 95.11 35.40 644 WP_063854496.1 ABC-F type ribosomal protection protein NF000355.3 ABC-F type ribosomal protection protein

Through manual checking of what gene "abc-f" is, I found that in the database, this gene (based on reference accession) is actually "optrA" -- so why does AMRFinderPlus report the gene symbol as "abc-f"?

Or am I fundamentally misunderstanding what is being reported here and how?

Thank you, Conrad

vbrover commented 1 year ago

WP_063854496.1 itself would have been reported as optrA, but your protein JMB00937 is dissimilar enough from WP_063854496.1 that it is instead reported, via an HMM hit, as abc-f, the parent family of optrA.

evolarjun commented 1 year ago

Hi Conrad,

To elaborate a little on what Slava wrote above: AMRFinderPlus identified the gene you are interested in as a likely member of the abc-F family using the HMM NF000355.3, which was designed to selectively identify even distant members of the family. You can see more about the other genes in the abc-F family in the Reference Gene Hierarchy https://www.ncbi.nlm.nih.gov/pathogens/genehierarchy/#abc-F. AMRFinderPlus also reports the closest hit in the reference database (if BLAST finds one), which here is WP_063854496.1, to give people as much information as it can, even though the hit is well below the cutoffs for what AMRFinderPlus would call the optrA gene.

BLAST aligned the sequence called JMB00937 across about 95% of the reference sequence (WP_063854496.1), but at only about 35% identity (the output row shows 95.11 and 35.40, with an alignment length of 644 against a reference length of 655). Without knowing more about the sequence, I can't say much beyond that it is only distantly related to anything in our reference database, but it still met the requirements of the curated HMM created to selectively identify all members of the abc-F family.
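As a rough illustration of reading the row this way, here is a minimal Python sketch that splits the reported line into named fields and checks the Method column. The column names are an assumption (protein-mode output with a leading Name column from --name; other versions may label them differently), the field boundaries are re-split by hand, and `parse_row` and `describe_hit` are hypothetical helpers, not part of AMRFinderPlus.

```python
# Column names are an assumption: protein-mode AMRFinderPlus output with a
# leading "Name" column (as added by --name). Other versions may differ.
COLUMNS = [
    "Name", "Protein identifier", "Gene symbol", "Sequence name", "Scope",
    "Element type", "Element subtype", "Class", "Subclass", "Method",
    "Target length", "Reference sequence length",
    "% Coverage of reference sequence", "% Identity to reference sequence",
    "Alignment length", "Accession of closest sequence",
    "Name of closest sequence", "HMM id", "HMM description",
]

# The row from the report, re-joined with tabs (field boundaries are assumed).
ROW = "\t".join([
    "JMB00937", "fig|6666666.952341.peg.4515", "abc-f",
    "ABC-F type ribosomal protection protein", "core", "AMR", "AMR",
    "MACROLIDE", "MACROLIDE", "HMM", "642", "655", "95.11", "35.40", "644",
    "WP_063854496.1", "ABC-F type ribosomal protection protein",
    "NF000355.3", "ABC-F type ribosomal protection protein",
])


def parse_row(line):
    """Split one tab-separated output line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


def describe_hit(hit):
    """Summarize a hit, flagging HMM-only calls as family-level."""
    symbol = hit["Gene symbol"]
    if hit["Method"] == "HMM":
        # The gene symbol comes from the HMM's family; the closest reference
        # sequence is reported for context only.
        return (
            f"{symbol}: family-level call from HMM {hit['HMM id']}; "
            f"closest reference {hit['Accession of closest sequence']} "
            f"at {hit['% Identity to reference sequence']}% identity, "
            f"{hit['% Coverage of reference sequence']}% coverage"
        )
    return f"{symbol}: called by {hit['Method']}"


hit = parse_row(ROW)
print(describe_hit(hit))
```

When Method is HMM, the gene symbol reflects the family matched by the HMM, and the closest-sequence accession should be read as context rather than as the gene call.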

I'm not sure if it will help, but you can also check out the Interpreting results section of our documentation for more information on how to interpret the output of AMRFinderPlus.

Arjun

cizydorczyk commented 1 year ago

Great, thank you for the explanation -- makes sense now.

I appreciate the help and quick responses.

Conrad