ncbi / amr

AMRFinderPlus - Identify AMR genes and point mutations, and virulence and stress resistance genes in assembled bacterial nucleotide and protein sequence.
https://www.ncbi.nlm.nih.gov/pathogens/antimicrobial-resistance/AMRFinder/

NG_nnnnnn accession numbers missing version #14

Closed tseemann closed 5 years ago

tseemann commented 5 years ago

In AMR_CDS and some other files it is just NG_047055

>446950888|WP_001028144.1|NG_047055|1|2|aac(6')-Ie|aac(6')-Ie|bifunctional_aminoglycoside_N-acetyltransferase_AAC(6')-Ie/aminoglycoside_O-phosphotransferase_APH(2'')-Ia NG_047055:101-1540

But in ReferenceGeneCatalog.txt it is NG_047055.1
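A minimal Python sketch of how one might spot the mismatch described above. It assumes that the second and third pipe-delimited fields of the header are the protein and nucleotide accessions, as in the quoted example; the helper names and the accession pattern are illustrative and not part of AMRFinderPlus.

```python
import re
from typing import Tuple

# Illustrative pattern: prefix letters, optional underscore, digits, then ".<version>".
# Matches e.g. "NG_047055.1" or "WP_001028144.1", but not "NG_047055".
VERSIONED = re.compile(r"^[A-Z]{1,4}_?\d+\.\d+$")


def accessions_from_header(header: str) -> Tuple[str, str]:
    """Return (protein_accession, nucleotide_accession) from a header line.

    Assumption: fields 2 and 3 (1-based) hold the two accessions,
    as in the example quoted in this issue.
    """
    fields = header.lstrip(">").split("|")
    return fields[1], fields[2]


def has_version(accession: str) -> bool:
    """True if the accession ends in a dot followed by a numeric version."""
    return VERSIONED.match(accession) is not None


# Header fields as quoted in this issue (product name omitted).
header = ">446950888|WP_001028144.1|NG_047055|1|2|aac(6')-Ie|aac(6')-Ie|"
protein, nucleotide = accessions_from_header(header)
print(protein, has_version(protein))
print(nucleotide, has_version(nucleotide))
```

Run against the quoted header, the protein accession is reported as versioned and the nucleotide accession as unversioned, which is the inconsistency being reported.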

evolarjun commented 5 years ago

We generally link via the protein accession (WP_001028144.1 in your example), but I can see we're inconsistent with the nucleotide accessions: the NG records don't have a version, but the GenBank accessions do. For the purposes of our database, protein sequences are primary, so we usually link via protein sequence/accession. We'll make things consistent in a future release.

tseemann commented 5 years ago

Actually, NG_nnnnnn.v accessions do have version numbers, including v=2 and v=3; you have 31 of them?

% cut -f11 ReferenceGeneCatalog.txt | grep '\.[2-9]$' | uniq

NG_052583.3
NG_047295.2
NG_047307.2
NG_056002.2
NG_048749.2
NG_048791.2
NG_048905.2
NG_049041.2
NG_049089.2
NG_049323.2
NG_062218.2
NG_057591.2
NG_057597.2
NG_049984.2
NG_050235.2
NG_050242.2
NG_055993.2
NG_047699.2
NG_047784.2
NG_055651.2
NG_055784.2
NG_060581.2
NG_052176.2
NG_050472.2
NG_050504.2
NG_048128.2
NG_048275.2
NG_050504.2
NG_048128.2
NG_048275.2
NG_048525.2
NG_048542.2
NC_000913.3
NC_003197.2
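For readers who prefer a script, here is a rough Python equivalent of the one-liner above. It is a sketch under the same assumption the command makes, namely that the accession sits in the 11th tab-separated column; the function name and the sample rows are made up for illustration. One difference is deliberate: `uniq` only collapses adjacent duplicate lines, which is why some accessions appear twice in the output above, whereas this version tracks every value it has already seen.

```python
import re

# Same pattern as the grep above: value ends in ".2" through ".9".
MULTI_VERSION = re.compile(r"\.[2-9]$")


def versioned_accessions(lines, column=10):
    """Yield distinct values from a 0-based tab-separated column ending in .2 to .9.

    column=10 corresponds to `cut -f11` (cut counts fields from 1).
    """
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= column:
            continue  # short or malformed row
        value = fields[column]
        if MULTI_VERSION.search(value) and value not in seen:
            seen.add(value)
            yield value


def make_row(accession):
    """Build a made-up 11-column row with the accession in the last column."""
    return "\t".join(["x"] * 10 + [accession])


# Hypothetical sample rows, not real catalog content.
sample = [make_row(a) for a in
          ["NG_047055.1", "NG_050504.2", "NG_048128.2", "NG_050504.2", "NC_000913.3"]]
print(list(versioned_accessions(sample)))
# → ['NG_050504.2', 'NG_048128.2', 'NC_000913.3']
```

To run it on the real file, something like `versioned_accessions(open("ReferenceGeneCatalog.txt"))` should work, provided the column layout matches the one assumed here. As with the shell command, the filter is not specific to NG_ records, so NC_ accessions are reported too.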

evolarjun commented 5 years ago

Yes, that's what I meant when I said "inconsistent". I had missed this previously because our internal use of AMR_CDS is limited. We'll get this fixed so the '.version' is included for our next database release.

tseemann commented 5 years ago

Thanks @evolarjun

evolarjun commented 5 years ago

@tseemann This should be fixed in the latest release of the database (2019-10-30.1). Note that the FTP site paths have changed so that existing software will not break by updating to a backwards-incompatible database version. The new site is at https://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/AMRFinderPlus/database/latest/

I hadn't mentioned before that the Reference Gene Catalog is made for external consumption, and should have the data you need. It also has a Web interface. New documentation for ReferenceGeneCatalog.txt is here.

We have a new version of AMRFinderPlus (3.2.1) compatible with this database that I encourage you to try out and let us know what issues you find. Your feedback is (almost) always appreciated. ;-)

Thanks again.

tseemann commented 5 years ago

I look forward to annoy^H^H^H^H^Hhelping you in the future.