shenwei356 / taxonkit

A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV
https://bioinf.shenwei.me/taxonkit
MIT License

accession numbers not found #34

Closed srosales712 closed 3 years ago

srosales712 commented 3 years ago

Hi, I'm trying to go from BLAST output to taxIDs. I extracted the accession numbers from the BLAST output into a single file and then wanted to verify that they were in the prot.accession2taxid.gz database.

To do this I ran:

pigz -dc /taxdb2020/prot.accession2taxid.gz | csvtk grep -t -f accession.version -P acc.txt > output.txt

head acc.txt
AB111947.1
AB111947.1
MK072403.1
MK072403.1
MK072403.1

My output.txt file comes out empty. I'd like to know if there are any suggestions for going from a list of accessions to taxIDs.

shenwei356 commented 3 years ago

Oh I'm so sorry, I missed this issue.

It seems these accessions are not found in prot.accession2taxid.gz.

Oh, I know: they are nucleotide sequences, not protein sequences. See:

https://www.ncbi.nlm.nih.gov/search/all/?term=AB111947.1

You need nucl_gb.accession2taxid.gz; prot.accession2taxid.gz only covers protein records.
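
For illustration, here is a minimal sketch of the same accession-to-taxID lookup done with only gzip and awk, in case csvtk is not at hand. It assumes the usual accession2taxid layout (one header line, then tab-separated columns accession, accession.version, taxid, gi) and a non-empty accession list with one accession.version per line. The function name `acc2taxid` is made up for this example and is not part of taxonkit or csvtk.

```shell
# Sketch: print "accession.version<TAB>taxid" for every listed accession
# that is present in a gzipped accession2taxid table.
#   $1: gzipped accession2taxid file (e.g. nucl_gb.accession2taxid.gz)
#   $2: text file with one accession.version per line (assumed non-empty)
acc2taxid() {
    gzip -dc "$1" | awk -F'\t' '
        # First input (the accession list): remember each wanted accession.
        NR == FNR { wanted[$1] = 1; next }
        # Second input (the table on stdin): skip the header line and
        # print column 2 (accession.version) and column 3 (taxid) on a hit.
        FNR > 1 && ($2 in wanted) { print $2 "\t" $3 }
    ' "$2" -
}
```

It could then be called as `acc2taxid nucl_gb.accession2taxid.gz acc.txt > output.txt`, with the path adjusted to wherever the nucleotide table was downloaded. Duplicate accessions in acc.txt are collapsed, and accessions missing from the table are silently skipped, so comparing the line count of the output with the number of unique inputs shows how many were not found. The original pigz and csvtk pipeline should also work as written once it is pointed at the nucleotide file instead of the protein one.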