greatfireball / NCBI-Taxonomy

MIT License

NCBI phased out gi numbers #5

Open iimog opened 7 years ago

iimog commented 7 years ago

NCBI decided to phase out GI numbers in favour of accession numbers: https://www.ncbi.nlm.nih.gov/news/03-02-2016-phase-out-of-GI-numbers/ This module relies heavily on GI numbers and might not work with more recent NCBI taxonomy dumps. Functions that use accessions rather than GI numbers need to be implemented.
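To illustrate what an accession-based lookup involves, here is a minimal Python sketch (not this module's code or API). It assumes the tab-separated layout of NCBI's `*.accession2taxid` files, with a header line followed by the columns `accession`, `accession.version`, `taxid` and `gi`. The function name `lookup_taxids` and the sample rows are hypothetical and only serve as illustration.

```python
import io


def lookup_taxids(handle, wanted):
    """Map each requested accession (with or without version) to its taxid.

    `handle` is any iterable of text lines in the assumed accession2taxid
    layout: accession, accession.version, taxid, gi (tab-separated).
    """
    wanted = set(wanted)
    found = {}
    next(handle, None)  # skip the header line
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or truncated lines
        accession, accession_version, taxid = fields[:3]
        for key in (accession, accession_version):
            if key in wanted:
                found[key] = int(taxid)
    return found


# Illustrative sample data, not taken from a real dump.
sample = io.StringIO(
    "accession\taccession.version\ttaxid\tgi\n"
    "A00001\tA00001.1\t10641\t58418\n"
    "A00002\tA00002.1\t9913\t2\n"
)
result = lookup_taxids(sample, ["A00002.1", "A00001"])
```

For a real dump the handle would come from something like `gzip.open(path, "rt")`. Because the file is streamed line by line, memory use depends only on the number of requested accessions.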

iimog commented 6 years ago

Libraries in other languages that already implement this feature:

I tried in `/tmp/` as working directory:

```R
library(taxonomizr)
getNamesAndNodes()
getAccession2taxid(type="nucl_gb")
read.accession2taxid(list.files('.','accession2taxid.gz$'),'accessionTaxa.sql')
```

this resulted in the following error:

```
Reading nucl_gb.accession2taxid.gz.
Reading in values. This may take a while.
Error: Problem creating sql file. Deleting.
Error in connection_import_file(conn@ptr, name, value, sep, eol, skip) :
  RS_sqlite_import: /tmp/RtmpvgISg2/file6e4460728c25 line 113360323 expected 2 columns of data but found 1
```

The referenced file seems to be truncated in line 113360323 (it ends after the accession with no line break).

iimog commented 6 years ago

As expected, my inability to get taxonomizr to run was my own fault. Apparently my /tmp partition was too small and the process ran out of disk space. The error handling in this case could be better, but this is still on me. After trying again on a larger partition, taxonomizr works as intended.
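For anyone hitting the same symptom, a rough Python sketch of two sanity checks follows: one tests free space on the target partition before a large extraction, the other finds the first line with an unexpected column count in a tab-separated file. Both helper names are hypothetical, and neither is part of taxonomizr or this module. How much space is enough is left to the caller, since the size of the uncompressed data is not stated here.

```python
import io
import shutil
import tempfile


def has_enough_space(path, required_bytes):
    """Return True if the filesystem holding `path` has enough free bytes."""
    return shutil.disk_usage(path).free >= required_bytes


def first_bad_line(handle, expected_columns):
    """Return the 1-based number of the first line whose tab-separated
    column count differs from `expected_columns`, or None if all match."""
    for number, line in enumerate(handle, start=1):
        if len(line.rstrip("\n").split("\t")) != expected_columns:
            return number
    return None


# Requiring zero bytes always succeeds; pass a real estimate in practice.
space_ok = has_enough_space(tempfile.gettempdir(), 0)

# A file cut off mid-record, as happens when the disk fills up.
truncated = io.StringIO("A00001.1\t10641\nA00002.1\t9913\nA00003.1")
bad_line = first_bad_line(truncated, 2)
```

A short final line like the one in `truncated` is the typical signature of a write that stopped when the partition filled up, which matches the "expected 2 columns of data but found 1" error above.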

greatfireball commented 6 years ago

Current release v0.90 should support accessions. Please test it and let me know @iimog

iimog commented 2 years ago

It works. Thanks! :tada: Sorry for the late response.