pipparichter / find-a-bug

A Flask-based RESTful API for interfacing with a SQL database of microbial genes and annotations.

Remove prefixes from genome IDs #1

Closed pipparichter closed 5 months ago

pipparichter commented 5 months ago

The genome IDs, as obtained directly from GTDB, carry a two-letter prefix, RS or GB, indicating whether the genome was obtained from RefSeq or GenBank. These prefixes are causing search issues: query genome IDs that should find hits in the database are returning none. I should re-upload everything without the prefixes, and perhaps add a column indicating the origin of the sequence.
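A rough sketch of the cleanup step, not the code actually used for the re-upload. It assumes the prefix is `RS` or `GB`, optionally followed by an underscore, in front of the accession. The function name `split_genome_id` and the origin labels are hypothetical.

```python
import re

# Assumption: a prefixed ID looks like "RS_GCF_..." or "GB_GCA_...".
# The underscore is treated as optional in case it is absent.
_PREFIX = re.compile(r"^(RS|GB)_?")

# Hypothetical labels for a new "origin" column.
_ORIGINS = {"RS": "RefSeq", "GB": "GenBank"}


def split_genome_id(genome_id):
    """Return (genome ID without prefix, origin), or (genome_id, None) if unprefixed."""
    match = _PREFIX.match(genome_id)
    if match is None:
        # No recognized prefix: leave the ID untouched and the origin unknown.
        return genome_id, None
    return genome_id[match.end():], _ORIGINS[match.group(1)]


if __name__ == "__main__":
    for raw in ["RS_GCF_000006945.2", "GB_GCA_000007325.1", "GCF_000006945.2"]:
        print(raw, "->", split_genome_id(raw))
```

Applying the same function to incoming query IDs as well as to stored IDs would keep both sides of the lookup in the same form, whichever form the database ends up using.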

pipparichter commented 5 months ago

https://forum.gtdb.ecogenomic.org/t/gtdb-release95-genome-name-prefixes/31/2