unipept / unipept-database

Makes database tables and indices for Unipept
MIT License

Optimize the current relational database for future compatibility with the suffix array #55

Closed: bmesuere closed this issue 4 months ago.

bmesuere commented 4 months ago

To efficiently answer queries for non-tryptic peptides using the new suffix array in the Unipept API, we need to make some changes to the relational database that is also used by the current version of Unipept.
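The following minimal Python sketch (not the Unipept implementation) shows why a suffix array removes the need for a table of precomputed tryptic peptides. It builds a naive suffix array over a few hypothetical protein sequences joined by a `$` separator, then uses binary search to find every protein that contains a query peptide, tryptic or not. The function names, the separator and the sample sequences are illustrative assumptions. A production index over all of UniProt would use a linear-time construction instead of sorting suffixes directly.

```python
import bisect

# Hypothetical sketch: naive suffix array search over concatenated proteins.


def build_suffix_array(text: str) -> list[int]:
    """Return the start offsets of all suffixes of `text`, in sorted order."""
    # Naive O(n^2 log n) construction, fine for a toy example only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start offsets of `pattern` in `text` via binary search."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def proteins_containing(proteins: list[str], peptide: str) -> set[int]:
    """Return the indices of all proteins that contain `peptide` as a substring."""
    # The separator never occurs in a peptide, so a match cannot span two proteins.
    text = "$".join(proteins)
    starts, pos = [], 0
    for protein in proteins:
        starts.append(pos)
        pos += len(protein) + 1
    sa = build_suffix_array(text)
    # Map each matching text offset back to the protein it falls in.
    return {bisect.bisect_right(starts, off) - 1 for off in find_occurrences(text, sa, peptide)}


proteins = ["MKWVTFISLLLLFSSAYS", "GLFDKRPEAVK", "AVKLLSEK"]
print(proteins_containing(proteins, "AVK"))   # {1, 2}
print(proteins_containing(proteins, "DKRP"))  # {1}, a non-tryptic peptide
```

The lookup works for any substring, so no precomputation per tryptic peptide is needed.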

Tables that are no longer required

* peptides: This table lists all tryptic peptides that were digested in silico by Unipept, together with the associated precomputations. Because we are no longer working with tryptic peptides (and, as a consequence, with precomputed lowest common ancestors (LCAs) or functional annotations), this table is no longer useful.
* sequences: This table is very similar to peptides and contains the actual tryptic peptide sequences. It can be removed completely as well.
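To show what a lookup keyed on tryptic peptides can and cannot answer, here is a small hypothetical sketch. It assumes the classic trypsin rule (cleave after K or R unless the next residue is P). The helper names and sample sequences are illustrative only, and this is not Unipept's digestion code or the actual schema of the peptides table.

```python
import re

# Hypothetical sketch: in silico tryptic digestion and a peptide -> proteins index.


def tryptic_digest(protein: str) -> list[str]:
    """Split a protein after every K or R that is not followed by P."""
    # Zero-width split points: preceded by K/R and not followed by P.
    return [piece for piece in re.split(r"(?<=[KR])(?!P)", protein) if piece]


def build_tryptic_index(proteins: list[str]) -> dict[str, set[int]]:
    """Map each tryptic peptide to the indices of the proteins it comes from."""
    index: dict[str, set[int]] = {}
    for i, protein in enumerate(proteins):
        for peptide in tryptic_digest(protein):
            index.setdefault(peptide, set()).add(i)
    return index


proteins = ["MKWVTFISLLLLFSSAYS", "GLFDKRPEAVK", "AVKLLSEK"]
index = build_tryptic_index(proteins)
print(tryptic_digest(proteins[1]))  # ['GLFDK', 'RPEAVK']
print(index.get("AVK"))             # {2}: misses protein 1, where AVK is not tryptic
print(index.get("DKRP"))            # None: non-tryptic peptides are not indexed
```

Such an index only answers exact tryptic peptides. "AVK" is mapped to the third protein alone, and "DKRP" is missing entirely. A suffix array over the full protein sequences has no such blind spot.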

Changes to tables

uniprot_entries: This table should consist of the following columns:
* id: does not change

Note that the build_database.sh script should be updated accordingly to produce these files.