To efficiently answer queries for non-tryptic peptides with the new suffix array in the Unipept API, we need to make some changes to the relational database that is also used by the current version of Unipept.
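A suffix array removes the dependency on tryptic peptides because any peptide, tryptic or not, is a prefix of some suffix of a protein sequence. All matching suffixes sit in one contiguous block of the sorted array, so two binary searches find every match. The sketch below is a minimal illustration under stated assumptions: it uses a naive in-memory construction, a hypothetical `$` separator between proteins, and toy sequences. It is not how the Unipept index is actually built.

```python
from bisect import bisect_right

SEPARATOR = "$"  # assumption: a character that never occurs in protein sequences


def build_index(proteins):
    """Concatenate proteins and sort all suffix start positions (naive construction)."""
    text = SEPARATOR.join(proteins) + SEPARATOR
    starts, offset = [], 0
    for protein in proteins:
        starts.append(offset)  # offset of each protein inside `text`
        offset += len(protein) + 1
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return text, sa, starts


def _bound(text, sa, pattern, upper):
    """First suffix whose prefix is >= pattern (or > pattern when upper=True)."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + len(pattern)]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def proteins_containing(index, peptide):
    """Return the indices of all proteins that contain `peptide` anywhere."""
    text, sa, starts = index
    lo = _bound(text, sa, peptide, upper=False)
    hi = _bound(text, sa, peptide, upper=True)
    # Map each matching text position back to the protein it falls in.
    return {bisect_right(starts, sa[k]) - 1 for k in range(lo, hi)}
```

For example, `proteins_containing(build_index(["MKAVLK", "AVLPRK"]), "AVL")` returns `{0, 1}`, and the non-tryptic peptide `"KAVL"` is still found in protein `0`.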
Tables that are no longer required
peptides
This table contains a listing of all tryptic peptides that were digested in silico by Unipept, together with the associated precomputations. Because we're no longer working with tryptic peptides (and, as a consequence, with precomputed LCAs or functional annotations), this table is no longer useful.
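An in-silico tryptic digest cuts each protein after every lysine (K) or arginine (R) that is not followed by proline (P). The minimal sketch below applies only that classic cleavage rule. It ignores missed cleavages and any length filtering Unipept may apply, so it should not be read as Unipept's own digestion code.

```python
import re


def tryptic_digest(protein: str) -> list[str]:
    # Split at the zero-width position after K or R, unless the next residue is P.
    # The trailing empty piece produced when a protein ends in K/R is dropped.
    return [piece for piece in re.split(r"(?<=[KR])(?!P)", protein) if piece]
```

`tryptic_digest("MKWVRPAKLR")` yields `["MK", "WVRPAK", "LR"]`. A peptide such as `KWVR`, which spans a cleavage site, is never produced, so a table keyed on tryptic peptides has no row for it.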
sequences
This table is very similar to peptides and contains the actual tryptic peptide sequences. It can be completely removed as well.
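The removal of both tables can be sketched as a small, idempotent migration. The sketch uses SQLite through Python's standard `sqlite3` module purely for illustration. The production database engine and its DDL may differ, and the table definitions below are hypothetical minimal stand-ins, not the real schema.

```python
import sqlite3


def drop_obsolete_tables(conn: sqlite3.Connection) -> None:
    # Drop peptides before sequences, since peptides refers to sequences.
    # IF EXISTS makes the migration safe to run more than once.
    for table in ("peptides", "sequences"):
        conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.commit()


def table_names(conn: sqlite3.Connection) -> list[str]:
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-ins for the existing tables.
    conn.execute("CREATE TABLE sequences (id INTEGER PRIMARY KEY, sequence TEXT)")
    conn.execute("CREATE TABLE peptides (id INTEGER PRIMARY KEY, sequence_id INTEGER)")
    conn.execute("CREATE TABLE uniprot_entries (id INTEGER PRIMARY KEY)")
    drop_obsolete_tables(conn)
    print(table_names(conn))
```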
Changes to tables
uniprot_entries
This table should consist of the following columns:
*id: does not change
uniprot_accession_number: does not change
version: does not change
taxon: does not change
name: does not change
sequence: does not change
go: new; contains a semicolon-delimited list of the GO term codes associated with this protein. E.g.: GO:00001;GO:000002
ec: new; contains a semicolon-delimited list of the EC numbers associated with this protein. E.g.: EC:1.1.1.1;EC:4.5.6.3
interpro: new; contains a semicolon-delimited list of the InterPro entries associated with this protein. E.g.: IPR:005457;IPR:054221
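Because the new `go`, `ec` and `interpro` columns store plain delimited strings, a consumer such as the API must split them back into individual codes. The sketch below uses a hypothetical helper name and assumes that an empty or NULL column means "no annotations".

```python
from typing import Optional


def parse_annotations(field: Optional[str]) -> list[str]:
    # Empty or NULL columns mean "no annotations", not a list containing "".
    if not field:
        return []
    # Split on semicolons and drop stray whitespace or empty segments.
    return [code.strip() for code in field.split(";") if code.strip()]
```

`parse_annotations("EC:1.1.1.1;EC:4.5.6.3")` returns `["EC:1.1.1.1", "EC:4.5.6.3"]`. Storing the codes on the entry row avoids a separate join at query time. The trade-off is that a lookup such as "all proteins with a given EC number" cannot use an index on these columns.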
Note that the build_database.sh script should be updated accordingly to produce these files.
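On the producing side, each entry needs one extra field per annotation type, with its codes joined by semicolons. The sketch below shows one way to write such rows. It assumes tab-separated output with the columns in the order listed above. The function names, the dict layout, and the sample accession `P12345` are hypothetical and are not taken from `build_database.sh`.

```python
import csv
import io

# Assumption: output columns follow the order of the list above.
COLUMNS = ["id", "uniprot_accession_number", "version", "taxon", "name",
           "sequence", "go", "ec", "interpro"]
ANNOTATION_COLUMNS = {"go", "ec", "interpro"}


def entry_row(entry: dict) -> list[str]:
    # Annotation lists become semicolon-delimited strings; other values are stringified.
    return [";".join(entry.get(col, [])) if col in ANNOTATION_COLUMNS else str(entry[col])
            for col in COLUMNS]


def write_entries(entries, out) -> None:
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for entry in entries:
        writer.writerow(entry_row(entry))
```

With a hypothetical entry that has two GO terms, one EC number and no InterPro entries, the `interpro` field is written as an empty string, and a reader using `parse_annotations`-style logic would treat it as "no annotations".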