unipept / unipept-database

Makes database tables and indices for Unipept
MIT License

Create new target in build-database.sh for input of suffix array construction #54

Closed bmesuere closed 5 months ago

bmesuere commented 5 months ago

The build-database.sh script in the unipept-database repository should be extended with a new "target" that produces the input files needed to construct a suffix array.

More specifically: add the target "suffix-array" as a new case to the case statement at the bottom of the build-database.sh script. This target should produce two TSV files:

1) taxons.tsv: the structure of this file should remain the same as it is today.
2) proteins.tsv: this file should contain three columns: UniProt accession, protein sequence, and associated taxon id.
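
As a rough illustration of the proteins.tsv part of this request, here is a minimal bash sketch. It is not taken from build-database.sh: the function names `create_suffix_array_proteins` and `build_target`, and the column layout of the input file (accession, taxon id, name, sequence), are assumptions made only for this example. taxons.tsv is left out because its structure is meant to stay unchanged.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the real script): writes a three-column
# proteins.tsv (accession, sequence, taxon id) from a tab-separated input.
# ASSUMED input layout: 1 = accession, 2 = taxon id, 3 = name, 4 = sequence.
create_suffix_array_proteins() {
    local input="$1" output="$2"
    awk -F'\t' 'BEGIN { OFS = "\t" } { print $1, $4, $2 }' "$input" > "$output"
}

# Hypothetical dispatcher mirroring a case statement with one branch per target.
build_target() {
    local target="$1" input="$2" out_dir="$3"
    case "$target" in
    suffix-array)
        mkdir -p "$out_dir" || return 1
        create_suffix_array_proteins "$input" "$out_dir/proteins.tsv"
        ;;
    *)
        echo "Unknown target: $target" >&2
        return 1
        ;;
    esac
}
```

With these definitions, `build_target suffix-array intermediate.tsv output` would write output/proteins.tsv; the real script would read from whatever intermediate files it already produces.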