The build-database.sh script in the unipept-database repository should be extended with a new target that produces the input files needed to construct a suffix array.
More specifically: add the target "suffix-array" as a new case in the case statement at the bottom of the build-database.sh script. This target should produce two TSV files:
1) taxons.tsv: the structure of this file should remain the same as it is today.
2) proteins.tsv: this file should contain three columns: UniProt accession, protein sequence, and associated taxon ID.
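
As a rough illustration of the requested change, the new target could be wired in along the following lines. This is only a sketch: the function names create_taxon_tables, create_suffix_array_proteins and build_target are hypothetical placeholders, not names taken from the actual build-database.sh, and the placeholder bodies only print what they would do instead of generating real files.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: none of these function names come from the real
# build-database.sh; they only illustrate the shape of the new target.

# Placeholder: the real script would write taxons.tsv in its existing,
# unchanged format.
create_taxon_tables() {
    echo "would write taxons.tsv (existing structure, unchanged)"
}

# Placeholder: the real script would write proteins.tsv with three
# tab-separated columns: UniProt accession, protein sequence, taxon ID.
create_suffix_array_proteins() {
    echo "would write proteins.tsv (accession, sequence, taxon id)"
}

# Dispatch on the requested target, mirroring a case statement at the
# bottom of a build script.
build_target() {
    case "$1" in
    suffix-array)
        create_taxon_tables
        create_suffix_array_proteins
        ;;
    *)
        echo "Unknown target: $1" >&2
        return 1
        ;;
    esac
}
```

Calling build_target suffix-array would run both steps in order, while any other argument falls through to the error branch. In the real script the new case would sit next to the existing targets rather than in a separate function.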