Benchmarking queries against the UniProt database

1. Download uniprot_sprot.fasta.gz from https://www.uniprot.org/downloads, listed under UniProtKB / Reviewed (Swiss-Prot).
2. Gunzip the file and put it in the project/data/uniprot directory.
3. On the Leonhard cluster, copy the whole project to the /cluster/scratch/username/ directory. The workflow generates many files, which is not allowed in your home directory on the cluster.
4. From the py directory, run python reader.py uniprot_prepare. This generates many files in data/uniprot, each containing one read from the database, plus a stats.txt file in data/uniprot holding a single number: the number of files that contain reads.
5. If step 4 is taking too long, you can stop it partway and create stats.txt manually, after checking how many files had been generated when you stopped it.
6. Create a job from the binary bin/mpi_sw_solve_uniprot.
7. After the job completes, check its standard output. Each rank should have printed the run time, in microseconds, spent on updating cells, and the total number of cells updated (this part needs to be further automated).
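If reader.py uniprot_prepare was interrupted, the manual creation of stats.txt can be scripted. The following is a minimal sketch, not part of the repository. The function name write_stats is hypothetical. It assumes that every regular file in data/uniprot, other than stats.txt and the original uniprot_sprot.fasta, holds exactly one read, and that stats.txt should contain just the count followed by a newline. Check both assumptions against what reader.py actually writes and reads.

```python
from pathlib import Path


def write_stats(uniprot_dir, exclude=("stats.txt", "uniprot_sprot.fasta")):
    """Count the per-read files in uniprot_dir and record the count in stats.txt.

    Assumption: every regular file whose name is not in `exclude` holds one read.
    """
    directory = Path(uniprot_dir)
    count = sum(
        1
        for path in directory.iterdir()
        if path.is_file() and path.name not in exclude
    )
    # Assumption: stats.txt holds a single number and nothing else.
    (directory / "stats.txt").write_text(f"{count}\n")
    return count
```

Called from the py directory, this would be write_stats("../data/uniprot"). If the run was stopped mid-write, the most recently created file may be incomplete, so it may be worth inspecting or deleting it before counting.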

Note: the protein used as the query is in data/query/, and you can replace it with any protein you want.

Disclaimer: It was a long evening, so please watch for any obvious mistakes and either correct them or contact me if you find one.
