Closed (tabinks closed this 8 years ago)
I have put together a template script for using the Python multiprocessing library in a Slurm environment. The script iterates through 100 tasks using 1 node and 16 cores on a Sandy Bridge processor. The function process_worker receives a different parameter during each iteration. You can imagine using this to specify which chunk of a database you should search against.
This is not a complete solution to the questions, but it is a great place to start.
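A minimal sketch of what such a template might look like. The name process_worker and the 100-task / 16-core layout come from the post above; the worker body and constants here are placeholders, not the actual assignment code:

```python
from multiprocessing import Pool

NUM_TASKS = 100   # one task per iteration, e.g. one database chunk each
NUM_CORES = 16    # cores available on one Sandy Bridge node

def process_worker(chunk_id):
    # Placeholder for the real work, e.g. searching one chunk of the
    # database. Here it just returns the chunk id squared so the
    # script runs end to end.
    return chunk_id * chunk_id

if __name__ == "__main__":
    # Pool distributes the 100 tasks across the 16 worker processes;
    # each call to process_worker receives a different chunk_id.
    with Pool(processes=NUM_CORES) as pool:
        results = pool.map(process_worker, range(NUM_TASKS))
    print(len(results))
```

In a real run you would replace the worker body with the database search for that chunk and collect the results from pool.map.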
I added the formula for speedup. You will need to run a serial version of the database search to get the baseline run time. Remember to use the pdb.fasta database, not the nr database.
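For reference, a sketch of the speedup calculation: divide the serial baseline run time by the parallel run time. The timing numbers below are made up for illustration, not real measurements:

```python
def speedup(t_serial, t_parallel):
    # Speedup = serial run time / parallel run time.
    return t_serial / t_parallel

# Hypothetical timings in seconds (illustrative only):
t_serial = 1600.0    # baseline: serial search of pdb.fasta
t_parallel = 120.0   # the same search spread over 16 cores
print(speedup(t_serial, t_parallel))  # about 13.3x for these numbers
```

A perfectly parallel job on 16 cores would give a speedup of at most 16; overhead from process startup and uneven chunk sizes usually keeps the measured value below that.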
Should we still build the nr database for question #1 or use the pdb database? (Sorry if this question seems slack--there was another compilers deadline this weekend and now I'm catching up)
No. Do not try to build nr. Just use the pdb database.
Question 1
The version of mpiblast that is on RCC is not compatible with the nr database. Apparently, mpiblast is not very popular and hasn't been recompiled in a while. I have created an mpiblast database based on the Protein Data Bank. It is located at /project/mpcs56420/databases/pdb/. You will need to copy all of the files pdb.fasta* to a directory that can be read by the nodes (e.g. /scratch/midway/).
I have prepared a tutorial to help you set up and run an mpiBlast job using this pdb database. Here is the sbatch script used in the tutorial.
Since you will not be able to directly compare the results to the blastplus results from last week, I have modified the questions. Please view the revised homework here.