Open heejongkim opened 3 years ago
Hi Hee Jong,
To run parallel-processing replica exchange, IMP must be compiled using MPI, i.e. with an MPI compiler wrapper such as `mpicxx`. To install IMP, use the CMake flag `-DCMAKE_CXX_COMPILER=/usr/local/bin/mpicxx`. You can read more about using the CMake flags for installing IMP here.
One can then use `mpirun` to initiate a parallel job, e.g.:

```
mpirun -np 4 python modeling.py
```

which will perform a single modeling run with four replicas.
Running multiple modeling runs on a cluster requires setting up a script specific for that cluster software and architecture. Once you have successfully been able to run a single parallel replica exchange simulation using the command above, you should be able to use that line in your cluster submission script.
To install IMP, use the CMAKE flag -DCMAKE_CXX_COMPILER=/usr/local/bin/mpicxx
This isn't a great idea, because it will result in all of IMP being compiled with MPI. Only the `IMP::mpi` module needs to be compiled with MPI. As long as `mpicxx` and friends are in your `PATH`, CMake should do the right thing. Most of the prebuilt IMP binaries (e.g. Homebrew, Anaconda, RPM) are built with MPI support.
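If you want to confirm that a given IMP installation was built with MPI support before submitting jobs, a quick hedged check (assuming IMP is on your `PYTHONPATH`) is to try importing the `IMP.mpi` module, which is only present in MPI-enabled builds:

```python
# Check whether this IMP installation includes the MPI module.
# IMP.mpi is only importable when IMP was built against MPI.
try:
    import IMP.mpi  # noqa: F401
    has_mpi = True
except ImportError:
    has_mpi = False

print("IMP MPI support:", "yes" if has_mpi else "no (use an MPI-enabled build)")
```

If this prints "no" under the same module environment your SLURM job loads, replica exchange will silently fall back to a single replica.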
Thanks for both.
@saltzberg I already compiled IMP with the cluster's `mpicxx` and set it up as a module. What actually confused me is that `rnapolii/modeling/run_rnapolii_modeling.sh` loops through `N` and `n_steps`, and this repo's `modeling.py` takes that information plus an output path as arguments, so I wanted to make sure how to properly edit those to meet `mpirun`'s expectations.
For example, from the previous RNA Pol II tutorial, I made the following SLURM script to submit the modeling job:
```
#!/usr/bin/bash
#SBATCH --partition=defq
#SBATCH --output=logfiles/%j.out
#SBATCH --error=logfiles/%j.err
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=48

module load imp/2.13.0  # this will automatically load as well as unload dependencies and conflicts
mpirun --map-by node python modeling.py  # instead of using -np, used --map-by coupled with --ntasks-per-node to specify the number of threads per node
```
It would be awesome if you could help me convert the `for` loop in bash to an `mpirun` command. Thank you for your guidance.
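For what it's worth, one common way to replace a bash loop over runs is to make each iteration its own `mpirun` launch with a distinct output label. The sketch below is only an assumption-laden illustration (the run count, replica count, and the argument convention of `modeling.py` are placeholders, not the tutorial's actual interface); it echoes the commands so they can be inspected before actually launching anything:

```shell
#!/usr/bin/bash
# Sketch: one independent replica-exchange run per mpirun invocation.
# NUM_RUNS, REPLICAS, and the modeling.py argument are assumptions.
NUM_RUNS=3
REPLICAS=8
for i in $(seq 1 "$NUM_RUNS"); do
    cmd="mpirun -np $REPLICAS python modeling.py run_${i}"
    echo "$cmd"  # dry run; replace echo with the command itself (or srun) once verified
done
```

Each run then writes to its own directory, so the runs can also be submitted as separate SLURM jobs or as a job array instead of a loop.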
@benmwebb Would it cause any serious issues if I set `CMAKE_CXX_COMPILER` to `mpicxx`? Due to the complexity of the cluster environment, I preferred to be explicit, so I set that up and compiled. If I don't, it's sometimes a little difficult to keep track of which compiler and libraries were used for a specific build. Thank you for your insight.
So, I just changed `global_output_directory="output"` in `ReplicaExchange0`, set `num_frames` to a fixed value instead of taking the number from the command line via `sys.argv`, and used my SLURM script to submit with `mpirun`.
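For anyone following along, the `sys.argv` change can be sketched roughly like this (the values and names below are illustrative placeholders, not the tutorial's actual settings):

```python
# Before (tutorial style): run parameters came from the command line:
#   import sys
#   num_frames = int(sys.argv[1])
#   out_dir = sys.argv[2]

# After: hard-coded, so `mpirun ... python modeling.py` needs no
# per-run arguments. Values here are placeholder assumptions.
num_frames = 20000
out_dir = "output"  # later passed as global_output_directory to ReplicaExchange0

print(num_frames, out_dir)
```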
I've been watching the log and the queue for an hour, and it seems to be emitting the expected outputs without failing. If you have any other suggestions for improvement, please let me know.
Thanks!
Ah... it seems to be hitting something with mpirun.
Same data, same topology, and an almost identical modeling.py (the only change is the sys.argv portion). Even on a single-node run, mpirun spits out the following error, while run_rnapolii_modeling.sh starts the iteration just fine.
```
Traceback (most recent call last):
  File "init.py", line 141, in <module>
    max_srb_rot=0.3)
  File "/cm/shared/apps/imp/2.13.0/lib64/python3.7/site-packages/IMP/pmi/macros.py", line 723, in execute_macro
    self.root_hier = self.system.build()
  File "/cm/shared/apps/imp/2.13.0/lib64/python3.7/site-packages/IMP/pmi/topology/__init__.py", line 155, in build
    state.build(**kwargs)
  File "/cm/shared/apps/imp/2.13.0/lib64/python3.7/site-packages/IMP/pmi/topology/__init__.py", line 260, in build
    mol.build(**kwargs)
  File "/cm/shared/apps/imp/2.13.0/lib64/python3.7/site-packages/IMP/pmi/topology/__init__.py", line 747, in build
    self, rep, self.coord_finder, rephandler)
  File "/cm/shared/apps/imp/2.13.0/lib64/python3.7/site-packages/IMP/pmi/topology/system_tools.py", line 275, in build_representation
    model)
  File "/cm/shared/apps/imp/2.13.0/lib64/python3.7/site-packages/IMP/isd/gmm_tools.py", line 40, in decorate_gmm_from_text
    weight=float(fields[2])
IndexError: list index out of range
```
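Since the IndexError comes from `decorate_gmm_from_text` splitting a line on `|` and reading `fields[2]` as the weight, one quick way to debug is to scan the GMM text file for lines that are too short. This is only a sketch based on the traceback (the exact field layout of the GMM format is an assumption here):

```python
def check_gmm_file(path):
    """Return (line_number, line) pairs with fewer than 3 '|'-separated
    fields, which would make fields[2] raise IndexError in
    IMP.isd.gmm_tools.decorate_gmm_from_text."""
    bad = []
    with open(path) as fh:
        for lineno, line in enumerate(fh, 1):
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank and comment lines
            if len(line.split("|")) < 3:
                bad.append((lineno, line))
    return bad
```

Running `check_gmm_file` on each `.txt` GMM file referenced by your topology should pinpoint a truncated or misformatted line, e.g. one damaged during a copy to the cluster.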
Any suggestions and/or insights are very much appreciated.
Thanks.
Hi,
I would like to run the computationally expensive modeling.py across multiple nodes. However, it seems that modeling.py and its associated script are designed for a single machine. Do you have any examples or recommendations for how to accomplish that? I assume that I need to use MPI, but I'm not sure how to properly modify things to maximize speed, efficiency, and replica exchange across nodes.
Thanks.
Best,
Hee Jong