benchmark parallelization on CU summit

shirtsgroup / cg_openmm

Tools to build coarse grained models and perform simulations with OpenMM

MIT License


benchmark parallelization on CU summit #23

Open mrshirts opened 4 years ago

mrshirts commented 4 years ago

Benchmark both as a function of the number of cores (on 1 and 2 nodes) and as a function of the number of particles.

cwalker7 commented 4 years ago

Only did this for the 12mer system with 12 replicas and 24 replicas so far, but it's looking like running on 2 cores is optimal. Any more and performance can be worse than serial mode, even if we allocate 1 core/replica. Will have more on this once I build larger systems.

mrshirts commented 4 years ago

Like, is the speedup sublinear (4 cores take 1/3 of the time of 1 core, instead of 1/4), or does it actually get slower (4 cores take 1.5 times as long as 1 core)?

This might be a function of the time between replica exchanges as well - with less frequent exchange, the communication will take up less of the total time.
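To make the two cases in the question concrete, here is a minimal Python sketch that turns a pair of wall-clock timings into a speedup, a parallel efficiency, and a label. The function names and the timings (120 s serial, 40 s or 180 s on 4 cores) are made up for illustration and only mirror the 1/3 and 1.5x ratios mentioned above; they are not measurements from Summit.

```python
def speedup(t_serial, t_parallel):
    """Serial wall time divided by parallel wall time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_cores):
    """Speedup per core; 1.0 would be ideal linear scaling."""
    return speedup(t_serial, t_parallel) / n_cores


def classify(t_serial, t_parallel, n_cores):
    """Label a timing as a slowdown, sublinear speedup, or linear-or-better."""
    s = speedup(t_serial, t_parallel)
    if s < 1.0:
        return "slowdown"
    if s < n_cores:
        return "sublinear"
    return "linear or better"


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen to match the ratios in the question.
    t_serial = 120.0
    for label, t_parallel in (("1/3 of serial time", 40.0), ("1.5x serial time", 180.0)):
        print(
            label,
            classify(t_serial, t_parallel, n_cores=4),
            f"speedup={speedup(t_serial, t_parallel):.2f}",
            f"efficiency={efficiency(t_serial, t_parallel, 4):.2f}",
        )
```

Reporting efficiency alongside raw wall time makes it easier to compare runs with different core counts and system sizes on the same footing.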

cwalker7 commented 4 years ago

The latter - it actually takes more wall clock time if we go beyond 2 cores.

I agree it should depend on the exchange frequency.
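As a rough way to reason about the exchange-frequency dependence, here is a toy cost model in Python. It assumes each iteration consists of a block of MD steps followed by one exchange with a fixed communication cost. The function name and all parameter values (`t_step`, `t_exchange`) are hypothetical placeholders, not numbers measured for cg_openmm or Summit.

```python
def comm_fraction(steps_between_exchanges, t_step, t_exchange):
    """Fraction of one iteration's wall time spent in the exchange step.

    Toy model (an assumption, not the actual cost structure):
    one iteration = steps_between_exchanges * t_step of compute
                    + t_exchange of communication overhead.
    """
    compute = steps_between_exchanges * t_step
    return t_exchange / (compute + t_exchange)


if __name__ == "__main__":
    # Hypothetical costs in seconds: 1 ms per MD step, 50 ms per exchange.
    t_step = 1e-3
    t_exchange = 5e-2
    for n_steps in (10, 100, 1000, 10000):
        frac = comm_fraction(n_steps, t_step, t_exchange)
        print(f"{n_steps:>6} steps between exchanges: {frac:.1%} of time in exchange")
```

In this model the communication share falls monotonically as the number of steps between exchanges grows, so a benchmark that scans the exchange interval at fixed core count would show whether exchange overhead is what makes more than 2 cores slower.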