radical-cybertools / radical.benchmark

Use RCT to benchmark HTC applications on HPC resources
MIT License

Workload for benchmark #4

Open mturilli opened 7 years ago

mturilli commented 7 years ago

From Vivek:

mturilli commented 7 years ago

At the moment it is difficult to say whether BPTI/NTL9 are good candidates. Meanwhile:

vivek-bala commented 7 years ago

I have added the runtime profiles under gromacs-benchmark/data-*/profile.txt. Please take a look. I'm in the process of building gromacs with openmpi. Will update here once I have it running. Wish me luck!

andre-merzky commented 7 years ago

Good luck! :)

vivek-bala commented 7 years ago

Data in gromacs-benchmark/data-*/profile.txt now contains values for 4, 8, 16 and 32 core counts. Gromacs/5.1.0 built against the Cray MPI was used.

vivek-bala commented 7 years ago

Plot with comparison of gromacs built against craympi and openmpi: https://github.com/radical-cybertools/radical.benchmark/blob/master/gromacs-benchmark/gromacs_perf_craympi_vs_openmpi.png.

Instructions for gromacs compilation on Titan:
- non-mpi: https://github.com/vivek-bala/docs/blob/master/misc/gromacs-titan-nonmpi
- openmpi: https://github.com/vivek-bala/docs/blob/master/misc/gromacs-titan-openmpi
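
For context, an OpenMPI-enabled GROMACS 5.1.x build is a fairly standard CMake build; the sketch below is not a transcript of the linked instructions, and the compiler wrappers, install prefix and build parallelism are assumptions:

```bash
# Rough sketch of an OpenMPI-enabled GROMACS 5.1.x build -- not a transcript
# of the linked instructions; wrappers and paths are placeholders.
tar xf gromacs-5.1.0.tar.gz
cd gromacs-5.1.0
mkdir build && cd build

# GMX_MPI builds the MPI-enabled gmx_mpi binary; GMX_BUILD_OWN_FFTW lets the
# build fetch and compile FFTW itself; mpicc/mpicxx are the OpenMPI compiler
# wrappers (assumed to be on PATH after setting up OpenMPI on Titan).
cmake .. \
    -DGMX_MPI=ON \
    -DGMX_BUILD_OWN_FFTW=ON \
    -DCMAKE_C_COMPILER=mpicc \
    -DCMAKE_CXX_COMPILER=mpicxx \
    -DCMAKE_INSTALL_PREFIX=$HOME/gromacs-5.1.0-openmpi
make -j8 && make install
```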

marksantcroos commented 7 years ago

Thanks for doing that. Note that any difference is only expected beyond the node boundary. Given that the current installation of OMPI is not configured for optimisation, I'm not too disappointed by the delta at 32 cores.

andre-merzky commented 7 years ago

Hey Mark,

To clarify though: we are not really benchmarking the workload or ORTE or RP here, but using the workload, ORTE and RP to benchmark Titan... So at least for this experiment we don't worry about those data either; we just want to understand how the workload behaves in this context.

andre-merzky commented 7 years ago

> Instructions for gromacs compilation on Titan:
> - non-mpi: https://github.com/vivek-bala/docs/blob/master/misc/gromacs-titan-nonmpi
> - openmpi: https://github.com/vivek-bala/docs/blob/master/misc/gromacs-titan-openmpi

Thanks for those, @vivek-bala. I still don't understand the last part: how does getting an error verify that the installation is usable?

Also, what is the effect of PMI_NO_FORK?

vivek-bala commented 7 years ago

> Thanks for those, @vivek-bala. I still don't understand the last part: how does getting an error verify that the installation is usable?

It's just a quick way to test whether gromacs is installed correctly. Running the command will print the version and path of the gromacs build being used (and some other gromacs messages). Think of it as a gromacs --version.

Once installed and quick-tested, you can run https://github.com/radical-cybertools/radical.benchmark/blob/master/gromacs-benchmark/openmpi/data-ntl9/threads_8/run.pbs to run with actual data. In this case it should not produce any errors and should produce the expected output.
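
For readers who don't want to open the link, a minimal sketch of what such an aprun-based PBS script could look like for the threads_8 case is below. This is not the actual run.pbs from the repo; the project account, install path, input file and aprun geometry are placeholders:

```bash
#!/bin/bash
# Minimal sketch of an aprun launch of gmx_mpi inside a PBS job on Titan.
# NOT the repo's run.pbs: account, paths, input and geometry are placeholders.
#PBS -A ABC123
#PBS -l nodes=2,walltime=01:00:00

cd $PBS_O_WORKDIR
source $HOME/gromacs-5.1.0-openmpi/bin/GMXRC   # hypothetical install location

export OMP_NUM_THREADS=8
# 4 MPI ranks (-n), 8 OpenMP threads per rank (-d), 2 ranks per node (-N):
# 4 x 8 = 32 cores across two 16-core Titan nodes.
aprun -n 4 -N 2 -d 8 gmx_mpi mdrun -ntomp 8 -s topol.tpr
```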

> Also, what is the effect of PMI_NO_FORK?

Without setting the variable, I get the following:

--------------------------------------------------------------------------
Direct launch with aprun only works when either the PMI_NO_FORK environment
variable is set, or Open MPI is built with dlopen support disabled.
--------------------------------------------------------------------------
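
In other words, the direct aprun launch only goes through once Cray's PMI fork handling is disabled by exporting the variable before the launch. A minimal illustration (the aprun geometry and input file are placeholders):

```bash
# Per the error message above: direct launch with aprun needs PMI_NO_FORK set
# (or an Open MPI build with dlopen support disabled).
export PMI_NO_FORK=1
aprun -n 4 -d 8 gmx_mpi mdrun -ntomp 8 -s topol.tpr   # illustrative geometry
```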
mturilli commented 7 years ago

Testing combinations:

| MPI library | Executable | Configuration | Run method | RP Scheduler | Scale | Status |
|---|---|---|---|---|---|---|
| CrayMPI (ORNL) | Gromacs (ORNL) | 4 threads; 8 processes | aprun via PBS | n/a | 1 task; 32 cores | successful |
| OpenMPI (ours) | Gromacs (ours) | 4 threads; 8 processes | aprun via PBS | n/a | 1 task; 32 cores | successful |
| OpenMPI (ours) | Gromacs (ours) | 4 threads; 8 processes | ORTE via PBS | n/a | 10 tasks; 32 cores | successful |
| OpenMPI (ours) | Gromacs (ours) | 1 thread; 32 processes | aprun via PBS | n/a | 10 tasks; 32 cores | failed |
| OpenMPI (ours) | Gromacs (ours) | 1 thread; 32 processes | ORTE via PBS | n/a | 10 tasks; 32 cores | failed |
| OpenMPI (ours) | Gromacs (ours) | 1 thread; 32 processes | ORTE via RP | devel | 64 tasks; 1k cores | failed |
| OpenMPI (ours) | Gromacs (ours) | 1 thread; <32 processes | ORTE via RP | devel | 64 tasks; 1k cores | failed |
| OpenMPI (ours) | Gromacs (ours) | 4 threads; 8 processes | ORTE via RP | GPU | 64 tasks; 512 cores | failed |
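
To make the "Run method" column concrete: "aprun via PBS" means launching directly through Cray ALPS from inside a PBS batch script, "ORTE via PBS" means launching through Open MPI's ORTE runtime (orterun/mpirun) inside the same kind of job, and "ORTE via RP" delegates the ORTE launch to RADICAL-Pilot. A rough illustration of the first two launch lines (geometry and input file are placeholders, not the exact commands used in these runs):

```bash
# "aprun via PBS": direct launch through Cray ALPS inside a PBS batch script
export OMP_NUM_THREADS=4
aprun -n 8 -d 4 gmx_mpi mdrun -ntomp 4 -s topol.tpr

# "ORTE via PBS": launch through Open MPI's ORTE runtime instead of ALPS
# (orterun is Open MPI's launcher, a synonym for mpirun)
orterun -np 8 gmx_mpi mdrun -ntomp 4 -s topol.tpr
```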