openhpc / cloudwg

Cloud Working Group Effort

Document other tutorials and resources for background / more info #11

Open mghpcsim opened 4 years ago

mghpcsim commented 4 years ago

On the tutorial website, there will be a resources page to provide links to other tutorial content we've done and more info on specific topics as needed.

This ticket is a catchall for those things.

This is where to put all the things David has been sending to the list for example.

koomie commented 4 years ago
mghpcsim commented 4 years ago

Karl added the PEARC 17 and 19 tutorials to the GH pages site.

Others have material as well, especially on containers; we need to figure out how best to incorporate it into the GH pages site, a la PEARC 17 and 19, or just add a single resources / link collation page.

DavidBrayford commented 4 years ago

For exercise 5 we can use this tutorial as a template: https://github.com/DavidBrayford/HPAI/blob/master/tutorial/Intel_HPC_DevCon

I've also uploaded the Charliecloud container to my OneDrive account and shared a link with Chris S.

I can rewrite the recipe for PEARC20.

Charliecloud execution line for PEARC20, without MPI: ch-run -w ./pearc_tutorial_image -- python /tensorflow/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model alexnet --batch_size 128 --data_format NCHW --num_batches 100 --distortions=False --mkl=True --local_parameter_device cpu --num_warmup_batches 10 --optimizer rmsprop --display_every 10 --variable_update horovod --horovod_device cpu --num_intra_threads 8 --kmp_blocktime 1 --num_inter_threads 2
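
For readability, the same no-MPI invocation can be written with line continuations; the image name, script path, and flags are exactly those above, and the flag notes are general observations about tf_cnn_benchmarks rather than anything specific to this image.

```bash
# Single-process run (no MPI launcher); uses the TensorFlow stack inside the image.
# Flag notes:
#   --data_format NCHW        channels-first layout, the form the MKL backend prefers
#   --mkl=True                enable the Intel MKL code path (assumes the image's
#                             TensorFlow was built with MKL support)
#   --variable_update horovod / --horovod_device cpu
#                             Horovod handles variable updates (only a single rank here)
#   --num_intra_threads / --num_inter_threads
#                             TensorFlow threading; tune to the cores available per process
ch-run -w ./pearc_tutorial_image -- \
  python /tensorflow/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
    --model alexnet --batch_size 128 --data_format NCHW --num_batches 100 \
    --distortions=False --mkl=True --local_parameter_device cpu \
    --num_warmup_batches 10 --optimizer rmsprop --display_every 10 \
    --variable_update horovod --horovod_device cpu \
    --num_intra_threads 8 --kmp_blocktime 1 --num_inter_threads 2
```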

Using MPI inside the container: mpiexec -n 2 ch-run -w ./pearc_tutorial_image -- python /tensorflow/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model alexnet --batch_size 128 --data_format NCHW --num_batches 100 --distortions=False --mkl=True --local_parameter_device cpu --num_warmup_batches 10 --optimizer rmsprop --display_every 10 --variable_update horovod --horovod_device cpu --num_intra_threads 8 --kmp_blocktime 1 --num_inter_threads 2

This uses the MPI libraries installed within the container; be sure that the MPI version inside the container is compatible with the system version of MPI.
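
As a sketch of how this container-MPI variant might be wrapped in a batch job: the Slurm directives below are illustrative assumptions (adjust job name, node count, task count, and walltime for your site); the launch line itself is the one from the comment above, and mpiexec is still the host launcher, which is why the compatibility caveat applies.

```bash
#!/bin/bash
#SBATCH --job-name=pearc20-tf-bench   # illustrative name, not from the tutorial
#SBATCH --nodes=1
#SBATCH --ntasks=2                    # matches the -n 2 in the launch line below
#SBATCH --time=00:30:00

# Two ranks, each running the containerized benchmark with the MPI libraries
# installed inside the image (no host MPI module loaded in this variant).
mpiexec -n 2 ch-run -w ./pearc_tutorial_image -- \
  python /tensorflow/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
    --model alexnet --batch_size 128 --data_format NCHW --num_batches 100 \
    --distortions=False --mkl=True --local_parameter_device cpu \
    --num_warmup_batches 10 --optimizer rmsprop --display_every 10 \
    --variable_update horovod --horovod_device cpu \
    --num_intra_threads 8 --kmp_blocktime 1 --num_inter_threads 2
```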

Using system MPI from within the container: mpiexec -n 2 ch-run -b /where/MPI/modules/are:/where/MPI/modules/are -w ./pearc_tutorial_image -- python /tensorflow/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model alexnet --batch_size 128 --data_format NCHW --num_batches 100 --distortions=False --mkl=True --local_parameter_device cpu --num_warmup_batches 10 --optimizer rmsprop --display_every 10 --variable_update horovod --horovod_device cpu --num_intra_threads 8 --kmp_blocktime 1 --num_inter_threads 2

This uses the MPI runtime libraries on the system (for example, module load mpich.mpi) by importing the host environment (PATH, etc.) and mapping the host directories to equivalent directories inside the container, which improves stability, scalability, and performance on large HPC systems with a tuned MPI setup. Ideally you want to build your application inside the container with the same MPI vendor and version as used on the system. For example, if the host system has been optimized to use Intel MPI, install Intel MPI inside the container and build the containerized application with Intel MPI.
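
A minimal sketch of that system-MPI setup, reusing the example module name from above and a hypothetical install prefix of /opt/mpich for the bind mount (substitute whatever your site actually provides):

```bash
# Load the host MPI runtime (module name taken from the example above;
# use your site's actual module).
module load mpich.mpi

# -b host_dir:container_dir bind-mounts the host MPI installation into the image
# at the same path (the /opt/mpich prefix here is a placeholder), so the benchmark
# can pick up the tuned host libraries at run time.
mpiexec -n 2 ch-run -b /opt/mpich:/opt/mpich -w ./pearc_tutorial_image -- \
  python /tensorflow/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
    --model alexnet --batch_size 128 --data_format NCHW --num_batches 100 \
    --distortions=False --mkl=True --local_parameter_device cpu \
    --num_warmup_batches 10 --optimizer rmsprop --display_every 10 \
    --variable_update horovod --horovod_device cpu \
    --num_intra_threads 8 --kmp_blocktime 1 --num_inter_threads 2
```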

bkmgit commented 3 years ago

It may be helpful to have some of this material in HPC Carpentry.