reframe-hpc / reframe

A powerful Python framework for writing and running portable regression tests and benchmarks for HPC systems.
https://reframe-hpc.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Case study of hpctestlib generalization (application focus) #3014

Open vkarak opened 1 year ago

vkarak commented 1 year ago

Idea:

casparvl commented 12 months ago

I'm not sure if this is what you mean, but as you may know, we also develop a very portable test suite in EESSI. We've built it on top of the current GROMACS test from hpctestlib. The test makes all its 'decisions' based on what is configured in the ReFrame configuration file, and then does something 'sensible'. For example, it discovers all modules whose names start with GROMACS/ and uses them as a test parameter. Then it generates GROMACS tests at several 'scales' (1 core, 2 cores, ... 1/4 node, 1/2 node, 1 node, 2 nodes, ...). The scales come with tags, so it's easy to run a subset of them, depending on what 'fits' the system. Finally, the test runs as a pure MPI test: it simply checks the core count of the node in the config file and sets num_tasks so that there is 1 MPI rank per core in the allocation.
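
To make the scale idea concrete, here is a minimal sketch in plain Python. It is not the code from the EESSI test suite: the `SCALES` table, its key names, and the `num_tasks_for` helper are hypothetical and only illustrate how a scale plus the per-node core count from the configuration could determine the number of MPI ranks.

```python
# Hypothetical scale table: each scale uses either a fixed number of cores
# on one node, or 1/node_part of every node it spans.
SCALES = {
    '1_core':   {'num_nodes': 1, 'cores': 1},
    '2_cores':  {'num_nodes': 1, 'cores': 2},
    '1_4_node': {'num_nodes': 1, 'node_part': 4},
    '1_2_node': {'num_nodes': 1, 'node_part': 2},
    '1_node':   {'num_nodes': 1, 'node_part': 1},
    '2_nodes':  {'num_nodes': 2, 'node_part': 1},
}


def num_tasks_for(scale_name, cores_per_node):
    """Return the MPI rank count for a scale: one rank per allocated core."""
    scale = SCALES[scale_name]
    if 'cores' in scale:
        # Fixed core count, capped at what a single node offers.
        tasks_per_node = min(scale['cores'], cores_per_node)
    else:
        # A fraction of the node, but never fewer than one core.
        tasks_per_node = max(1, cores_per_node // scale['node_part'])
    return scale['num_nodes'] * tasks_per_node


if __name__ == '__main__':
    # Assumed example: a node with 128 cores.
    for name in SCALES:
        print(name, num_tasks_for(name, 128))
```

Because the core count is the only system-specific input, the same table can be reused on any system, and the scale names double as tags for selecting a subset.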

You can check out the test at https://github.com/EESSI/test-suite/blob/main/eessi/testsuite/tests/apps/gromacs.py if you're interested. In this config file you can see how we combine that with the features and extras. Basically, it ensures that for a module called GROMACS/...-CUDA-... a test is generated that runs on nodes that specify gpu as a feature and gpu_vendor: 'nvidia' as an extra.
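
As a rough illustration of that matching logic, the sketch below uses plain Python rather than the actual test code. The helper names, the example module names, and the choice to require nothing extra for non-CUDA modules are all assumptions made for the example.

```python
def discover_gromacs_modules(available_modules):
    """Keep only the modules whose name starts with 'GROMACS/'."""
    return [m for m in available_modules if m.startswith('GROMACS/')]


def required_node_properties(module_name):
    """Return the (features, extras) a partition must provide for a module."""
    if '-CUDA-' in module_name:
        return ['gpu'], {'gpu_vendor': 'nvidia'}
    # Assumption: non-CUDA builds place no extra requirements on the node.
    return [], {}


def partition_matches(partition, features, extras):
    """Check that a partition offers all required features and extras."""
    has_features = set(features) <= set(partition.get('features', []))
    part_extras = partition.get('extras', {})
    has_extras = all(part_extras.get(k) == v for k, v in extras.items())
    return has_features and has_extras


if __name__ == '__main__':
    # Hypothetical module names and partitions, for illustration only.
    modules = ['GROMACS/2021-foss', 'GROMACS/2021-foss-CUDA-11', 'Python/3.10']
    gpu_part = {'features': ['gpu'], 'extras': {'gpu_vendor': 'nvidia'}}
    cpu_part = {'features': [], 'extras': {}}
    for mod in discover_gromacs_modules(modules):
        feats, extras = required_node_properties(mod)
        print(mod,
              'gpu partition:', partition_matches(gpu_part, feats, extras),
              'cpu partition:', partition_matches(cpu_part, feats, extras))
```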

This test is generic enough that we've run it successfully on about 5 different HPC systems without adapting the test itself. I've also run it with both the EESSI software stack and our local module environment, without any issues.

Anyway, I figured it could be nice as inspiration - if this is indeed the type of generalization you were thinking about :) Feel free to reach out, and we could have a chat, or elaborate more on what we've done.

casparvl commented 12 months ago

Oh, and I forgot to add: here you can find our docs on running our test suite https://www.eessi.io/docs/test-suite/installation-configuration/ so you could actually try to give it a go on your own local module stack as well, if you want.

It is very minimal right now (GROMACS and TensorFlow), but the key point so far has been about discovering how to write tests in a portable way :) You can see there that we tell people to configure ReFrame in a certain way to make it work with our test suite, e.g. use the constants defined as part of the EESSI test suite as features and extras (so that those strings match exactly what is used in the test definitions), use CPU autodetection so that all CPU info required by our tests is defined, set devices to specify the number of GPUs, etc.
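
A partition configured along those lines might look roughly like the fragment below. It is a sketch, not a complete or validated ReFrame configuration: the system name, hostnames, and GPU count are made up, and the three constants at the top are local stand-ins for the ones the test suite provides, whose real names may differ.

```python
# Hypothetical stand-ins for the constants shipped with the test suite.
GPU_FEATURE = 'gpu'
GPU_VENDOR_KEY = 'gpu_vendor'
NVIDIA = 'nvidia'

site_configuration = {
    'systems': [
        {
            'name': 'example',           # made-up system name
            'descr': 'Example cluster',
            'hostnames': ['login.*'],    # made-up hostname pattern
            'modules_system': 'lmod',
            'partitions': [
                {
                    'name': 'gpu',
                    'scheduler': 'slurm',
                    'launcher': 'mpirun',
                    'environs': ['builtin'],
                    # Using the constants keeps these strings identical
                    # to the ones the test definitions look for.
                    'features': [GPU_FEATURE],
                    'extras': {GPU_VENDOR_KEY: NVIDIA},
                    # Number of GPUs per node (assumed to be 4 here).
                    'devices': [{'type': 'gpu', 'num_devices': 4}],
                },
            ],
        },
    ],
    # Let ReFrame detect CPU topology on the compute nodes.
    'general': [{'remote_detect': True}],
}
```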