reframe-hpc / reframe

A powerful Python framework for writing and running portable regression tests and benchmarks for HPC systems.
https://reframe-hpc.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Feature request: support for exporting environment variables with parallel launchers #3207

Open casparvl opened 4 months ago

casparvl commented 4 months ago

Some software requires environment variables to run - e.g. PyTorch's distributed framework requires MASTER_PORT (among others) to be set. As discussed on Slack, this is currently challenging if the test developer doesn't know the configured launcher in advance.

For example, if we know that OpenMPI's mpirun will be the launcher, we can write

self.job.launcher.options = ['-x MASTER_PORT']

But if we are writing a test with the purpose of it being reused (e.g. a test for the hpctestlib), it would be nice to have a way of specifying this in a launcher-agnostic way. E.g.

self.env_vars['MASTER_PORT'] = '1234'
self.job.launcher.export_var = ['MASTER_PORT']

or

self.job.launcher.export_var['MASTER_PORT'] = ['1234']

(The second is probably more convenient, but I am not sure which API is easiest to support on the ReFrame side.)

ReFrame would then abstract how each particular launcher exports environment variables. E.g. for OpenMPI, the ReFrame backend would add -x MASTER_PORT=1234 as an extra launcher argument, whereas for srun it would add --export=MASTER_PORT=1234.
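
To make the idea concrete, the per-launcher translation could look roughly like the sketch below. This is not ReFrame code: the helper name `export_flags` and the launcher keys `'mpirun'` and `'srun'` are hypothetical and chosen only for illustration. One assumption worth flagging: for srun the sketch emits `--export=ALL,...` rather than a bare `--export=NAME=value`, on the assumption that the rest of the job environment should still be propagated.

```python
def export_flags(launcher, env_vars):
    """Return launcher options that export env_vars.

    Hypothetical helper for illustration; not part of ReFrame.
    """
    if not env_vars:
        return []

    if launcher == 'mpirun':
        # Open MPI style: one `-x NAME=value` pair per variable
        flags = []
        for name, value in env_vars.items():
            flags += ['-x', f'{name}={value}']
        return flags

    if launcher == 'srun':
        # Slurm style: a single comma-separated --export list.
        # ALL is an assumption: it keeps the rest of the environment.
        pairs = ','.join(f'{n}={v}' for n, v in env_vars.items())
        return [f'--export=ALL,{pairs}']

    raise ValueError(f'no export rule known for launcher {launcher!r}')


print(export_flags('mpirun', {'MASTER_PORT': '1234'}))
print(export_flags('srun', {'MASTER_PORT': '1234'}))
```

Note that values containing commas would need extra care in the srun case, since the comma is the list separator there; the sketch does not handle that.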

Note that right now, I worked around this issue by making a wrapper shell script that sets the environment variables, similar to what is used here by CSCS in their PyTorch test.
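
For readers unfamiliar with the wrapper approach, here is a minimal sketch of generating such a script. It is not the CSCS script and not what I actually use verbatim: the function name `make_wrapper` and the `train.py` command are hypothetical placeholders. The launcher would then start the wrapper on every rank instead of the program itself, so it works regardless of how the launcher forwards the environment.

```python
import shlex


def make_wrapper(env_vars, command):
    """Return the text of a shell wrapper that exports env_vars,
    then runs command (hypothetical helper for illustration)."""
    lines = ['#!/bin/bash']
    for name, value in env_vars.items():
        # Quote values so spaces or shell metacharacters survive
        lines.append(f'export {name}={shlex.quote(str(value))}')

    # exec replaces the shell with the real program, so signals sent
    # by the launcher reach the program directly
    lines.append('exec ' + ' '.join(shlex.quote(arg) for arg in command))
    return '\n'.join(lines) + '\n'


# `train.py` is a placeholder for the actual test executable
print(make_wrapper({'MASTER_PORT': '1234'}, ['python', 'train.py']))
```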

vkarak commented 4 months ago

I think that

self.job.launcher.env_vars = {'MASTER_PORT': '1234'}

or

self.job.launcher.env_vars['MASTER_PORT'] = '1234'

is the best option, since it matches the syntax of the test's env_vars.