Some software requires environment variables to run, e.g. PyTorch's distributed framework requires `MASTER_PORT` (among others) to be set. As discussed on Slack, this is currently challenging if the test developer doesn't know the configured launcher in advance.

For example, if we know that OpenMPI's `mpirun` will be the launcher, we can do

```python
self.job.launcher.options = ['-x MASTER_PORT']
```
But if we are writing a test with the purpose of it being reused (e.g. a test for the hpctestlib), it would be nice to have a way of specifying this in a launcher-agnostic way. E.g.
(the 2nd is probably more convenient, but not sure which API is easiest to support from the ReFrame side).
ReFrame would then abstract how each particular launcher exports environment variables. For example, for OpenMPI the ReFrame backend would add `-x MASTER_PORT=1234` as an extra launcher argument, whereas for `srun` it would add `--export=MASTER_PORT=1234`.
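A minimal sketch of what such an abstraction could look like, assuming a hypothetical helper `env_export_options()` and hypothetical launcher keys `'openmpi'` and `'srun'`. None of these are existing ReFrame API; the sketch only maps a dict of variables to launcher-specific options. One deliberate difference from the prose above: the `srun` branch prefixes `ALL`, since as far as I know Slurm otherwise propagates only the listed variables (plus `SLURM_*` ones) to the job.

```python
import shlex


# Hypothetical per-launcher formatters: each turns a dict of environment
# variables into a list of command-line tokens for that launcher.
# The keys 'openmpi' and 'srun' are illustrative, not ReFrame launcher names.
def _openmpi_export(env):
    # Open MPI's mpirun accepts one "-x NAME=VALUE" pair per variable
    opts = []
    for name, value in env.items():
        # Quoted in case the options end up in a shell command line
        opts += ['-x', shlex.quote(f'{name}={value}')]
    return opts


def _srun_export(env):
    # srun takes a single comma-separated --export list; the leading ALL keeps
    # the rest of the user environment (values must not contain commas)
    assignments = ','.join(['ALL'] + [f'{name}={value}' for name, value in env.items()])
    return [shlex.quote(f'--export={assignments}')]


_EXPORT_FORMATTERS = {
    'openmpi': _openmpi_export,
    'srun': _srun_export,
}


def env_export_options(launcher, env):
    """Return launcher options that export every variable in ``env``."""
    try:
        formatter = _EXPORT_FORMATTERS[launcher]
    except KeyError:
        raise ValueError(f'no known way to export variables with {launcher!r}') from None
    return formatter(env)
```

In a test, the returned list could then be appended to `self.job.launcher.options`, e.g. `env_export_options('srun', {'MASTER_PORT': 1234})` gives `['--export=ALL,MASTER_PORT=1234']`. Keeping the per-launcher logic in one table means a test only states *which* variables it needs, and supporting a new launcher is a single new entry.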
Note that right now, I work around this issue with a wrapper shell script that sets the environment variables, similar to the one CSCS uses in their PyTorch test.
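For reference, a minimal sketch of that kind of workaround: a Python helper (the name `make_env_wrapper` is hypothetical) that generates a bash wrapper which exports the variables and then `exec`s the real program. This is not the CSCS script, only the general idea. Because the variables are set inside each launched process, it does not depend on how the launcher propagates the environment.

```python
import shlex


def make_env_wrapper(env, command):
    """Return the text of a bash script that exports ``env`` and runs ``command``.

    Hypothetical helper: the script is meant to be used as the test's
    executable, so every rank sets the variables itself, whatever the launcher.
    """
    lines = ['#!/bin/bash']
    for name, value in env.items():
        # Quote values so spaces or shell metacharacters survive
        lines.append(f'export {name}={shlex.quote(str(value))}')
    # Replace the shell with the real program, forwarding any extra arguments
    lines.append(f'exec {shlex.join(command)} "$@"')
    return '\n'.join(lines) + '\n'
```

For example, `make_env_wrapper({'MASTER_PORT': 1234}, ['python', 'train.py'])` produces a three-line script ending in `exec python train.py "$@"`. The drawback, and the reason for this request, is that every reusable test has to ship and maintain such a script instead of declaring the variables directly.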