If the current behavior is preferred, maybe it could be noted in the relevant section here.
Hi @sjpb, how is the environs configuration parameter defined for each of the partitions above? Also, how many partitions does mysys have, and how did you run reframe? Judging from the error message, it seems that reframe tries to run on a mysys partition that defines the openfoam environment and then can't find a definition for it inside environments.
I stripped my config down to a minimal example (2 partitions, 2 environments, using pingpong because it's faster to test than openfoam). Here are the systems and environments sections:
'systems': [
{
'name': 'arcus',
'hostnames': ['eb-login-0'],
'modules_system': 'lmod',
'partitions':[
{
'name':'ib-gcc9-openmpi4-ucx',
'scheduler': 'slurm',
'access': [ '--partition=test'],
'launcher':'srun',
'environs': ['imb'],
'modules': ['gcc/9.2.0-3j3swca', 'openmpi/4.0.3-dxa6sov'],
'variables': [
['SLURM_MPI_TYPE', 'pmix_v2'],
]
},
{
'name':'ib-gcc9-impi2019-mlx',
'scheduler': 'slurm',
'launcher':'mpirun',
'access': [ '--partition=test'],
'environs': ['imb'],
'modules': ['gcc/9.2.0-3j3swca', 'intel-mpi/2019.8.254-5qpjevf'],
'variables': [
['FI_PROVIDER', 'mlx'],
],
},
]
},
],
'environments': [
# {
# 'name': 'imb', # a non-targeted environment seems to be necessary for reframe to load the config
# },
{
'name': 'imb',
'target_systems': ['arcus:ib-gcc9-openmpi4-ucx', 'arcus:roce-gcc9-openmpi4-ucx'],
'modules': ['intel-mpi-benchmarks/2019.6-42qobhq'],
},
{
'name': 'imb',
'target_systems': ['arcus:ib-gcc9-impi2019-mlx', 'arcus:roce-gcc9-impi2019-mlx'],
'modules': ['intel-mpi-benchmarks/2019.6-sl772ml'],
},
],
Run and error:
(hpc-tests) [centos@eb-login-0 hpc-tests]$ reframe/bin/reframe -C rfm_config_simple.py -c apps/imb/ --run --performance-report --tag pingpong
reframe/bin/reframe: failed to load configuration: section 'environments' not defined for system 'arcus'
If I remove the comments on the empty imb environ, then both run as expected:
- arcus:ib-gcc9-openmpi4-ucx
- imb
* num_tasks: 2
* max_bandwidth: 11108.47 Mbytes/sec
* min_latency: 0.96 t[usec]
- arcus:ib-gcc9-impi2019-mlx
- imb
* num_tasks: 2
* max_bandwidth: 11184.88 Mbytes/sec
* min_latency: 1.02 t[usec]
I could reproduce this with even a single environment:
'systems': [
{
'name': 'tresa',
'hostnames': ['.*'],
'partitions': [
{
'name': 'default',
'scheduler': 'local',
'launcher': 'local',
'environs': ['builtin'],
'container_platforms': [{'type': 'Docker'}],
'max_jobs': 8
}
]
},
],
'environments': [
{
'name': 'builtin',
'cc': 'cc',
'cxx': '',
'ftn': '',
'target_systems': ['tresa:default']
},
],
./bin/reframe -C config/tresa.py -l
./bin/reframe: failed to load configuration: section 'environments' not defined for system 'tresa'
Although I suspect why this is happening, this behaviour is not correct. I mark it as a bug.
If you also add a bare tresa to the target_systems of the builtin environment, the configuration loads successfully. The problem seems to be here:
https://github.com/eth-cscs/reframe/blob/661dbc170cce67f416a7d6923bcc2e941bc26d35/reframe/core/config.py#L342
This is where it searches for a bare tresa according to the full name; it cannot find it, and therefore the environments section is not populated.
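The visibility rule described above can be sketched with a simplified, hypothetical model. The function name visible_at_system_scope is made up for illustration and is not ReFrame's code in config.py; it assumes that, at system scope, an entry is selected only if one of its target_systems is '*' (the default) or exactly the bare system name.

```python
def visible_at_system_scope(entry, system):
    """Hypothetical model of system-level selection (not ReFrame's implementation).

    An entry is kept only if one of its target_systems is '*' or the bare
    system name; partition-qualified names like 'tresa:default' never match.
    """
    targets = entry.get('target_systems', ['*'])  # '*' is the default
    return any(t in ('*', system) for t in targets)


environments = [
    {'name': 'builtin', 'cc': 'cc', 'target_systems': ['tresa:default']},
]

# Only a partition-qualified target: nothing is selected for 'tresa', which
# corresponds to "section 'environments' not defined for system 'tresa'"
print([e['name'] for e in environments if visible_at_system_scope(e, 'tresa')])  # prints []

# Workaround from the thread: also list the bare system name
environments[0]['target_systems'].append('tresa')
print([e['name'] for e in environments if visible_at_system_scope(e, 'tresa')])  # prints ['builtin']
```

Under this model, leaving target_systems unset also works, because the default '*' matches at system scope.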
@teojgo I will get back to you shortly about the logic behind this behaviour. It explains why it works both with tresa alone and with *, the default for target_systems.
The logic behind this is that, when loading the configuration, ReFrame calls select_subconfig(current_system) to set itself up for the current system: it essentially "instantiates" the configuration file for the current system and then validates it. When it instantiates the configuration, it tries to find definitions for all the scoped keys inside the current scope, i.e., the current system. Therefore, it can't see what's defined inside a nested scope, such as each specific partition. That's why it works when target_systems is set to tresa in this example or is left at its default. For the same reason, it works if --system=tresa:default is passed. Whatever we do to fix this, we should be careful with the logic behind it. Later on, ReFrame calls select_subconfig() again for each of the system partitions in order to get the partition-specific definitions. So if only that step existed, the problem would be solved, but I don't know if that solution is feasible.
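For comparison, here is a hypothetical sketch of resolving environments per partition, in the spirit of the second select_subconfig() pass described above. It shows why partition-qualified targets would be found at that scope. The helper names and the matching rule ('*', the bare system name, or system:partition) are assumptions for illustration. The config is trimmed from the arcus example in this thread.

```python
def visible_at_partition_scope(entry, system, partition):
    """Hypothetical: '*', the bare system name or 'system:partition' all match."""
    targets = entry.get('target_systems', ['*'])
    return any(t in ('*', system, f'{system}:{partition}') for t in targets)


def environments_per_partition(config, system_name):
    """Map each partition name to the environment entries visible to it."""
    system = next(s for s in config['systems'] if s['name'] == system_name)
    return {
        part['name']: [
            env for env in config['environments']
            if env['name'] in part['environs']
            and visible_at_partition_scope(env, system_name, part['name'])
        ]
        for part in system['partitions']
    }


# Trimmed from the arcus config above (roce partitions and other keys omitted)
config = {
    'systems': [{
        'name': 'arcus',
        'partitions': [
            {'name': 'ib-gcc9-openmpi4-ucx', 'environs': ['imb']},
            {'name': 'ib-gcc9-impi2019-mlx', 'environs': ['imb']},
        ],
    }],
    'environments': [
        {'name': 'imb', 'target_systems': ['arcus:ib-gcc9-openmpi4-ucx'],
         'modules': ['intel-mpi-benchmarks/2019.6-42qobhq']},
        {'name': 'imb', 'target_systems': ['arcus:ib-gcc9-impi2019-mlx'],
         'modules': ['intel-mpi-benchmarks/2019.6-sl772ml']},
    ],
}

for part, envs in environments_per_partition(config, 'arcus').items():
    print(part, [e['modules'] for e in envs])
```

Each partition picks up its own imb definition, with no untargeted placeholder needed.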
@vkarak would it make sense to have a default environment attribute in the configuration for a system?
I'd note it is preferable (to me!) that a failure is generated if you have accidentally listed an environment under a system's environs parameter but not actually defined that environment for that system. I'm not sure whether having a "default" environment would break that behaviour and silently run with that default environment definition instead.
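The stricter behaviour preferred here could be expressed as a check like the following hypothetical sketch. The names missing_environments and allow_generic are made up and do not correspond to any ReFrame option. Every name in a partition's environs must have a definition targeting that partition or its system. Untargeted ('*') definitions are ignored unless explicitly allowed, so a misspelt target_systems entry is reported rather than silently masked. The system mysys and environment openfoam come from the original report; the partition name compute and the misspelling are invented.

```python
def missing_environments(config, allow_generic=False):
    """Return (system:partition, environ) pairs that lack a matching definition.

    Hypothetical strict check: a definition counts only if its target_systems
    names the partition or its system. Generic '*' definitions (the default
    when target_systems is omitted) count only when allow_generic is True.
    """
    problems = []
    for system in config['systems']:
        for part in system['partitions']:
            fullname = f"{system['name']}:{part['name']}"
            accepted = {system['name'], fullname}
            if allow_generic:
                accepted.add('*')
            for env_name in part.get('environs', []):
                defined = any(
                    env['name'] == env_name
                    and accepted.intersection(env.get('target_systems', ['*']))
                    for env in config['environments']
                )
                if not defined:
                    problems.append((fullname, env_name))
    return problems


config = {
    'systems': [{
        'name': 'mysys',
        'partitions': [{'name': 'compute', 'environs': ['openfoam']}],  # hypothetical partition
    }],
    'environments': [
        {'name': 'openfoam'},  # empty default definition, targets '*'
        {'name': 'openfoam', 'target_systems': ['mysys:comptue']},  # misspelt target
    ],
}

print(missing_environments(config))                      # prints [('mysys:compute', 'openfoam')]
print(missing_environments(config, allow_generic=True))  # prints []
```

With allow_generic=True the empty default definition hides the typo, which is the error trap described in the original report.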
At the moment, reframe seems to require a default environment. If I do something like this:
Then I get a message like "environment 'openfoam' not defined for system 'mysys'" (paraphrased, lost the original terminal).
I have to add an empty default environment of the same name for it to work:
That is fine, because I can use the 'environs' values in the partition to restrict tests to valid partition+environ combinations. Except that it is an error trap: for example, if I misspelt the 'target_systems' above, reframe thinks there is a valid combination because of the default one, and I then get a cryptic error during module loads.
To me that case with no default seems quite reasonable/normal, but maybe it's not.