Closed: vkarak closed this issue 9 months ago.
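The problem is that the `srunalloc` launcher passes the test's standard output/error files as options to the `srun` command, so `srun` overwrites the output of the whole job script. Anything the script prints outside of `srun` (for example, the `prerun_cmds`) is lost. Here's how to reproduce: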
Configuration file (add the `access` options as needed for your system):
```python
site_configuration = {
    'systems': [
        {
            'name': 'system',
            'hostnames': ['nid0'],
            'partitions': [
                {
                    'name': 'part',
                    'scheduler': 'local',
                    'launcher': 'srunalloc',
                    'environs': ['builtin']
                }
            ]
        }
    ]
}
```
And the test file:
```python
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class srunalloc_fail_test(rfm.RunOnlyRegressionTest):
    executable = 'hostname'
    prerun_cmds = ['echo hello']
    valid_systems = ['system:part']
    valid_prog_environs = ['*']

    @sanity_function
    def validate(self):
        return sn.assert_found('hello', self.stdout)
```
Running the test fails as follows:
```
SUMMARY OF FAILURES
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
FAILURE INFO for srunalloc_fail_test (run: 1/1)
  * Description:
  * System partition: system:part
  * Environment: builtin
  * Stage directory: /home/user/reframe/stage/system/part/builtin/srunalloc_fail_test
  * Node list: nid0001
  * Job type: local (id=83006)
  * Dependencies (conceptual): []
  * Dependencies (actual): []
  * Maintainers: []
  * Failing phase: sanity
  * Rerun with '-n /b359e5de -p builtin --system system:part -r'
  * Reason: sanity error: pattern 'hello' not found in 'rfm_job.out'
--- rfm_job.out (first 10 lines) ---
nid0001
--- rfm_job.out ---
--- rfm_job.err (first 10 lines) ---
--- rfm_job.err ---
```
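The log shows that `rfm_job.out` holds only the `hostname` output (`nid0001`), while `hello` from `prerun_cmds` is gone. The sketch below is a simplified, sequential simulation of that sequence using plain file I/O, not ReFrame or Slurm code. It assumes that `srun --output=<file>` reopens the same path in truncate mode, which is consistent with the failure above; `simulate_srun_overwrite` is a hypothetical helper.

```python
import os
import tempfile


def simulate_srun_overwrite(path: str) -> str:
    """Hypothetical simulation: the job script and srun both write to rfm_job.out."""
    # Step 1: the job script's own stdout is redirected to rfm_job.out,
    # so `echo hello` from prerun_cmds is written there first.
    with open(path, 'w') as script_stdout:
        script_stdout.write('hello\n')

    # Step 2 (assumption): `srun --output=rfm_job.out hostname` opens the
    # same path in truncate mode and writes only the task's output.
    with open(path, 'w') as srun_stdout:
        srun_stdout.write('nid0001\n')

    # What the sanity check later reads back from rfm_job.out.
    with open(path) as f:
        return f.read()


with tempfile.TemporaryDirectory() as stage_dir:
    contents = simulate_srun_overwrite(os.path.join(stage_dir, 'rfm_job.out'))
    print(contents, end='')
```

In the real job both file handles are open at the same time, but the net effect on the file is the same: the `hello` line written by the script is discarded.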
Removing the `--output` and `--error` options from the `srun` command line here solves the issue:
https://github.com/reframe-hpc/reframe/blob/a3366b6c9ab7567df295fc9f30bae13fd5fa7dfc/reframe/core/launchers/mpi.py#L129-L133
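A minimal sketch of the proposed change, assuming a launcher that builds the `srun` command as a list of arguments. `JobInfo` and `srunalloc_command` are hypothetical stand-ins, not ReFrame's actual `mpi.py` code. The point is that the job's `stdout`/`stderr` paths are known but deliberately not turned into `--output`/`--error`, so `srun` inherits the job script's redirection instead of overwriting it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobInfo:
    # Hypothetical stand-in for the job attributes a launcher reads.
    name: str
    stdout: str
    stderr: str
    num_tasks: Optional[int] = None
    time_limit: Optional[str] = None


def srunalloc_command(job: JobInfo) -> List[str]:
    """Hypothetical srun command builder without --output/--error."""
    cmd = ['srun', f'--job-name={job.name}']
    if job.time_limit is not None:
        cmd.append(f'--time={job.time_limit}')
    if job.num_tasks is not None:
        cmd.append(f'--ntasks={job.num_tasks}')
    # job.stdout and job.stderr are intentionally not passed to srun:
    # the task output then goes to the script's own stdout/stderr,
    # which are already redirected to rfm_job.out and rfm_job.err.
    return cmd


cmd = srunalloc_command(
    JobInfo('srunalloc_fail_test', 'rfm_job.out', 'rfm_job.err', num_tasks=1)
)
print(' '.join(cmd))
```

With this shape, `echo hello` and the `hostname` output both end up in `rfm_job.out`, and the sanity check can find `hello`.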