reframe-hpc / reframe

A powerful Python framework for writing and running portable regression tests and benchmarks for HPC systems.
https://reframe-hpc.readthedocs.org
BSD 3-Clause "New" or "Revised" License
213 stars, 98 forks

If a performance test fails in sanity or earlier, no perflog entry is created #3186

Closed: vkarak closed this issue 1 month ago

vkarak commented 2 months ago

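Since version 4.0, performance logging happens after a test finishes, provided that the test is a performance test. However, if a performance test fails in the sanity stage or earlier, no perflog entry is created. We need to understand the exact conditions under which this happens and what triggers this behaviour.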

This is also related to #2853.

vkarak commented 2 months ago

The problem is that, although performance logging happens when the test task finishes (whether it succeeds or fails), the performance logger is set up only during the performance stage. Thus, if a test fails before that stage, its performance is logged to the null logger, which does nothing.
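The ordering problem can be reproduced with a small, self-contained model. This is a sketch under assumptions, not ReFrame's code: `TestTask`, `NullLogger`, `ListLogger`, `SanityError` and the simplified stage list are all hypothetical names chosen for illustration. The only point it captures is that the real logger is assigned inside the performance stage, while logging happens unconditionally when the task finishes.

```python
# Hypothetical model of the issue; none of these classes are ReFrame's.
class NullLogger:
    """Stand-in for a logger that silently discards every record."""
    def log_performance(self, check):
        pass


class ListLogger:
    """Stand-in for the real perflog logger; collects test names in memory."""
    def __init__(self):
        self.records = []

    def log_performance(self, check):
        self.records.append(check['name'])


class SanityError(Exception):
    pass


class TestTask:
    # Simplified pipeline stages, in execution order.
    STAGES = ['setup', 'compile', 'run', 'sanity', 'performance']

    def __init__(self, check, perf_logger, fail_at=None):
        self.check = check
        self._real_logger = perf_logger
        self._perflogger = NullLogger()   # default until the performance stage
        self._fail_at = fail_at

    def _run_stage(self, stage):
        if stage == 'performance':
            # The real logger is assigned only here...
            self._perflogger = self._real_logger
        if stage == self._fail_at:
            raise SanityError(f'{stage} failed')

    def run(self):
        try:
            for stage in self.STAGES:
                self._run_stage(stage)
        except SanityError:
            pass
        finally:
            # ...but performance is logged on finish, success or failure.
            self._perflogger.log_performance(self.check)


logger = ListLogger()
TestTask({'name': 'ok_test'}, logger).run()
TestTask({'name': 'bad_test'}, logger, fail_at='sanity').run()
print(logger.records)  # ['ok_test']: the failed test went to the null logger
```

The failed test never reaches the assignment in the performance stage, so its record is silently dropped by the null logger even though `log_performance` is still called.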

https://github.com/reframe-hpc/reframe/blob/09ae0293001696656ec548dc0fcc1378a20bc7cf/reframe/frontend/executors/__init__.py#L379

vkarak commented 2 months ago

Fixing this is a bit tricky. Moving the assignment of the performance logger to an earlier stage is not a solution, although it does produce a log record. The problem is that the check_perfvalues placeholder is empty, so no performance values are logged. This is not bad per se, but since the perflog handler does not know the actual performance variables, it creates a new log file and dumps the entry for the failed test there instead of appending it to the existing file (because the header of the perflog file has changed). We could move the generation of check_perfvalues (the perfvalues attribute of the test) to an earlier stage, but this is not sufficient either, since performance variables are often set during the performance stage, which is never executed if the test fails in a previous stage.
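To illustrate why an empty set of performance values lands in a separate file, here is a hypothetical, in-memory model of a header-aware perflog handler. It assumes, for illustration only, that the header is built from the performance variable names and that a mismatching header sends the record to a new file with an `.h<N>` suffix; `PerflogHandler`, `emit` and that suffix scheme are illustrative assumptions, not ReFrame's actual handler or naming.

```python
# Hypothetical model of a header-aware perflog handler; not ReFrame's code.
class PerflogHandler:
    def __init__(self):
        self.files = {}   # file name -> list of lines, header first

    @staticmethod
    def _header(perfvalues):
        # The header columns depend on the names of the performance variables.
        return '|'.join(['name'] + sorted(perfvalues))

    def emit(self, name, perfvalues):
        header = self._header(perfvalues)
        base = f'{name}.log'
        fname, n = base, 0
        # Append to the first file whose header matches; otherwise open a new one.
        while fname in self.files and self.files[fname][0] != header:
            n += 1
            fname = f'{base}.h{n}'   # hypothetical suffix scheme
        lines = self.files.setdefault(fname, [header])
        lines.append('|'.join([name] + [str(perfvalues[k])
                                        for k in sorted(perfvalues)]))
        return fname


h = PerflogHandler()
print(h.emit('stream', {'copy_bw': 120.5, 'triad_bw': 110.2}))  # stream.log
print(h.emit('stream', {'copy_bw': 121.0, 'triad_bw': 111.0}))  # stream.log
print(h.emit('stream', {}))  # stream.log.h1: failed test, empty perfvalues
```

With empty perfvalues the header shrinks to `name`, so the failed test's entry cannot be appended to the file that already holds the successful runs.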

Maybe the best solution would be to continue running the test in dry-run mode once it has failed a stage, so that the performance stage is executed up to the point where the performance variables are evaluated.
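A minimal sketch of the proposed dry-run continuation, assuming a simplified pipeline: after the first failure, the remaining stages run with `dry_run=True`, so the performance stage still defines the performance variables but records `None` instead of evaluating them. `Check`, `stage`, `run_pipeline`, the `bandwidth` variable and the placeholder value `42.0` are hypothetical names and data for illustration, not ReFrame's API.

```python
# Hypothetical sketch of the proposed fix; names are illustrative only.
class Check:
    def __init__(self, name, fail_at=None):
        self.name = name
        self.fail_at = fail_at
        self.perf_variables = {}   # defined during the performance stage
        self.perfvalues = {}

    def stage(self, stage, dry_run):
        # Failures can only happen while the test is really running.
        if not dry_run and stage == self.fail_at:
            raise RuntimeError(f'{self.name}: {stage} failed')
        if stage == 'performance':
            # Define the performance variables even in dry-run mode...
            self.perf_variables = {'bandwidth': 'GB/s'}
            # ...but evaluate them only if no earlier stage failed.
            self.perfvalues = {var: (None if dry_run else 42.0)
                               for var in self.perf_variables}


def run_pipeline(check,
                 stages=('setup', 'compile', 'run', 'sanity', 'performance')):
    """Run all stages; switch to dry-run mode after the first failure."""
    failed = None
    for stage in stages:
        try:
            check.stage(stage, dry_run=failed is not None)
        except RuntimeError:
            failed = stage
    return failed


c = Check('stream', fail_at='sanity')
print(run_pipeline(c), c.perfvalues)  # sanity {'bandwidth': None}
```

Because the failed test still knows its performance variable names, its log record would carry the same columns as a successful run, which is what keeps the perflog header stable.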

vkarak commented 1 month ago

I don't think this is a bug, but rather a limitation of the current implementation that should be documented. Therefore, I am marking this issue as an "enhancement".