reframe-hpc / reframe

A powerful Python framework for writing and running portable regression tests and benchmarks for HPC systems.
https://reframe-hpc.readthedocs.org
BSD 3-Clause "New" or "Revised" License

[feat] Support performance results comparisons #3227

Open vkarak opened 1 week ago

vkarak commented 1 week ago

This PR enables performance result comparison natively in ReFrame.

New functionality

The --performance-report option is extended to accept an optional specifier that controls how the performance results of the current run are compared against past results.

A new --performance-compare option allows ReFrame to compare the performance results of two past time periods.

The syntax of both options is very similar:

--performance-report=<target_period>/<aggr_spec>/<extra_cols>
--performance-compare=<base_period>/<target_period>/<aggr_spec>/<extra_cols>

Each time period is specified in the following simplified syntax:

<time_period> ::= ("now" | <timestamp>) ":" ("now" | <timestamp>) (("+" | "-") [0-9]+ <unit>)?
<timestamp> ::= ([a-z] | [A-Z] | [0-9])+
<unit> ::= "h" | "m" | "s" | "d"

Allowed timestamps must follow any of the formats below:

%Y%m%d
%Y%m%dT%H%M
%Y%m%dT%H%M%S
%Y%m%dT%H%M%S%z
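
For instance (illustrative), a period may be expressed relative to the current time or with explicit timestamps in one of the formats above:

now:now-7d
20240630T0000:20240707T0000

The first uses the now keyword with a -7d offset (it also appears in the examples further below); the second is a fixed window in the %Y%m%dT%H%M format.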

A time period can also be specified implicitly through a session UUID. The session UUID is essentially the start timestamp of the run session in the format %Y%m%dT%H%M%S%z and is always printed at the end of the run session. If you want to use a session UUID as a timestamp, you should specify it as ^<session_uuid>.

An aggregation specifier follows the syntax:

<aggr_spec> ::= <aggr_op> ":" <extra_group_by>
<aggr_op> ::= "first" | "last" | "mean" | "median" | "min" | "max"
<extra_group_by> ::= ("+" <column_name>)*
<column_name> ::= ([a-z] | [A-Z] | [0-9])+

Finally, the extra columns can be specified with the following syntax:

<extra_columns> ::= ("+" <column_name>)*
<column_name> ::= ([a-z] | [A-Z] | [0-9])+
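
For instance (decoding the first entry of the Examples section below), mean:+system+partition+environment aggregates with the mean and additionally groups by system, partition and environment, while +result as the extra-columns part adds the result column to the output.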

Note that the above EBNF forms are simplified in terms of the accepted column names and/or timestamps. Note also that the accepted column names are the same as the log format specifiers, but without the check_ prefix.

Results are always aggregated by test name, performance variable name and performance variable unit; this cannot be changed. Users can only provide additional columns to group by.

Similarly, the output will contain all the group-by columns plus any additional columns specified by the user. For non-aggregated columns, the unique values across all samples are combined into a single string using the | separator.
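
As a purely hypothetical illustration (the exact column labels may differ), a report aggregated with mean and with result requested as an extra, non-aggregated column could contain rows like the following, where the two distinct result values of the samples are joined with |:

name         pvar      punit  pval   result
stream_test  triad_bw  GB/s   512.3  pass|fail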

Examples

--performance-report=now:now-7d/mean:+system+partition+environment/+result
--performance-compare=^20240630T163456/^20240701T155412/median:+job_nodelist/+result

Key design concepts

After the tests have finished and the JSON run report has been generated, ReFrame stores the run results in a database. The full JSON report blob is stored, and the test cases of the run are indexed by their job_completion_time_unix.
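
As a rough sketch only (the table layout, column names and report structure below are assumed for illustration, not taken from the PR), the storage step could look conceptually like this:

import json
import sqlite3

def store_report(db_file, report, session_uuid):
    # Hypothetical layout: the full JSON report is kept as a blob and every
    # test case is indexed by its job_completion_time_unix for fast
    # time-period queries.
    with sqlite3.connect(db_file) as conn:
        conn.execute('CREATE TABLE IF NOT EXISTS sessions('
                     'uuid TEXT PRIMARY KEY, json_blob TEXT)')
        conn.execute('CREATE TABLE IF NOT EXISTS testcases('
                     'session_uuid TEXT, name TEXT, '
                     'job_completion_time_unix REAL)')
        conn.execute('CREATE INDEX IF NOT EXISTS ix_completion_time '
                     'ON testcases(job_completion_time_unix)')
        conn.execute('INSERT INTO sessions VALUES(?, ?)',
                     (session_uuid, json.dumps(report)))
        for run in report['runs']:
            for tc in run['testcases']:
                conn.execute('INSERT INTO testcases VALUES(?, ?, ?)',
                             (session_uuid, tc['name'],
                              tc['job_completion_time_unix']))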

Querying results is purposefully kept simple and is fulfilled by a single API call to fetch all the test cases that have run in a certain period of time. Even querying by a session UUID goes through this path, by first obtaining the time span of the session.

The obtained test cases are then grouped by the specified attributes and their performance values are aggregated using the aggregation function.

Finally, a 2D array with the table data is returned to the caller.
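
Conceptually, the grouping and aggregation could be sketched as follows (field and column names are illustrative, not the PR's actual internals):

import statistics
from collections import defaultdict

AGGR_FNS = {
    'first': lambda vals: vals[0],
    'last': lambda vals: vals[-1],
    'mean': statistics.mean,
    'median': statistics.median,
    'min': min,
    'max': max,
}

def aggregate(testcases, aggr_op, extra_group_by=(), extra_cols=()):
    # Results are always grouped by test name, performance variable and unit;
    # `extra_group_by` only adds further grouping columns.
    groups = defaultdict(list)
    for tc in testcases:
        key = (tc['name'], tc['pvar'], tc['punit'],
               *(tc[col] for col in extra_group_by))
        groups[key].append(tc)

    header = ['name', 'pvar', 'punit', *extra_group_by, 'pval', *extra_cols]
    table = [header]
    for key, tcs in groups.items():
        row = [*key, AGGR_FNS[aggr_op]([tc['pval'] for tc in tcs])]
        for col in extra_cols:
            # Non-aggregated columns: combine the unique values with '|'
            row.append('|'.join(sorted({str(tc[col]) for tc in tcs})))
        table.append(row)

    return table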

Implementation details

The database file is placed inside the report file folder, is named results.db, and this cannot be changed.

The report format also changes, so its data version is bumped to 4.0. The check_vars and check_params are flattened and stored inside the test case.
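
For illustration only (field names assumed), a test case entry in the version 4.0 report would then carry its variables and parameters directly rather than under nested check_vars/check_params dictionaries, e.g.:

{
  "name": "stream_test",
  "system": "daint:gpu",
  "num_tasks": 4,
  "array_size": 1048576,
  "job_completion_time_unix": 1719843296.0
}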

Code organization

The existing reporting code is considerably refactored and new code is added. Here is a summary of the structure:

Finally, this PR fixes some coding style issues to make flake8 happy again.

Todos

Open issues to discuss

Closes #2733.

pep8speaks commented 1 week ago

Hello @vkarak, Thank you for updating!

Cheers! There are no PEP8 issues in this Pull Request! Do see the ReFrame Coding Style Guide.

Comment last updated at 2024-07-02 08:19:14 UTC
ekouts commented 5 days ago

Still testing the PR and going through it. Just some preliminary comments regarding the open issues for discussion:

  • Add an option to not store the JSON reports at all, now that these are stored in the database? This would require us to add an option to list and show old reports from the command line.
  • We likely need an option to quickly list the stored sessions by UUID, so that two sessions can be easily selected.

I think both are good ideas. The second one in particular is useful for avoiding having to go through the report manually to find a session.

I would like to add some more items to discuss:

vkarak commented 7 hours ago

> As far as I can tell, test cases that are generated by --distribute are treated as different tests. I think it would be good to be able to aggregate the performance results of test cases that are run on different nodes (but in the same partition).

I think the --distribute option has two problems:

I think the solution to this naming problem should probably be independent of how we treat it in the results DB, but I agree that it needs fixing in this context.

> Would it make sense to give users the option to remove a certain UUID from the database? If you have some "invalid" results, the only options right now are to remove the whole database or to do it outside of ReFrame. If we don't want to support it through ReFrame, maybe we can have a small explanation of how to manually "change" the db?

I think it makes sense in general.

> Not sure how we would properly address this, but let's say you are comparing two runs and using max as the aggregator. What would be the way to find the UUID of the session? Currently you have to manually look in the report, right?

If you add +session_uuid to the extra columns argument, it should show all the unique session UUIDs that the aggregation came from. The same happens for any column that you select. I have tried this with test attributes but not with the session UUID, so if it doesn't work, I should have a look at it.
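
For instance (untested, just following the syntax above), something like

--performance-compare=^20240630T163456/^20240701T155412/max:+job_nodelist/+result+session_uuid

should list the session UUIDs contributing to each aggregated row.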

> In the performance report you now see only one entry for each test (in the case of retries). This is different behaviour from before and I think some people may prefer to see the individual results.

You should still be getting different entries now, as each repeated test has a different name, though that is something we want to change. But in any case, I don't believe we should offer this sort of customization for the performance report at the moment, given that the raw performance results are available in the JSON reports, in the database and in the perflogs.

Nonetheless, I believe you can still get the same result by adding +pval to the extra columns (I haven't tried it).

Maybe we could also get this by grouping by +run_index as well (again, we should try that as I haven't :-)).
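
For instance (untested, as noted above), either of the following should conform to the documented syntax:

--performance-report=<target_period>/mean:/+pval
--performance-report=<target_period>/mean:+run_index/+result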

> We are not printing the title anymore.

I have fixed that but didn't push yet :-)

> I tried running two ReFrame runs in parallel and the database can get silently corrupted. Maybe not an immediate issue, but good to keep in mind.

Interesting, will have a 👀. I thought that sqlite3 handled multiple processes... Will try to reproduce.
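
For reference, a sketch of the usual sqlite3 mitigations that could be explored here (not part of this PR): a busy timeout makes a connection wait on a locked database instead of failing, and WAL mode lets readers proceed while a writer is active.

import sqlite3

# Wait up to 30 s for a lock held by a concurrent ReFrame process
conn = sqlite3.connect('results.db', timeout=30)

# Optional: write-ahead logging lets readers and a single writer coexist
conn.execute('PRAGMA journal_mode=WAL')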