reframe-hpc / reframe

A powerful Python framework for writing and running portable regression tests and benchmarks for HPC systems.
https://reframe-hpc.readthedocs.org
BSD 3-Clause "New" or "Revised" License

[feat] Support performance results comparisons #3227

Open vkarak opened 1 week ago

vkarak commented 1 week ago

This PR enables performance result comparison natively in ReFrame.

New functionality

The --performance-report option is extended to accept an optional specifier that controls how the performance results of the current run are compared against past results.

A new --performance-compare option allows ReFrame to compare the performance results of two past time periods.

The syntax of both options is very similar:

--performance-report=<target_period>/<aggr_spec>/<extra_cols>
--performance-compare=<base_period>/<target_period>/<aggr_spec>/<extra_cols>

Each time period is specified in the following simplified syntax:

<time_period> ::= ("now" | <timestamp>) ":" ("now" | <timestamp>) (("+" | "-") [0-9]+ <unit>)?
<timestamp> ::= ([a-z] | [A-Z] | [0-9])+
<unit> ::= "h" | "m" | "s" | "d"

Allowed timestamps must follow any of the formats below:

%Y%m%d
%Y%m%dT%H%M
%Y%m%dT%H%M%S
%Y%m%dT%H%M%S%z
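
For instance (illustrative), a period may be expressed relative to the current time or with explicit timestamps in one of the formats above:

now:now-7d
20240630T0000:20240707T0000

The first uses the now keyword with a -7d offset (it also appears in the examples further below); the second is a fixed window in the %Y%m%dT%H%M format.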

A time period can also be specified implicitly through a session UUID. The session UUID is essentially the start timestamp of the run session in the format %Y%m%dT%H%M%S%z and is always printed at the end of the run session. If you want to use a session UUID as a timestamp, you should specify it as ^<session_uuid>.

An aggregation specifier follows the syntax:

<aggr_spec> ::= <aggr_op> ":" <extra_group_by>
<aggr_op> ::= "first" | "last" | "mean" | "median" | "min" | "max"
<extra_group_by> ::= ("+" <column_name>)*
<column_name> ::= ([a-z] | [A-Z] | [0-9])+

Finally, the extra columns can be specified with the following syntax:

<extra_columns> ::= ("+" <column_name>)*
<column_name> ::= ([a-z] | [A-Z] | [0-9])+
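
For instance (decoding the first entry of the Examples section below), mean:+system+partition+environment aggregates with the mean and additionally groups by system, partition and environment, while +result as the extra-columns part adds the result column to the output.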

Note that the above EBNF forms are simplified in terms of the accepted column names and/or timestamps. Note also that the accepted column names are the same as the log format specifiers, but without the check_ prefix.

Results are always aggregated by test name, performance variable name and performance variable unit; this cannot be changed. Users can only provide additional columns to group by.

Similarly, the output will contain all the group-by columns plus any additional columns specified by the user. For non-aggregated columns, the unique values across all samples are combined into a single string using the | separator.
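
As a purely hypothetical illustration (the exact column labels may differ), a report aggregated with mean and with result requested as an extra, non-aggregated column could contain rows like the following, where the two distinct result values of the samples are joined with |:

name         pvar      punit  pval   result
stream_test  triad_bw  GB/s   512.3  pass|fail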

Examples

--performance-report=now:now-7d/mean:+system+partition+environment/+result
--performance-compare=^20240630T163456/^20240701T155412/median:+job_nodelist/+result

Key design concepts

After the tests have finished and the JSON run report has been generated, ReFrame stores the run results in a database. The full JSON report blob is stored, and the test cases of the run are indexed by their job_completion_time_unix.
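
As a rough sketch only (the table layout, column names and report structure below are assumed for illustration, not taken from the PR), the storage step could look conceptually like this:

import json
import sqlite3

def store_report(db_file, report, session_uuid):
    # Hypothetical layout: the full JSON report is kept as a blob and every
    # test case is indexed by its job_completion_time_unix for fast
    # time-period queries.
    with sqlite3.connect(db_file) as conn:
        conn.execute('CREATE TABLE IF NOT EXISTS sessions('
                     'uuid TEXT PRIMARY KEY, json_blob TEXT)')
        conn.execute('CREATE TABLE IF NOT EXISTS testcases('
                     'session_uuid TEXT, name TEXT, '
                     'job_completion_time_unix REAL)')
        conn.execute('CREATE INDEX IF NOT EXISTS ix_completion_time '
                     'ON testcases(job_completion_time_unix)')
        conn.execute('INSERT INTO sessions VALUES(?, ?)',
                     (session_uuid, json.dumps(report)))
        for run in report['runs']:
            for tc in run['testcases']:
                conn.execute('INSERT INTO testcases VALUES(?, ?, ?)',
                             (session_uuid, tc['name'],
                              tc['job_completion_time_unix']))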

Querying results is purposefully kept simple and is fulfilled by a single API call to fetch all the test cases that have run in a certain period of time. Even querying by a session UUID goes through this path, by first obtaining the time span of the session.

The obtained test cases are then grouped by the specified attributes and their performance values are aggregated using the aggregation function.

Finally, a 2D array with the table data is returned to the caller.
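
Conceptually, the grouping and aggregation could be sketched as follows (field and column names are illustrative, not the PR's actual internals):

import statistics
from collections import defaultdict

AGGR_FNS = {
    'first': lambda vals: vals[0],
    'last': lambda vals: vals[-1],
    'mean': statistics.mean,
    'median': statistics.median,
    'min': min,
    'max': max,
}

def aggregate(testcases, aggr_op, extra_group_by=(), extra_cols=()):
    # Results are always grouped by test name, performance variable and unit;
    # `extra_group_by` only adds further grouping columns.
    groups = defaultdict(list)
    for tc in testcases:
        key = (tc['name'], tc['pvar'], tc['punit'],
               *(tc[col] for col in extra_group_by))
        groups[key].append(tc)

    header = ['name', 'pvar', 'punit', *extra_group_by, 'pval', *extra_cols]
    table = [header]
    for key, tcs in groups.items():
        row = [*key, AGGR_FNS[aggr_op]([tc['pval'] for tc in tcs])]
        for col in extra_cols:
            # Non-aggregated columns: combine the unique values with '|'
            row.append('|'.join(sorted({str(tc[col]) for tc in tcs})))
        table.append(row)

    return table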

Implementation details

The database file is placed inside the report file folder, is named results.db, and this cannot be changed.

The report format also changes, so its data version is bumped to 4.0. The check_vars and check_params are flattened and stored inside the test case.
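
For illustration only (field names assumed), a test case entry in the version 4.0 report would then carry its variables and parameters directly rather than under nested check_vars/check_params dictionaries, e.g.:

{
  "name": "stream_test",
  "system": "daint:gpu",
  "num_tasks": 4,
  "array_size": 1048576,
  "job_completion_time_unix": 1719843296.0
}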

Code organization

The existing reporting code is considerably refactored and new code is added. Here is a summary of the structure:

Finally, this PR fixes some coding style issues to make flake8 happy again.

Todos

Open issues to discuss

Closes #2733.

pep8speaks commented 1 week ago

Hello @vkarak, Thank you for updating!

Cheers! There are no PEP8 issues in this Pull Request! Do see the ReFrame Coding Style Guide.

Comment last updated at 2024-07-02 08:19:14 UTC
ekouts commented 5 days ago

Still testing the PR and going through it. Just some preliminary comments regarding the open issues for discussion:

  • Add an option to not store the JSON reports at all, now that these are stored in the database? This would require us to add an option to list and show old reports from the command line.
  • We likely need an option to quickly list the stored sessions by UUID, so that two sessions can be easily selected.

I think both are good ideas. The second one in particular is useful for avoiding having to go through the report manually to find a session.

I would like to add some more items to discuss:

vkarak commented 7 hours ago

> As far as I can tell, test cases that are generated by --distribute are treated as different tests. I think it would be good to be able to aggregate the performance results of test cases that are run on different nodes (but in the same partition).

I think the --distribute option has two problems:

I think the solution to this naming problem should probably be independent of how we treat it in the results DB, but I agree that it needs fixing in this context.

> Would it make sense to give users the option to remove a certain UUID from the database? If you have some "invalid" results, the only options right now are to remove the whole database or to do it outside of ReFrame. If we don't want to support it through ReFrame, maybe we can have a small explanation of how to manually "change" the db?

I think it makes sense in general.

> Not sure how we would properly address this, but let's say you are comparing two runs and using max as the aggregator. What would be the way to find the UUID of the session? Currently you have to manually look in the report, right?

If you add +session_uuid to the extra columns argument, it should show all the unique session UUIDs that the aggregation came from. The same happens for any column that you select. I have tried this with test attributes but not with the session UUID, so if it doesn't work, I should have a look at it.
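
For instance (untested, just following the syntax above), something like

--performance-compare=^20240630T163456/^20240701T155412/max:+job_nodelist/+result+session_uuid

should list the session UUIDs contributing to each aggregated row.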

> In the performance report you now see only one entry for each test (in the case of retries). This is different behaviour from before and I think some people may prefer to see the individual results.

You should still be getting different entries now, as each repeated test has a different name, though that is something we want to change. But in any case, I don't believe we should offer this sort of customization for the performance report at the moment, given that the raw performance results are available in the JSON reports, in the database and in the perflogs.

Nonetheless, I believe you can still get the same result by adding +pval to the extra columns (I haven't tried it).

Maybe we could also get this by grouping by +run_index as well (again, we should try that as I haven't :-)).
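
For instance (untested, as noted above), either of the following should conform to the documented syntax:

--performance-report=<target_period>/mean:/+pval
--performance-report=<target_period>/mean:+run_index/+result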

> We are not printing the title anymore.

I have fixed that but didn't push yet :-)

> I tried running two ReFrame runs in parallel and the database can get silently corrupted. Maybe not an immediate issue, but good to keep in mind.

Interesting, will have a 👀. I thought that sqlite3 handled multiple processes... Will try to reproduce.
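
For reference, a sketch of the usual sqlite3 mitigations that could be explored here (not part of this PR): a busy timeout makes a connection wait on a locked database instead of failing, and WAL mode lets readers proceed while a writer is active.

import sqlite3

# Wait up to 30 s for a lock held by a concurrent ReFrame process
conn = sqlite3.connect('results.db', timeout=30)

# Optional: write-ahead logging lets readers and a single writer coexist
conn.execute('PRAGMA journal_mode=WAL')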