joshuahansel opened this issue 1 year ago
The first task will be to determine what the desired metrics are. For timing alone, the existing MOOSE benchmarking capability can be leveraged as-is: https://mooseframework.inl.gov/application_development/performance_benchmarking.html. If we want more, such as memory usage and/or iteration counts, there will probably need to be quite a bit of development.
The relevant CIVET recipe is `recipes/moosebuild/moose/branch_next/Timings.cfg`, which runs `scripts/app_test_timing.sh`. This script runs `./run_tests --run speedtests` (see https://mooseframework.inl.gov/application_development/performance_benchmarking.html). Then the new timing data is stored in the existing database. Finally, the data is processed to generate plots and an HTML page, which is uploaded to a server.
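For illustration, a minimal sketch of the "process the database into plots" step might look like the following. The SQLite file name, the `timings` table, and its columns are assumptions made for this example and do not necessarily match the schema of the existing database.

```python
# Minimal sketch: plot the recorded run times for one benchmark test over time.
# The table/column names ('timings', 'test_name', 'date', 'seconds') are
# illustrative assumptions, not the actual schema of the existing database.
import sqlite3
import matplotlib
matplotlib.use("Agg")  # headless rendering for a CI/server environment
import matplotlib.pyplot as plt

def plot_timing_history(db_path, test_name, out_png):
    """Plot the timing history of a single benchmark test."""
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT date, seconds FROM timings WHERE test_name = ? ORDER BY date",
        (test_name,),
    ).fetchall()
    con.close()

    dates = [r[0] for r in rows]
    seconds = [r[1] for r in rows]

    fig, ax = plt.subplots()
    ax.plot(dates, seconds, marker="o")
    ax.set_xlabel("Date")
    ax.set_ylabel("Run time (s)")
    ax.set_title(test_name)
    fig.autofmt_xdate()
    fig.savefig(out_png)

# Example (hypothetical database and test name):
# plot_timing_history("speedtests.sqlite", "simple_diffusion", "simple_diffusion.png")
```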
Note that there is a `perflog` flag in `SpeedTest` (`python/TestHarness/testers/bench.py`), but there does not appear to be any example of it in MOOSE, so this should be tested. It's recommended to store perflogs if possible, since when total time changes are observed, we can determine in which sections these speedups/slowdowns are actually occurring.
I confirmed that the `perflog` option does not currently work: https://github.com/idaholab/moose/issues/23721.
I don't think the perflog capability is super important yet. I think we should just start with the gross timing, and down the line, if we find ourselves wanting the perflog capability, we can revisit it. We should be watching the gross timing changes from week to week and identifying any major timing increases as we go (see the sketch below). We can always generate perflogs when we're investigating a week-to-week timing change - we don't need to store the history of perflogs. I think storing them might be information overload and maybe not useful, since there are bound to be complications with comparisons (different call stacks, input file changes, etc.) that will be hard to track cumulatively.
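As an illustration of that week-to-week check, here is a hedged sketch that flags large gross-timing increases between consecutive runs. The input layout (test name mapped to a chronological list of run times) and the 10% threshold are assumptions for this example, not part of the existing tooling.

```python
# Hedged sketch: compare the latest timing for each test against the previous
# run and flag large increases. The data layout and threshold are assumed.

def flag_regressions(history, threshold=0.10):
    """Return tests whose latest time grew by more than `threshold` (fractional)."""
    flagged = {}
    for test, times in history.items():
        if len(times) < 2:
            continue  # need at least two data points to compare
        previous, latest = times[-2], times[-1]
        if previous > 0 and (latest - previous) / previous > threshold:
            flagged[test] = (previous, latest)
    return flagged

# Example usage with made-up numbers:
# flag_regressions({"simple_diffusion": [10.2, 10.4, 12.9]})
# -> {"simple_diffusion": (10.4, 12.9)}
```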
I think we should start with gross timing and use that as our guide. When we encounter a significant timing difference and are trying to determine why, we should compare iteration counts first. As for memory, I don't know how important it is to track.
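For the iteration-count comparison, a rough sketch along these lines could count the nonlinear and linear residual lines in a MOOSE console log as a proxy for iteration counts. The regular expressions assume the default console residual output format and may need adjustment; they count residual prints (including the initial residual), not converged iterations exactly.

```python
# Illustrative sketch: count nonlinear/linear residual lines in a console log
# as a proxy for iteration counts when comparing two runs of the same input.
import re

NONLINEAR_RE = re.compile(r"^\s*\d+\s+Nonlinear \|R\|")
LINEAR_RE = re.compile(r"^\s*\d+\s+Linear \|R\|")

def count_iterations(log_path):
    """Return (nonlinear, linear) residual-line counts summed over the whole log."""
    nonlinear = linear = 0
    with open(log_path) as f:
        for line in f:
            if NONLINEAR_RE.match(line):
                nonlinear += 1
            elif LINEAR_RE.match(line):
                linear += 1
    return nonlinear, linear

# Example: compare the counts from two runs of the same input
# old_nl, old_l = count_iterations("run_old.log")
# new_nl, new_l = count_iterations("run_new.log")
```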
Reason
Thermal hydraulics applications need benchmarks to assess progress. Note that this is related to #21133, which concerns V&V, but this issue is focused on performance benchmarking, such as timing.
Design
This would enable/implement testing for MOOSE-based thermal hydraulics modules (THM, NS) and applications.
Impact
Improved testing.