septract opened this issue 4 months ago
Sounds good to me. For simplicity I might skip the aggregation - just keep the commit timestamp in the CSV and ask gnuplot to regenerate the graph on each commit (with the x axis being either commit index or time).
GitHub artifacts are deleted after 90 days, so it depends on whether we care about long-term retention of performance logs. If we do, we'll need some other approach to aggregation and storage.
Oh, maybe you mean we just keep a rolling CSV artifact with all the previous logs? That could work.
Ideally I think we'd end up with all the logs and the current graphs checked in somewhere, rather than relying on that transient artifact - but I don't know GitHub well enough to have an opinion on how.
I talked with my colleague Kevin Quick about how this should work. Here's one possible approach:
For generating logs in a single CI run:
- Modify the `check.sh` script to output per-example run times in a `.csv` file
- Save this file as an artifact
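As a rough sketch of that first step, `check.sh` could append one row per example to a CSV. The column layout, file names, and the `examples/*.c` glob here are assumptions, and the actual checker invocation is left as a placeholder:

```shell
#!/bin/sh
# Sketch only: emit per-example run times as CSV.
# Column layout and paths are assumptions, not the script's actual contents.
OUT=timings.csv
COMMIT=$(git rev-parse --short HEAD 2>/dev/null || echo unknown)
STAMP=$(date -u +%Y-%m-%dT%H:%M:%SZ)
echo "commit,timestamp,example,seconds" > "$OUT"
for ex in examples/*.c; do
  start=$(date +%s)
  # ... run the checker on "$ex" here ...
  end=$(date +%s)
  echo "$COMMIT,$STAMP,$ex,$((end - start))" >> "$OUT"
done
```

Keeping the commit hash and timestamp in each row means later aggregation steps don't need to recover them from the artifact metadata.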
For detecting regressions in a PR:
- Generate the current performance numbers.
- Pull the most recent N `.csv` files through the artifact API
- Compute a rolling average / trajectory / whatever we need to identify a regression
- If there has been a regression, apply a label
- Do whatever else we want in order to raise the saliency of the regression - auto-comment on the PR, block the merge ...
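The rolling-average comparison could be as simple as summing each CSV and comparing the current total against the mean of the previous runs. This is only a sketch: the toy data stands in for the artifacts pulled from the API, and the `example,seconds` layout and 1.5x threshold are assumptions:

```shell
#!/bin/sh
# Sketch of the regression check. Toy data stands in for artifacts
# pulled from the API; CSV layout and threshold are assumptions.
mkdir -p history
printf 'example,seconds\na,10\nb,20\n' > history/run1.csv
printf 'example,seconds\na,12\nb,22\n' > history/run2.csv
printf 'example,seconds\na,40\nb,60\n' > current.csv

# Sum the seconds column of one CSV, skipping the header row.
total() { awk -F, 'NR > 1 { s += $2 } END { print s + 0 }' "$1"; }

current=$(total current.csv)
sum=0; n=0
for f in history/*.csv; do
  sum=$((sum + $(total "$f"))); n=$((n + 1))
done
avg=$((sum / n))
# 1.5x threshold via integer math: current > 1.5*avg  <=>  2*current > 3*avg
if [ $((current * 2)) -gt $((avg * 3)) ]; then
  echo "regression: ${current}s vs rolling average ${avg}s"
else
  echo "ok: ${current}s vs rolling average ${avg}s"
fi
```

In CI, the regression branch would apply the label, post the comment, or fail the job instead of just printing.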
For long-term logging:
- Run a nightly CI job that pulls all the artifacts from the current day
- Aggregate into a daily log file
- Generate a performance graph using GnuPlot
- Commit the log and performance graph to the repo
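For the graphing step, the nightly job could emit a small GnuPlot script and run it (e.g. `gnuplot perf.gp`). The paths and column positions below are assumptions (an ISO-8601 timestamp in column 2, the run time in column 4):

```shell
#!/bin/sh
# Sketch: generate the GnuPlot script the nightly job would run.
# perf-log.csv, its column layout, and the output name are assumptions.
cat > perf.gp <<'EOF'
set datafile separator ","
set xdata time
set timefmt "%Y-%m-%dT%H:%M:%SZ"
set format x "%Y-%m-%d"
set terminal pngcairo size 800,480
set output "performance.png"
set ylabel "run time (s)"
plot "perf-log.csv" using 2:4 with linespoints title "per-example run time"
EOF
```

Plotting against the timestamp gives the time-based x axis; switching `using 2:4` to a row index would give the commit-index variant mentioned above.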
That sounds good to me.
And it would indeed be very useful to keep long-term data. However exactly we implement items 2 and 3, I think it's good to have the data from item 3 committed in the repository, to reduce the risk of losing it.
I've opened a PR that runs benchmarks on every update to master.
Next I'll work on running benchmarks on every PR so that we can detect potential regressions. Ideally this will add a comment with graphs of the updated performance numbers. If we can't get that working quickly, the alternative is to have the (optional) benchmarking CI job fail and add a comment if performance degradation exceeds a threshold (2x-3x slower?).
For the future, it would also be nice to compare CVC5 vs Z3 timings.
Per @cp526 in https://github.com/rems-project/cerberus/pull/365#issuecomment-2232911423