This is what Rust uses: https://github.com/rust-lang/rustc-perf
If we want to use their frontend, we would have to output results in a compatible format. I am not sure what that format is.
I noticed that. But the project seems tightly coupled with rustc and not suitable for us.
Here's an architecture I discussed with @tianleq and @wenyuzhao:
We build a lightweight API server backed by some sort of database (Firebase, or SQL on VPS, etc.). We just need one table with columns (commit metadata, date, metric, benchmark, configuration, data).
{commit: deadbeef, repo: mmtk/mmtk-core, pr: 42}
or {commit: deadbeef, repo: mmtk/mmtk-core, branch: master}
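Purely as an illustration, the table could look something like the sketch below (assuming SQLite; the actual backend could equally be Firebase or SQL on a VPS, and the column names are placeholders for how the commit metadata might be flattened):

```python
# Hypothetical single-table schema for benchmark results (SQLite used only
# for illustration; column names are assumptions, not a fixed design).
import sqlite3

conn = sqlite3.connect("perf_results.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS results (
        repo          TEXT NOT NULL,   -- e.g. 'mmtk/mmtk-core'
        commit_hash   TEXT NOT NULL,   -- e.g. 'deadbeef'
        pr            INTEGER,         -- NULL for non-PR (branch) runs
        branch        TEXT,            -- NULL for PR runs
        date          TEXT NOT NULL,   -- when the run happened (ISO-8601)
        metric        TEXT NOT NULL,   -- e.g. 'total_time'
        benchmark     TEXT NOT NULL,   -- e.g. 'fop'
        configuration TEXT NOT NULL,   -- e.g. 'OpenJDK_SemiSpace'
        value         REAL NOT NULL    -- the measured datapoint
    )
""")
conn.commit()
```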
We expose two very generic HTTP endpoints.
POST /query
Body: {metric: str, benchmarks: [str], configurations: [str], repo: Option[str], pr: Option[str], branch: Option[str], commits: Option[[str]]}
POST /insert
Body: {metric: str, benchmark: str, configuration: str, repo: str, pr: Option[str], branch: Option[str], commit: str}
The /query endpoint returns an array of datapoints. Both endpoints should be easy to implement with plain SQL INSERT and SELECT statements.
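As a rough sketch (not a final design), the two endpoints could map onto the table above, assuming Flask and the SQLite schema sketched earlier; note that the insert body would presumably also carry the measured value and a timestamp, which are my additions here:

```python
# Sketch of the two generic endpoints, assuming Flask + SQLite.
# Request field names follow the bodies proposed above; 'date' and 'value'
# are assumed extra fields for the insert body.
from flask import Flask, request, jsonify
import sqlite3

app = Flask(__name__)
DB = "perf_results.db"

@app.route("/insert", methods=["POST"])
def insert():
    b = request.get_json()
    with sqlite3.connect(DB) as conn:
        conn.execute(
            "INSERT INTO results (repo, commit_hash, pr, branch, date,"
            " metric, benchmark, configuration, value)"
            " VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
            (b["repo"], b["commit"], b.get("pr"), b.get("branch"), b["date"],
             b["metric"], b["benchmark"], b["configuration"], b["value"]))
    return jsonify({"status": "ok"})

@app.route("/query", methods=["POST"])
def query():
    b = request.get_json()
    bench_ph = ",".join("?" * len(b["benchmarks"]))
    conf_ph = ",".join("?" * len(b["configurations"]))
    sql = (f"SELECT * FROM results WHERE metric = ?"
           f" AND benchmark IN ({bench_ph})"
           f" AND configuration IN ({conf_ph})")
    params = [b["metric"], *b["benchmarks"], *b["configurations"]]
    # Optional filters: repo, pr, branch, and an explicit list of commits.
    for key in ("repo", "pr", "branch"):
        if b.get(key) is not None:
            sql += f" AND {key} = ?"
            params.append(b[key])
    if b.get("commits"):
        sql += " AND commit_hash IN ({})".format(",".join("?" * len(b["commits"])))
        params.extend(b["commits"])
    with sqlite3.connect(DB) as conn:
        conn.row_factory = sqlite3.Row
        rows = conn.execute(sql, params).fetchall()
    return jsonify([dict(row) for row in rows])  # array of datapoints
```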
During benchmark runs, for each completed configuration/benchmark, we do POSTs to insert parsed data into the database, and then do another POST to store the log in object storage.
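For instance, a benchmark runner could submit each parsed datapoint with a single HTTP POST (the server URL below is hypothetical, and the field values are just the placeholders from the examples in this thread):

```python
# Hypothetical client-side call from a CI runner after parsing one result.
import requests

requests.post("https://perf.example.org/insert", json={
    "repo": "mmtk/mmtk-core",
    "commit": "deadbeef",
    "pr": 42,                       # or branch: "master" for non-PR runs
    "branch": None,
    "date": "2024-01-01T00:00:00Z",
    "metric": "total_time",
    "benchmark": "fop",
    "configuration": "OpenJDK_SemiSpace",
    "value": 1234.5,
}, timeout=30)
```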
The visualization frontend can just be a static webpage that talks to the backend. We can also have other text-based frontends (such as GitHub bot) that comment on PRs.
Some example HTTP requests:

- Performance regression for the same configuration on multiple benchmarks: {metric: "total_time", benchmarks: [fop, lusearch], configurations: [OpenJDK_SemiSpace], repo: mmtk/mmtk-core, pr: None, branch: master, commit: None}
- Performance comparison before merging a PR: {metric: "total_time", benchmarks: [fop, lusearch], configurations: [OpenJDK_SemiSpace], repo: mmtk/mmtk-core, pr: 42, commit: None}
- Performance for a single commit: {metric: "total_time", benchmarks: [fop, lusearch], configurations: [OpenJDK_SemiSpace], repo: mmtk/mmtk-core, commit: deadbeef}
- Performance comparison against a baseline: {metric: "total_time", benchmarks: [fop], configurations: [OpenJDK_SemiSpace, OpenJDK_Parallel], repo: mmtk/mmtk-core, pr: None, branch: master, commit: None}
That looks like what codespeed does. Should we use codespeed rather than reinventing the wheel?
Main problems are
Also it seems like the API I proposed above is too narrow. We probably need something plotty-esque.
Essentially, we need four generic fields: run, scenario, metric, and value.

- run (dict): used to locate results. A run must contain a unique ID (a UUID or hostname-timestamp) and a timestamp, plus a bunch of additional fields (for example, PR number, repo name, etc.).
- scenario (dict): benchmark, runtime, GC, arguments, etc.
- metric (str): RSS, startup time, total time, GC time, allocation slow path distribution, etc.
- value (Any): the data.

We assume that the database backend only needs to perform filtering and retrieval, and that the analysis logic will be implemented on the client side. A document database like MongoDB or Elasticsearch seems like a good choice for such unstructured data.
We are mostly interested in two types of queries: comparing two runs, and tracking the trend of specific scenarios over time. So we need some sort of indices on run, scenario, and metric.
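For example, a single datapoint and the supporting indices could look like the sketch below in MongoDB via pymongo; every field name and value here is an illustrative assumption, not a committed schema:

```python
# Sketch of the run/scenario/metric/value layout in MongoDB (pymongo).
# All field names and values are placeholders.
from datetime import datetime, timezone
from pymongo import MongoClient, ASCENDING

coll = MongoClient("mongodb://localhost:27017")["perf"]["results"]

# Indices for the two main query patterns: comparing two runs by ID, and
# tracking one scenario/metric over time.
coll.create_index([("run.id", ASCENDING)])
coll.create_index([("scenario.benchmark", ASCENDING),
                   ("scenario.gc", ASCENDING),
                   ("metric", ASCENDING),
                   ("run.timestamp", ASCENDING)])

coll.insert_one({
    "run": {
        "id": "host1-20240101T000000Z",        # unique run ID
        "timestamp": datetime.now(timezone.utc),
        "repo": "mmtk/mmtk-core",
        "pr": 42,
    },
    "scenario": {
        "benchmark": "fop",
        "runtime": "OpenJDK",
        "gc": "SemiSpace",
        "args": "default heap size",
    },
    "metric": "total_time",
    "value": 1234.5,
})

# Retrieve the raw datapoints for one scenario/metric over time; any
# normalization or statistics happen on the client side.
trend = coll.find({"scenario.benchmark": "fop",
                   "scenario.gc": "SemiSpace",
                   "metric": "total_time"}).sort("run.timestamp", ASCENDING)
```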
Client-side analysis and visualization should be feasible given today's web stack and machine performance.
This might eventually replace plotty, so that we can share the same workflow for performance regression and day-to-day analysis.
It might be possible to do a lot of the analysis and build a dashboard in, e.g., Kibana (although the normalization algorithm used by plotty is really hard to express as database queries). https://www.elastic.co/guide/en/kibana/current/lens.html
Zixian mentioned this blog post https://www.mongodb.com/blog/post/using-change-point-detection-find-performance-regressions. The post itself does not include much information, but there is a list of papers and talks at the end.
This issue discusses what we need and what we are going to do for performance regression CI and stress test CI. They share infrastructure, so I put them together.
- Requirements
- Non-goals
- Design
- Job triggering
- Job execution
- Results storage
- Visualization