mmtk / mmtk-core

Memory Management ToolKit
https://www.mmtk.io

Performance/stress CI rework #869

Open qinsoon opened 1 year ago

qinsoon commented 1 year ago

This issue discusses what we need and what we are going to do for performance regression CI and stress test CI. They share infrastructure so I put them together.

Requirements

Non-goals

Design

Job triggering

Job execution

Results storage

Visualization

qinsoon commented 1 year ago

Related issues:

k-sareen commented 1 year ago

This is what Rust uses: https://github.com/rust-lang/rustc-perf

If we want to use their frontend, we would have to output results in a compatible format. I am not sure what that format is.

qinsoon commented 1 year ago

> This is what Rust uses: https://github.com/rust-lang/rustc-perf
>
> If we want to use their frontend, we would have to output results in a compatible format. I am not sure what that format is.

I noticed that. But the project seems tightly coupled to rustc and not suitable for us.

caizixian commented 1 year ago

Here's an architecture I discussed with @tianleq and @wenyuzhao

We build a lightweight API server backed by some sort of database (Firebase, or SQL on VPS, etc.). We just need one table with columns (commit metadata, date, metric, benchmark, configuration, data).

We expose two very generic HTTP endpoints.

POST /query

Body: {metric: str, benchmarks: [str], configurations: [str], repo: Option[str], pr: Option[str], branch: Option[str], commits: Option[[str]]}

POST /insert

Body: {metric: str, benchmark: str, configuration: str, repo: str, pr: Option[str], branch: Option[str], commit: str, data: Any}

The /query endpoint returns an array of datapoints. Both endpoints should be easy to implement with SELECT and INSERT respectively.
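
The two endpoints above could be sketched as follows, using SQLite as a stand-in for whatever database is chosen. The table layout follows the columns proposed above; the function names and the exact datapoint shape are illustrative assumptions, not a settled API.

```python
# Hypothetical sketch of the proposed /insert and /query handlers,
# backed by an in-memory SQLite table with the columns proposed above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE results (
        commit_hash TEXT, date TEXT, metric TEXT, benchmark TEXT,
        configuration TEXT, repo TEXT, pr TEXT, branch TEXT, data REAL
    )
""")

def insert(body):
    # POST /insert: store one parsed datapoint per completed
    # configuration/benchmark run.
    conn.execute(
        "INSERT INTO results VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (body["commit"], body.get("date"), body["metric"],
         body["benchmark"], body["configuration"], body["repo"],
         body.get("pr"), body.get("branch"), body["data"]),
    )

def query(body):
    # POST /query: filter on the mandatory fields plus whichever
    # optional fields are present, and return matching datapoints.
    sql = "SELECT * FROM results WHERE metric = ?"
    args = [body["metric"]]
    sql += " AND benchmark IN (%s)" % ",".join("?" * len(body["benchmarks"]))
    args += body["benchmarks"]
    sql += " AND configuration IN (%s)" % ",".join("?" * len(body["configurations"]))
    args += body["configurations"]
    for field in ("repo", "pr", "branch"):
        if body.get(field) is not None:
            sql += f" AND {field} = ?"
            args.append(body[field])
    if body.get("commits"):
        sql += " AND commit_hash IN (%s)" % ",".join("?" * len(body["commits"]))
        args += body["commits"]
    return conn.execute(sql, args).fetchall()
```

Parameterised placeholders keep the handlers safe against injection even though the filters are assembled dynamically.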

During benchmark runs, for each completed configuration/benchmark, we do POSTs to insert parsed data into the database, and then do another POST to store the log in object storage.

The visualization frontend can just be a static webpage that talks to the backend. We can also have other text-based frontends (such as GitHub bot) that comment on PRs.

Some example HTTP requests.

Performance regression for the same configuration on multiple benchmarks: {metric: "total_time", benchmarks: [fop, lusearch], configurations: [OpenJDK_SemiSpace], repo: mmtk/mmtk-core, pr: None, branch: master, commit: None}

Performance comparison before merging PR: {metric: "total_time", benchmarks: [fop, lusearch], configurations: [OpenJDK_SemiSpace], repo: mmtk/mmtk-core, pr: 42, commit: None}

Get performance for a single commit: {metric: "total_time", benchmarks: [fop, lusearch], configurations: [OpenJDK_SemiSpace], repo: mmtk/mmtk-core, commit: deadbeef}

Performance comparison against baseline: {metric: "total_time", benchmarks: [fop], configurations: [OpenJDK_SemiSpace, OpenJDK_Parallel], repo: mmtk/mmtk-core, pr: None, branch: master, commit: None}
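
For the "comparison before merging a PR" case, the datapoints returned for the PR and for master would then be reduced on the client side. A minimal sketch, assuming datapoints arrive as dicts with `benchmark` and `data` keys (an assumption, not a fixed schema):

```python
# Toy client-side reduction: per-benchmark PR/master ratio of mean
# metric values, computed from two /query responses. The datapoint
# field names are hypothetical.
from collections import defaultdict
from statistics import mean

def compare(pr_points, master_points):
    """Return the PR/master ratio of the mean metric value per benchmark."""
    def by_benchmark(points):
        groups = defaultdict(list)
        for p in points:
            groups[p["benchmark"]].append(p["data"])
        return {b: mean(vs) for b, vs in groups.items()}
    pr, master = by_benchmark(pr_points), by_benchmark(master_points)
    # Only benchmarks present on both sides are comparable.
    return {b: pr[b] / master[b] for b in pr if b in master}
```

A ratio above 1.0 on a time metric would flag a slowdown worth commenting on the PR.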

qinsoon commented 1 year ago

That looks like what codespeed does. Should we use codespeed rather than reinventing the wheel?

caizixian commented 1 year ago

> That looks like what codespeed does. Should we use codespeed rather than reinventing the wheel?

Main problems are

  1. Codespeed only supports numerical data. We might want to store histograms from bpftrace, etc.
  2. Codespeed's timeline view only supports viewing different benchmarks in different graphs and comparing different executables in the same graph. We want to support viewing different benchmarks of the same executable in the same graph, so that we can see how the performance trend differs depending on the workload.
  3. It is unclear how to support multiple invocations and error bars.

caizixian commented 1 year ago

Also, it seems like the API I proposed above is too narrow. We probably need something plotty-esque. Essentially, we need four generic fields: run, scenario, metric, and value.

We assume that the database backend will just need to perform filtering and retrieval, and the analysis logic will be implemented on the client side. It seems like a document DB such as MongoDB or Elasticsearch could be a good choice for such unstructured data.

We are mostly interested in two types of queries: comparing two runs, or tracking the trend of specific scenarios over time. So we need some sort of indices on run, scenario, and metric.
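
The generic record layout and the two query shapes can be sketched in plain Python, with dicts standing in for the indices a document DB would maintain (all names here are illustrative only):

```python
# In-memory sketch of the {run, scenario, metric, value} record layout
# and the two indices suggested above. A real backend (MongoDB,
# Elasticsearch, ...) would maintain these indices itself.
from collections import defaultdict

by_run = defaultdict(list)              # index for "compare two runs"
by_scenario_metric = defaultdict(list)  # index for "trend over time"

def add(record):
    # Every record is indexed both ways on insertion.
    by_run[record["run"]].append(record)
    by_scenario_metric[(record["scenario"], record["metric"])].append(record)

def trend(scenario, metric):
    """All values recorded for one scenario/metric, in insertion order."""
    return [r["value"] for r in by_scenario_metric[(scenario, metric)]]
```

With retrieval this cheap, normalisation and plotting can indeed stay on the client side.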

Client-side analysis and visualization should be feasible given today's web stack and machine performance.

This might eventually replace plotty, so that we can share the same workflow for performance regression and day-to-day analysis.

It might be possible to do a lot of the analysis and build a dashboard in, e.g., Kibana (the normalization algorithm used by plotty is really hard to implement as database queries). https://www.elastic.co/guide/en/kibana/current/lens.html

qinsoon commented 1 year ago

Zixian mentioned this blog post https://www.mongodb.com/blog/post/using-change-point-detection-find-performance-regressions. The post itself does not include much information, but there is a list of papers and talks at the end.
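
As a toy illustration of the change point detection idea (not the algorithm the post describes, which is considerably more robust): a single mean-shift search picks the split that maximises the gap between the mean before and after it.

```python
# Toy change point detector: return the index where the mean of the
# series shifts the most. Production regression-detection systems use
# statistically grounded methods; this only illustrates the idea.
from statistics import mean

def change_point(series):
    """Index of the split maximising |mean(before) - mean(after)|."""
    best, best_gap = None, 0.0
    for i in range(1, len(series)):
        gap = abs(mean(series[:i]) - mean(series[i:]))
        if gap > best_gap:
            best, best_gap = i, gap
    return best
```

Run on a per-commit timeline of a metric, the returned index would point at the commit where performance shifted.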