rancher / qa-tasks

List of QA Backlog

Baseline scenario for performance of Rancher #1057

Open git-ival opened 11 months ago

git-ival commented 11 months ago

Large sub-tasks:

Scenarios:

  1. Rancher managing no downstream Kubernetes clusters, with a single simulated user using the UI

    • Should measure around 0.5 CPU and 6 GB of memory usage
  2. Rancher managing 5 downstream Kubernetes clusters with 10 worker nodes each, with 5 parallel simulated users using the UI

    • Measure while making 700 requests per second
    • Should measure around 4 CPU and 13 GB of memory usage
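
The two scenarios above could be captured as data so that a measurement run can be checked against them. The following is a minimal Python sketch. The `Scenario` class, its field names, and the 20% tolerance used to interpret "around" are assumptions for illustration and do not come from this issue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scenario:
    name: str
    downstream_clusters: int
    worker_nodes_per_cluster: int
    parallel_users: int
    requests_per_second: Optional[int]  # None where no target rate is stated
    expected_cpu: float        # CPU figure as stated (assumed to mean cores)
    expected_memory_gb: float  # memory figure as stated, in GB


SCENARIOS = [
    Scenario("scenario-1", 0, 0, 1, None, 0.5, 6.0),
    Scenario("scenario-2", 5, 10, 5, 700, 4.0, 13.0),
]


def within_baseline(scenario, cpu, memory_gb, tolerance=0.2):
    """Return True if both measurements are within a relative tolerance of the targets.

    The 20% default is an assumed reading of "around", not an agreed threshold.
    """
    cpu_ok = abs(cpu - scenario.expected_cpu) <= tolerance * scenario.expected_cpu
    mem_ok = abs(memory_gb - scenario.expected_memory_gb) <= tolerance * scenario.expected_memory_gb
    return cpu_ok and mem_ok
```

For example, `within_baseline(SCENARIOS[0], 0.55, 6.3)` would pass under the assumed tolerance, while a Scenario 2 run that measured 6 CPU would not.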
git-ival commented 6 months ago

Realistically, we can use dartboard for Scenario 1, and more generally for collecting metrics and data from rancher-monitoring or from Prometheus directly. This will take some implementation time, depending on whether metrics beyond those currently supported are needed.
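
For pulling numbers from Prometheus directly, a minimal sketch against the instant-query endpoint of the Prometheus HTTP API (`/api/v1/query`) might look like the following. The base URL, the `cattle-system` namespace selector, and the choice of cAdvisor metrics are assumptions about the environment, and dartboard may collect these values differently.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed PromQL: the metric names are cAdvisor metrics, the namespace selector is a guess.
CPU_QUERY = 'sum(rate(container_cpu_usage_seconds_total{namespace="cattle-system"}[5m]))'
MEMORY_QUERY = 'sum(container_memory_working_set_bytes{namespace="cattle-system"})'


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_scalar_result(payload):
    """Extract the first sample value from an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    results = payload["data"]["result"]
    if not results:
        raise ValueError("query returned no samples")
    # Each sample's "value" is [unix_timestamp, "<value as string>"].
    return float(results[0]["value"][1])


def query_prometheus(base_url, promql, timeout=10):
    """Run one instant query and return the first sample as a float."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as response:
        return parse_scalar_result(json.load(response))
```

The memory query returns bytes, so the result would need converting before it is compared with the GB figures in the scenarios.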

In general, we will likely need to rely on k6 both to simulate load and user activity and to collect metrics during that load. There will be a learning curve, since Scenario 2 will rely on k6 heavily. Designing the simulated user workflows will be the key challenge and could become very complex. As a baseline, we can outline a "simple" workflow that focuses on lists and pagination across some number of downstream clusters per user.
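
k6 scripts are written in JavaScript, so the real workflow would live there. Purely to pin down the logic of the "simple" workflow before porting it, here is a Python sketch. The `fetch_page` callable, the `limit`/`continue` pagination shape, and the even split of the aggregate request rate across users are all assumptions for illustration, not confirmed details of the Rancher API or of the eventual k6 design.

```python
def list_all_pages(fetch_page, resource, page_size=100):
    """Walk a paginated list until the server stops returning a continue token.

    `fetch_page` is a hypothetical callable that returns a dict with
    "items" and an optional "continue" token.
    """
    items, token, requests_made = [], None, 0
    while True:
        page = fetch_page(resource, limit=page_size, continue_token=token)
        requests_made += 1
        items.extend(page["items"])
        token = page.get("continue")
        if not token:
            return items, requests_made


def simulate_user(fetch_page, downstream_ids, resources, page_size=100):
    """One pass of the workflow: list every resource type on every downstream.

    Returns the number of requests the pass generated.
    """
    total_requests = 0
    for cluster_id in downstream_ids:
        for resource in resources:
            _, count = list_all_pages(fetch_page, f"{cluster_id}/{resource}", page_size)
            total_requests += count
    return total_requests


def per_user_rate(target_rps, users):
    """Even split of the aggregate request rate across simulated users (assumed)."""
    return target_rps / users
```

Under the even-split assumption, Scenario 2's 700 requests per second across 5 users works out to `per_user_rate(700, 5)`, that is 140 requests per second per user, which gives a rough sense of how many list passes each simulated user would need to complete per second.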

As a baseline, we can assume a Rancher configuration of 4 nodes (3 all-roles, 1 worker-only for rancher-monitoring), RKE1, AWS, Rancher v2.8-latest.

This effort is primarily focused on the raw number of requests per second, so other benchmark testing is out of scope here. We will target more specific types of requests and related metrics in future efforts.

git-ival commented 6 months ago

As part of our Baseline environments, we should force a number of clusters to be "disconnected". This will take some implementation work, but it should be feasible.