Open git-ival opened 11 months ago
Realistically we can use dartboard for Scenario 1, and more generally for collection of metrics/data from rancher-monitoring or Prometheus directly. This will take some implementation time, depending on whether additional metrics beyond those currently supported are desired.
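For the "Prometheus directly" option, a minimal sketch of an instant query against the standard Prometheus HTTP API (`GET /api/v1/query`) could look like the following. The base URL and the example expression are assumptions for illustration only; this is not how dartboard collects metrics, and the metrics we actually want are still to be decided.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build the URL for a Prometheus instant query (GET /api/v1/query)."""
    query_string = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> list:
    """Run an instant query and return the list found under data.result."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {body}")
    return body["data"]["result"]


# Assumption: an example expression only (per-second API request rate over 5 minutes).
EXAMPLE_QUERY = "sum(rate(apiserver_request_total[5m]))"
```

Calling `instant_query("http://<prometheus-host>:9090", EXAMPLE_QUERY)` against a reachable Prometheus would return the raw result vector; the host is a placeholder.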
In general, we will likely need to rely on k6 to simulate load and user activity as well as to collect metrics during that load.
There will be some learning curve around k6, as Scenario 2 will rely on it heavily.
Designing the users' simulated workflows will be the key challenge and could become very complex. As a starting point, we can outline a "simple" workflow that focuses on lists/pagination across some number of downstream clusters per user.
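To make the "simple" workflow concrete, here is a rough sketch of its logic, independent of k6 (the real script would be written for k6). For each downstream cluster assigned to a user, it lists one resource type and follows pagination until there are no more pages. The `fetch_page` callable, the path layout, and the continue-token style of pagination are all assumptions for illustration, not a confirmed description of the requests the UI makes.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# Hypothetical signature: fetch_page(path, continue_token) -> (items, next_token or None)
FetchPage = Callable[[str, Optional[str]], Tuple[list, Optional[str]]]


def list_all(fetch_page: FetchPage, path: str, max_pages: int = 100) -> Iterator[list]:
    """Yield pages for one list request until no continue token is returned."""
    token: Optional[str] = None
    for _ in range(max_pages):  # hard cap so a bad token cannot loop forever
        items, token = fetch_page(path, token)
        yield items
        if not token:
            break


def simple_workflow(fetch_page: FetchPage, cluster_ids: List[str],
                    resource: str = "pods") -> Dict[str, int]:
    """List one resource type on each downstream cluster; return item counts per cluster."""
    counts: Dict[str, int] = {}
    for cluster_id in cluster_ids:
        # Assumed path layout for reaching a downstream cluster through Rancher.
        path = f"/k8s/clusters/{cluster_id}/v1/{resource}"
        counts[cluster_id] = sum(len(page) for page in list_all(fetch_page, path))
    return counts
```

Injecting `fetch_page` keeps the workflow logic testable without a live Rancher, and the same structure (loop over clusters, loop over pages) should carry over to a k6 script.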
As a baseline, we can assume a Rancher configuration of 4 nodes (3 all-roles, 1 worker-only for rancher-monitoring), RKE1, AWS, Rancher v2.8-latest.
This effort is primarily focused on the raw number of requests per second, so other benchmark testing is out of scope here. We will target more specific types of requests and related metrics as part of future efforts.
As part of our baseline environments, we should force a number of clusters to be "disconnected". This will take some implementation work, but should be feasible.
Large sub-tasks:
cattle-cluster-agent deployment of a given downstream cluster.
CATTLE_AGENT_IMAGE string
CATTLE_AGENT_IMAGE env var on the Rancher deployment in order to ensure that it will use the desired image + repo
Scenarios:
1. Rancher managing no downstream Kubernetes cluster and a single simulated user using the UI
2. Rancher managing 5 downstream Kubernetes clusters with 10 worker nodes each and 5 parallel simulated users using the UI
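If it helps when parameterizing the test tooling, the two scenarios could be captured as data along these lines. The class and field names are hypothetical, and the scenario numbering assumes the list order above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    downstream_clusters: int
    worker_nodes_per_cluster: int
    parallel_users: int

    @property
    def total_worker_nodes(self) -> int:
        """Total downstream worker nodes across all clusters."""
        return self.downstream_clusters * self.worker_nodes_per_cluster


# Values taken from the scenario list; names are illustrative only.
SCENARIOS = [
    Scenario("Scenario 1", downstream_clusters=0, worker_nodes_per_cluster=0, parallel_users=1),
    Scenario("Scenario 2", downstream_clusters=5, worker_nodes_per_cluster=10, parallel_users=5),
]
```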