knative / eventing

Event-driven application platform for Kubernetes
https://knative.dev/docs/eventing
Apache License 2.0

[Proposal] Eventing Perf Testing #1096

Closed n3wscott closed 5 years ago

n3wscott commented 5 years ago

Objective

Knative Eventing needs a reproducible way to perform performance testing. We would run these tests after releases and on nightly builds to detect performance changes. The set of tests should be something anyone with a Kubernetes cluster can run. Ideally the solution could be used by any Knative project.

Non-goal: perf testing in minikube.

Non-goal: testing Eventing Sources (though this should not block doing so in the future).

Background

At the moment we have e2e tests that are fragile to run and whose results are hard to interpret. These e2e tests are produced by a combination of bash-based cluster setup and a Go test script that is in charge of resource creation, waiting, running the test, and clean-up. The debug method assumes the consumer of the e2e test event will output the expected result to the pod logs, and we do a simple grep over those logs. Debugging these tests is challenging.

Serving currently has some script-based performance tests that produce load from a load generator and direct the traffic at another cluster. This works well for Serving because its usage model is straightforward compared to Eventing: Serving will most likely be invoked from an external entity, so measuring perf through the various ingress methods works. Eventing, on the other hand, only really cares about cluster-local traffic; most traffic that goes onto the eventing mesh is bridged via a source app, which then forwards an event to a Channel, Broker, or Service.

Eventing poses challenges for performance test setup because of the topology required by each test. Perhaps there is a more developer-friendly way to write and run these tests, and a more test-friendly way to retain test result history.

Requirements and Scale

  1. Perf tests need to be reproducible.
    • Results will depend on the usage, size, and configuration of the cluster, as well as the location where the tests run.
  2. Perf tests should scale to also be stress tests.
    • We do not know what upper limits we can tell customers for eventing.
  3. Perf tests should work in a developer's workflow.
    • We can start to work on performance and get a tight feedback loop for pre-released code.
  4. Historical Perf data needs to be saved.
    • We want to be able to say "release n+1 increases QPS by X%!". We need historical data.
    • Serving is using mako and it looks nice.
  5. Perf tests should generalize to e2e tests.
    • We could leverage the framework to add better e2e tests. The difference between a perf test and an e2e test is minor: perhaps the duration of the test and the data collected.
  6. One test at a time for performance.
    • The cluster needs to have a known load to get a perf number that is relevant.
  7. Run with Known Config.
    • Each test run needs to be able to gather metrics results, logs, and traces from the cluster using existing observability tools. This is distinct from the measurement the test might do.
    • Gather the version of k8s, Knative, and any other dependency (a small sketch follows this list).
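
As a small illustration of the last point, here is a sketch (not part of the proposal) of how a test harness could record cluster and Knative versions with client-go. The namespace, deployment name, and release label used for the Knative lookup are assumptions about how the release is installed:

```go
// Sketch: record the versions of the cluster and of Knative alongside the
// test results. The Knative lookup below assumes the release labels its
// deployments, which is a convention, not a guarantee.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func printClusterVersions(ctx context.Context) error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}

	// Kubernetes server version via the discovery API.
	v, err := cs.Discovery().ServerVersion()
	if err != nil {
		return err
	}
	fmt.Println("kubernetes:", v.GitVersion)

	// Best effort: read the release label off the eventing controller deployment.
	d, err := cs.AppsV1().Deployments("knative-eventing").Get(ctx, "eventing-controller", metav1.GetOptions{})
	if err == nil {
		fmt.Println("eventing:", d.Labels["eventing.knative.dev/release"])
	}
	return nil
}
```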

Design Ideas

In looking at this problem space, there is a tool that is really good at creating replicable results in a cluster: Kubernetes. We should create a custom Kubernetes controller designed to run perf tests, plus a custom resource definition that holds the generic parameters we know we need. In this way we can just post the list of tests to a cluster, and the controller will do what is needed for setup, running the test, and teardown.

The perf CR will run as a job. The controller can observe the running jobs and allow only one to be reconciled and run at a time.

The CRD spec will contain (at least) two container images, one for topology setup and one for the test itself, plus the generic parameters of the run.
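
A minimal sketch of what such a spec might look like, written as the Go types a controller would register. All type and field names here are hypothetical, not an agreed API:

```go
// Hypothetical types for the proposed perf CRD; the names and fields are
// illustrative only and do not reflect an agreed-upon API.
package v1alpha1

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// PerfTest describes a single performance test run.
type PerfTest struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   PerfTestSpec   `json:"spec"`
	Status PerfTestStatus `json:"status,omitempty"`
}

// PerfTestSpec holds the generic parameters a perf run needs.
type PerfTestSpec struct {
	// SetupImage creates the topology under test (channels, brokers, ...).
	SetupImage string `json:"setupImage"`
	// TestImage generates load and records measurements.
	TestImage string `json:"testImage"`
	// Duration of the load phase, e.g. "5m".
	Duration metav1.Duration `json:"duration,omitempty"`
	// Env passes test-specific parameters to the test container.
	Env []corev1.EnvVar `json:"env,omitempty"`
}

// PerfTestStatus records completion and the measured result.
type PerfTestStatus struct {
	Completed bool   `json:"completed,omitempty"`
	Result    string `json:"result,omitempty"`
}
```

Instances of such a resource could then be posted to the cluster (e.g. with kubectl apply), and the controller would own setup, the run, and teardown.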

These two images are separated so that the test is decoupled from the setup. For example, a loopback test only needs to care that the events it sends get delivered back to the running container. The same test could then be used for various topologies: a single channel, many channels, a broker, broker + channel, etc.

Once a Perf Job has run, its status will be marked as complete, with the result of the test written to the status. Testing artifacts will be uploaded to the various buckets and dashboards we require. The test could write saveable results to a file, and we could pair this with a pod that uploads the results.
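
To make the "run as a Job, one at a time" flow concrete, here is a minimal reconcile sketch assuming controller-runtime and the hypothetical PerfTest type from the spec sketch above (same package, with generated deepcopy code). The real controller, knperf or otherwise, may look quite different:

```go
// Hypothetical reconciler sketch; assumes the PerfTest type from the spec
// sketch above lives in this package. Not the actual knperf code.
package v1alpha1

import (
	"context"
	"time"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

type PerfTestReconciler struct {
	client.Client
}

func (r *PerfTestReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var pt PerfTest
	if err := r.Get(ctx, req.NamespacedName, &pt); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	if pt.Status.Completed {
		return ctrl.Result{}, nil
	}

	// Only one perf job may run at a time, so the cluster load is known.
	var jobs batchv1.JobList
	if err := r.List(ctx, &jobs, client.MatchingLabels{"knative-perf": "true"}); err != nil {
		return ctrl.Result{}, err
	}
	for _, j := range jobs.Items {
		if j.Name != pt.Name && j.Status.CompletionTime == nil {
			// Another test is still running; check back later.
			return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
		}
	}

	// Launch the Job: setup runs first, then the test container sends load.
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			Name:      pt.Name,
			Namespace: pt.Namespace,
			Labels:    map[string]string{"knative-perf": "true"},
		},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy:  corev1.RestartPolicyNever,
					InitContainers: []corev1.Container{{Name: "setup", Image: pt.Spec.SetupImage}},
					Containers:     []corev1.Container{{Name: "test", Image: pt.Spec.TestImage, Env: pt.Spec.Env}},
				},
			},
		},
	}
	if err := r.Create(ctx, job); err != nil && !apierrors.IsAlreadyExists(err) {
		return ctrl.Result{}, err
	}

	// Once the Job completes, mark the PerfTest done. Extracting the actual
	// measurement (from logs or an uploaded file) is elided here.
	var running batchv1.Job
	if err := r.Get(ctx, req.NamespacedName, &running); err == nil && running.Status.CompletionTime != nil {
		pt.Status.Completed = true
		pt.Status.Result = "complete; see uploaded artifacts"
		return ctrl.Result{}, r.Status().Update(ctx, &pt)
	}
	return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}
```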

To allow Serving or Build to also use this tool, an option could be to have two clusters: one with this controller installed, and another targeted by the test.

Alternatives Considered

We could continue writing bash + Go scripts. The Go script could be pushed into a single pod and run, then watched from outside the cluster. We could continue to grep the logs for known keys; one key could be the perf results. I lean away from this because writing controllers is the tool I already use in Knative, and it is interesting to me to be able to use that same tool to write tests.

Write pure go tests which interact with the cluster.

Use a tool like Sonobuoy to run the tests.

I researched load generators for eventing and was not able to find anything that would do what we want.

Related Work

knperf: I wrote a very quick POC that uses a controller to create a job and then watch the job until it is done. It does not have the two image idea. It does not upload results. It does not parse the logs.

https://github.com/GoogleCloudPlatform/distributed-load-testing-using-kubernetes (https://medium.com/google-cloud/google-kubernetes-engine-load-testing-and-auto-scaling-with-locust-ceefc088c5b3)

https://github.com/heptio/sonobuoy - looks like you can make your own plugin. But this is another form of re-inventing CRDs.

n3wscott commented 5 years ago

/kind proposal

chizhg commented 5 years ago

It would be great if we could have a common framework for all the testing (e2e, conformance, perf, stress), but to me there are some other things we need to consider:

  1. The current bash script is used by Prow to set up the test environment. In the script, we use `ko apply -f config/...` to install all the dependencies. This is also the standard way for users to install a new provisioner or eventing source, as mentioned in docs like the README for GCP PubSub Channels. Having these steps in our bash script is important since it guarantees the installation step is not broken.
  2. The e2e tests should run in parallel, as opposed to perf tests, which run only one at a time.
  3. As mentioned in this proposal, we need to collect much more data for perf tests than for e2e tests (which only need pass or fail), and we also need to save that data somewhere else. I'm not sure it would be worthwhile to have both kinds of logic in a single framework.

And of course there are some steps shared by all testing types, for example:

  1. All tests need to have topology setup before sending actual events, e.g. one channel, multiple channels, broker+channel.
  2. We do need a way to better validate the events received by the endpoint service, maybe even for performance testing. For example, we may want to verify that every message arrives intact and uncorrupted even when the load is high.

So, would it be possible, or better, to decouple the testing framework into different components, so that we can use whatever we need based on the actual demand? For example, we could have a TopologySetupController that reconciles and creates all the channels, brokers, and services we provide, and a ValidationService that receives both actual and expected results and does the validation for us.
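
For illustration only, that decoupling could be expressed as small Go interfaces. The component names come from the paragraph above, but the method shapes are assumptions, not a proposed API:

```go
// Illustrative only: the component names come from the comment above, but
// these method shapes are assumptions, not a proposed API.
package framework

import "context"

// Topology describes the channels, brokers, and services a test needs.
type Topology struct {
	Channels []string
	Brokers  []string
	Services []string
}

// TopologySetup is what a TopologySetupController could reconcile: given a
// desired topology, create it and report when it is ready.
type TopologySetup interface {
	Apply(ctx context.Context, t Topology) error
	Ready(ctx context.Context) (bool, error)
	Teardown(ctx context.Context) error
}

// Validation is what a ValidationService could expose: it receives expected
// and observed events and reports whether every message arrived intact.
type Validation interface {
	Expect(ctx context.Context, eventID string, payload []byte) error
	Observe(ctx context.Context, eventID string, payload []byte) error
	Report(ctx context.Context) (passed bool, missing int, err error)
}
```

E2e, conformance, and perf suites could then compose only the pieces they need, matching the "use whatever we need based on the actual demand" idea.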

And it seems this proposal is also related to the Serving perf testing that @srinivashegde86 is working on. Please comment below if you have any thoughts on this.

akashrv commented 5 years ago

A few questions that can be resolved orthogonally to the test infra discussion.

akashrv commented 5 years ago

/milestone v0.7.0

akashrv commented 5 years ago

related to (pre-req) #939

akashrv commented 5 years ago

/close

knative-prow-robot commented 5 years ago

@akashrv: Closing this issue.

In response to [this](https://github.com/knative/eventing/issues/1096#issuecomment-504131356):

> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.