Closed Harwayne closed 5 years ago
I'm assuming this would only deal with performance of actual Eventing components and not things like Eventing Sources. So, the tests will directly push events into Broker/Channel instead of those events originating from a source in the eventing-sources repo.
@bbrowning Yes, my intention was that these tests would be around the performance of the core eventing pieces (Channels, Subscriptions, Brokers, and Triggers). I think using something to artificially create and push events, instead of a real event source, will make the tests more repeatable, and therefore more useful.
We should have similar tests for sources, but those may be much more specific. For example, how do I load test the GitHubSource? Does it involve effectively spamming GitHub itself with PRs, issues, comments, etc.? Each source will likely have different ways to generate load, so each will need unique tests.
I would like to work on this one, and would be happy if anyone wanted to join me.
/assign
/assign
I would also like to help. I have some ideas on how to implement "Latency between sending an event into a Broker and receiving that event via a Trigger."
@Harwayne check out https://github.com/n3wscott/knperf
There is way more work to do, and there is another idea of a second container that sets up the env before the test.
/milestone 0.7.0
@n3wscott: You must be a member of the knative/knative-milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your and have them propose you as an additional delegate for this responsibility.
/milestone v0.7.0
/unassign
/unassign
I don't have the bandwidth to look at this now.
In 0.7.0, we decided to only measure the latency between sending an event into a `Broker` and receiving that event via a `Trigger`.

Since Knative Serving has done a lot of work on performance testing, it would be nice if we could leverage their work as much as possible. However, as Eventing is not a simple send-and-wait-for-reply model, we'll need to derive a new way to measure the latencies. Once we get the metrics data, the following steps, like saving and visualizing it, can be the same.
Based on what was described above, a feasible workflow would be:

1. Write a performance test case, which does:
   1. Create one `Broker` and one `Trigger`, and get them ready. We can test the performance of different implementations by changing the `Channel` used by the `Broker`.
   2. Create a `Pod` to send events to the `Broker` and also receive events from the `Trigger`. This can be done by starting a receiver in the `Pod` and setting the `Pod`'s `Service` as the `Subscriber` of the `Trigger`. Since both sending and receiving are done in one single `Pod`, we can simply use a `Map` to keep track of the timestamps and calculate the latencies.
   3. Export the test results so they can be consumed by `Testgrid`.
2. To run the test case repeatedly, we can create a periodic job in `Prow` and create a corresponding test group in `Testgrid`. In this way we can also check the historical data in one single place, `Testgrid`.
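Since the sender and receiver run in the same `Pod`, the map of timestamps can be as simple as the following sketch (all names here are illustrative, not from the actual test code):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tracker records send timestamps per event ID so that the receiver, running
// in the same process, can compute per-event latency.
type tracker struct {
	mu   sync.Mutex
	sent map[string]time.Time
}

func newTracker() *tracker {
	return &tracker{sent: make(map[string]time.Time)}
}

// Sent is called just before an event is POSTed to the Broker.
func (t *tracker) Sent(id string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.sent[id] = time.Now()
}

// Received is called when the event comes back via the Trigger's subscriber;
// it returns the end-to-end latency for that event.
func (t *tracker) Received(id string) (time.Duration, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	s, ok := t.sent[id]
	if !ok {
		return 0, false
	}
	return time.Since(s), true
}

func main() {
	tr := newTracker()
	tr.Sent("event-1")               // before POSTing to the Broker
	time.Sleep(5 * time.Millisecond) // stand-in for the Broker -> Trigger hop
	if d, ok := tr.Received("event-1"); ok {
		fmt.Printf("event-1 latency: %v\n", d)
	}
}
```

The mutex matters once the load is generated from multiple goroutines, since sends and receives then touch the map concurrently.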
Though this solution is workable, there are a lot of things we need to improve in the future:

- There is no `loadgenerator` as in Serving, which is built on fortio. For now we can produce the load by simply creating multiple goroutines, but in the future we'll need a more elegant way to generate the loads. (Possibly improve `fortio` by supporting a customized callback.)
- We need a way to transfer the test results from the `Pod` to the node that runs the tests. One way to do this is saving the results in the `Pod` log and parsing the log from the node, but that is hacky and inflexible. Potential improvements are using a `PersistentVolume` or saving the test results into a database.
- `Testgrid` does not have good visualization support. To help us better understand and analyze the historical data, we need better visualization tools. Serving has plans for this and we can again use the same approach once it's determined.

/reopen
@Fredy-Z: Reopened this issue.
We have a latency test for Broker. #1461 tracks throughput tests.
**Problem**
We don't know performance numbers for Eventing objects.

**Persona:** Event Consumer and Contributor

**Exit Criteria**
A set of tests that repeatably produce performance metrics.

**Time Estimate (optional):**

**Additional context (optional)**
The metrics I think are interesting (all of which will be unique per `ClusterChannelProvisioner`):

**Repeatability**
We will have to be careful to make these measurements as repeatable as possible, e.g. always run on GKE with n1-standard-4 machines.