Closed: jkowalski closed this issue 1 year ago
Summarizing discussions on the design.
The test infrastructure has three main components: test framework, storage and visualization, and the system under test. The test framework provides necessary modules for writing load and performance tests and is also responsible for generating test traffic. The storage and visualization component stores test results and provides graphs for single test runs, comparison of multiple runs, and trends over time. The system under test is determined by scenarios and configurations to be tested.
For better separation of the system under test from the framework, the test framework along with the storage and visualization components will be hosted on a separate cluster which will be running continuously. Load and performance tests will run against a test cluster which is created according to user-defined configurations. For each test run a new cluster is created and later destroyed. The separation of the system under test from the framework allows community-provided cluster configurations and tests to run against them.
Single-run tests can be executed on a custom cluster which is destroyed after the run. For continuous runs, we will use Cloud Build to run different test scenarios. Test results will be in the format that is appropriate to the storage and visualization components and will be stored in a GCS bucket.
Some initial thoughts/questions:
But otherwise, I think this all makes sense.
@Kuqd -- I expect you have opinions. WDYT?
I would definitely create a new cluster for each run. Cloud Scheduler is a smart idea; we could definitely leverage that.
Now, I don't think the e2e tests are enough to run the performance tests. They're a good start, but I think we want a test that applies load, probably using fortio or locust, to create a lot of allocations concurrently.
I definitely need to investigate fortio more. But from what I can read from @jkowalski and @pm7h, we're on the right track.
Lastly, how big must that cluster be? We should aim to reach at least 250 allocations/sec.
@Kuqd We were discussing load generation today. From what we understand Fortio does not provide a load generation framework, but we can simulate that by creating multiple parallel Goroutines in the test. This is basically what Locust does using Python's gevent.
In summary, Locust provides a better load-test framework since it generates load and is lightweight. It also allows you to write code for your tests. However, the available visualizations are not great, since everything I have seen is time-series based.
I'm interested to know why you think e2e is not enough for performance/load tests. What do you see as the downsides if we generate a large number of goroutines and use that for load generation? This way we could take advantage of Fortio's graphing capabilities.
Yes, you're right, and I think our test plan is not that complicated. With Go we will definitely be more flexible.
Should we write the load test using the same e2e framework? Do you think it would be interesting to make it part of our e2e test suite?
We could definitely run that 4 times a day in the current e2e GKE cluster and replace the GitHub PR CI with e2e on kind (Docker).
https://github.com/GoogleCloudPlatform/agones/blob/master/build/Makefile#L226
Should we use that target ?
This target is good for stress-testing fleet scaling, but for testing allocations I would add another argument that specifies how many concurrent calls we should make. The test would then start a separate goroutine for each. This is basically simulating what Locust does with Python's gevent. Does that make sense?
@ilkercelikyilmaz - did you already write something like the above for #536 ?
If #536 doesn't have a load parameter, I can write something similar to what we have for fleet_test and the target mentioned in https://github.com/GoogleCloudPlatform/agones/issues/573#issuecomment-464948872. And then we can emit metrics in Fortio format.
I implemented something using the e2e framework for my own testing.
@pm7h I can show you what I implemented tomorrow and you can decide how you want to implement the load test for allocation.
Spent some time gathering what we have for completing this task:
1) make stress-test-e2e
- creates a Fortio-formatted JSON file with results (different percentiles and QPS).
2) There is also @ilkercelikyilmaz's continuously running, hours-long test (could be found here), which helps find memory leaks, goroutine leaks, and performance degradation over long periods of operation.
For stress-test-e2e, results were uploaded to a GCS bucket and we can view them in Fortio:
fortio server -sync https://storage.googleapis.com/fortio-sync-2
There we can compare results across different versions. So what we don't have is a single script to:
There is an option to run this stress-test-e2e in a Prow job. Examples of how Istio uses Fortio in a Prow job:
https://github.com/istio/istio/wiki/Working-with-Prow
https://prow.istio.io/?job=daily-performance-benchmark
https://prow.istio.io/view/gcs/istio-prow/logs/daily-performance-benchmark/151
In the full log you can see that Fortio is set up in these 14-hour-long benchmark tests:
+ setup_fortio_and_prometheus
+ setup_metrics
++ kubectl get services -n twopods-istio fortioclient -o 'jsonpath={.status.loadBalancer.ingress[0].ip}'
https://github.com/istio/tools/blob/master/perf/benchmark/run_benchmark_job.sh
There was an opinion that we should use https://mako.dev/ for performance testing. Here is one example of how to use it: https://github.com/knative/serving/commit/a0a32a7895445f9c71b0be9a8c2c0d1b52d75c99
Made a request to create a benchmark for the project: https://github.com/google/mako/issues/9
Since this hasn't been updated in over 2 years I've marked it as stale.
Let's close this, and we can restart it if necessary - possibly with different profiling tools.
We need a solution for gathering and publishing/analyzing Agones performance over time to identify trends and performance regressions.
We need to solve 4 aspects of it:
Here's one proposal:
Use https://fortio.org/ for visualization (used by Istio; it stores performance metrics in very simple JSON, which we could quite easily produce ourselves)
Example visualizations: _Single run, Comparison of two runs, and Trends over time_ (links updated)
Instrument the e2e tests (akin to #571) to emit Fortio-compatible metrics as JSON files, and store those in a GCS bucket for long-term storage. We could have an "official" flag for tests that run against a clean cluster
Have a website (https://performance.agones.dev/, on App Engine perhaps?) that hosts Fortio and periodically syncs data from the GCS bucket for presentation purposes.
Have a cron job that starts a clean GKE cluster on a schedule (e.g. 4 times a day), launches the appropriate tests, uploads results to GCS for long-term storage, and pushes them to Fortio for presentation. Hopefully we can have a Makefile-based solution that we could drive using Cloud Build or something similar.
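One possible shape for that scheduled pipeline, as a Cloud Build config: the cluster name, zone, Makefile targets, and bucket below are all placeholders, not the real Agones build files, and the trigger would be driven externally (e.g. Cloud Scheduler invoking the Cloud Build API 4 times a day).

```yaml
# Hypothetical cloudbuild.yaml sketch; names and targets are placeholders.
steps:
  # 1. Create a clean GKE cluster for this run.
  - name: gcr.io/cloud-builders/gcloud
    args: [container, clusters, create, perf-test-cluster,
           --zone=us-west1-c, --num-nodes=4]
  # 2. Install Agones and run the stress tests (hypothetical make targets).
  - name: gcr.io/cloud-builders/gcloud
    entrypoint: make
    args: [install, stress-test-e2e]
  # 3. Upload Fortio-formatted results to GCS for long-term storage.
  - name: gcr.io/cloud-builders/gsutil
    args: [cp, build/fortio-results.json,
           gs://agones-perf-results/$BUILD_ID.json]
  # 4. Tear the cluster down again so each run starts from a clean slate.
  - name: gcr.io/cloud-builders/gcloud
    args: [container, clusters, delete, perf-test-cluster,
           --zone=us-west1-c, --quiet]
```

The Fortio server at performance.agones.dev could then sync the bucket periodically, as described above.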