Cryptophobia opened this issue 6 years ago
From @arschles on June 9, 2016 22:48
From @jchauncey on June 10, 2016 16:20
So as it stands right now I can push a significant number of requests through deis and not see any real degradation in performance. That said, we need to do a few other things besides just sending a lot of requests to the router and ultimately to a simple Go app.
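For reference, the kind of smoke load described above could be generated with a small concurrent client along the lines of the sketch below; the target URL, worker count, and per-worker request count are placeholders, not the actual test configuration.

```go
// loadgen.go: fire concurrent requests at an app routed through the deis router
// and report a rough error rate. The URL and counts below are placeholders.
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	const (
		target    = "http://my-go-app.example.com/" // hypothetical routable app URL
		workers   = 50
		perWorker = 200
	)

	var failures int64
	client := &http.Client{Timeout: 10 * time.Second}
	start := time.Now()

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				resp, err := client.Get(target)
				if err != nil || resp.StatusCode != http.StatusOK {
					atomic.AddInt64(&failures, 1)
				}
				if err == nil {
					// Drain the body so keep-alive connections can be reused.
					io.Copy(io.Discard, resp.Body)
					resp.Body.Close()
				}
			}
		}()
	}
	wg.Wait()

	fmt.Printf("%d requests in %s, %d failures\n",
		workers*perWorker, time.Since(start), atomic.LoadInt64(&failures))
}
```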
My thoughts on this are still kind of cloudy but here is what I had in mind:
1. Have telegraf send all metrics from e2e runs to a hosted influx system where we can collect meaningful long-term metrics. This will allow us to spot trends and new problems more efficiently.
2. Use the regular e2e runs to make sure we stay within certain performance bounds. We should eventually hook up kapacitor scripts to alert us when an e2e run falls outside of those parameters.
3. Set up a nightly job that runs on a normal-sized cluster (5 or so nodes) and deploys apps that can simulate failures (return non-200 response codes), generate arbitrarily large response bodies, and maybe call other dependent services (see the sketch after this list). We would then use the CLI to arbitrarily scale those apps up and down while also doing simultaneous deploys and generating traffic. This would let us see how the system performs while apps are under load and an operator is using the system to respond.
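A minimal sketch of what such a failure-simulating app could look like, assuming a plain Go HTTP service; the endpoint names, query parameters, and port are illustrative, not an existing Deis test fixture:

```go
// main.go: a minimal HTTP test app that can return error codes,
// stream arbitrarily large bodies, and call a dependent service.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"strconv"
)

func main() {
	// Return a non-200 status code, e.g. GET /fail?code=503
	http.HandleFunc("/fail", func(w http.ResponseWriter, r *http.Request) {
		code, err := strconv.Atoi(r.URL.Query().Get("code"))
		if err != nil || code < 400 || code > 599 {
			code = http.StatusServiceUnavailable
		}
		http.Error(w, "simulated failure", code)
	})

	// Return an arbitrarily large response body, e.g. GET /large?bytes=10485760
	http.HandleFunc("/large", func(w http.ResponseWriter, r *http.Request) {
		n, err := strconv.ParseInt(r.URL.Query().Get("bytes"), 10, 64)
		if err != nil || n <= 0 {
			n = 1 << 20 // default to 1 MiB
		}
		w.Header().Set("Content-Type", "application/octet-stream")
		// Stream zero-filled chunks so large bodies don't need to fit in memory.
		chunk := make([]byte, 32*1024)
		for n > 0 {
			if int64(len(chunk)) > n {
				chunk = chunk[:n]
			}
			if _, err := w.Write(chunk); err != nil {
				return
			}
			n -= int64(len(chunk))
		}
	})

	// Call a dependent service and relay its response, e.g. GET /upstream?url=http://other-app/
	http.HandleFunc("/upstream", func(w http.ResponseWriter, r *http.Request) {
		resp, err := http.Get(r.URL.Query().Get("url"))
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()
		w.WriteHeader(resp.StatusCode)
		io.Copy(w, resp.Body)
	})

	// Healthy default endpoint for baseline traffic.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

The nightly job could then mix traffic across these endpoints (mostly `/`, with some `/fail` and `/large` calls) while the scaling and deploys happen in parallel.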
My main concern is making sure that, during a high-load event, the controller can still receive requests to scale up/down to meet demand.
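One way to check that during a run would be to time scale operations issued while the load generator is running. The sketch below shells out to the CLI with a hypothetical app name and process type; the exact `deis scale` syntax should be adjusted to the CLI version in use.

```go
// scaleprobe.go: while load is being generated, repeatedly issue scale commands
// through the CLI and record how long each takes to be accepted.
// The app name, process type, and exact `deis` invocation are placeholders.
package main

import (
	"fmt"
	"os/exec"
	"time"
)

func main() {
	app := "loadtest-app" // hypothetical app name
	sizes := []int{1, 5, 1, 10, 2}

	for _, n := range sizes {
		start := time.Now()
		// Assumes something like `deis scale cmd=N -a <app>`; adjust as needed.
		cmd := exec.Command("deis", "scale", fmt.Sprintf("cmd=%d", n), "-a", app)
		out, err := cmd.CombinedOutput()
		elapsed := time.Since(start)
		if err != nil {
			fmt.Printf("scale to %d failed after %s: %v\n%s\n", n, elapsed, err, out)
			continue
		}
		fmt.Printf("scaled %s to %d in %s\n", app, n, elapsed)
		time.Sleep(30 * time.Second) // let the cluster settle between changes
	}
}
```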
From @bacongobbler on June 9, 2016 22:43
cross-post of https://github.com/deis/deis/issues/4037
Copied from original issue: deis/jenkins-jobs#100