Closed kyessenov closed 6 years ago
What CPU and memory requests/limits did you use in testing? I used
spec:
  containers:
  - name: manager
    image: {{.hub}}/runtime:{{.tag}}
    imagePullPolicy: Always
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
With 50 rules (exec'ing into the discovery container):
- top -c: about 22-25% CPU from the envoy process
- minikube dashboard with heapster: discovery using .24 cores
With 100 rules:
- top -c: 35-40%
- dashboard: .47 cores
Not sure if there have been some optimizations since the issue was originally created, but I figure we should document the scale/performance setup and test cases so we can keep track of these figures periodically going forward.
Can you confirm that the rules are used by checking the envoy configs (or RDS output)? Just want to make sure that rules are not filtered. My setup may have been bad since I realized that k8s clusters can go into a baseline state with high CPU usage. For the perf testing we should try to create fresh clusters to deal with this problem.
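One way to sanity-check that rules survive into the final config is to pull the RDS output and count routes per virtual host. A minimal sketch below uses a hand-written sample response; the JSON shape and field names (`virtual_hosts`, `routes`, cluster naming) are illustrative assumptions, not captured output from a real cluster.

```python
import json

# Hypothetical v1 RDS response (hand-written sample, not real output).
rds_response = json.loads("""
{
  "virtual_hosts": [{
    "name": "productpage.default.svc.cluster.local|http",
    "routes": [
      {"prefix": "/", "cluster": "out.productpage.default.svc.cluster.local|v1"},
      {"prefix": "/", "cluster": "out.productpage.default.svc.cluster.local|v2"}
    ]
  }]
}
""")

# Count routes per virtual host; a rule that was silently filtered out
# would show up here as a missing route entry.
counts = {vh["name"]: len(vh["routes"])
          for vh in rds_response["virtual_hosts"]}
```

In practice you would fetch the response from the discovery endpoint (or Envoy's admin port) instead of embedding a sample, then compare the counts before and after adding rules.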
Hmm, not sure. The rules I created with istioctl are along the lines of:
type: route-rule
name: productpage-default1
spec:
  destination: productpage.default.svc.cluster.local
  precedence: 1
  route:
  - tags:
      version: v1
---
type: route-rule
name: reviews-default2
spec:
  destination: reviews.default.svc.cluster.local
  precedence: 1
  route:
  - tags:
      version: v1
repeated over and over with different names. The output doesn't change when I call the RDS API, but I can watch the CPU usage drop or spike as I create or delete. So the majority of the rules are just repeats, but the RDS API still needs to process them all.
In regards to your "create fresh clusters" comment: since I am using minikube, are you suggesting doing minikube delete && minikube start prior to each test? Or just to ensure there are no items in other namespaces in use during performance testing?
Unless I'm missing something, (very) roughly speaking the number of RDS/CDS/SDS queries from envoy grows multiplicatively with the number of services and pods, e.g. M pods serving N services where M > N roughly translates to M envoy instances querying for N services via RDS/CDS/SDS, i.e. O(MN) queries. Adding more rules further increases the cost.
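The O(MN) claim above can be sketched as a back-of-the-envelope model. The per-proxy factor of 3 (one query each for RDS, CDS, SDS per service per polling cycle) is an assumption for illustration, not a measured number.

```python
def discovery_queries_per_cycle(pods, services):
    # Each of `pods` Envoy instances refreshes route, cluster, and
    # endpoint state for each of `services` every polling cycle,
    # so total load grows as O(pods * services).
    return pods * services * 3  # 3 = RDS + CDS + SDS (assumed)

# Doubling both dimensions quadruples the discovery load:
small = discovery_queries_per_cycle(10, 5)    # 150 queries/cycle
large = discovery_queries_per_cycle(20, 10)   # 600 queries/cycle
```

Divide by the refresh delay to get queries per second; with a 1s polling interval even a modest mesh generates hundreds of discovery requests per second against a single manager instance.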
Having specific perf targets here would be useful (see https://github.com/istio/manager/issues/309), but we probably need some shorter term solution to carry us over since https://github.com/istio/manager/issues/309 is marked BETA and we've already observed failed RDS/CDS/SDS queries in the current integration test. Possibly some combination of (1), (2), and (4) below would work.
General solutions to the problem:
There is also work in Envoy to use gRPC for DS. Then we can do something similar to (6).
Thanks for the reminder - ref gRPC streaming APIs for SDS/CDS/RDS/LDS/load reporting
Also, (7) move discovery service to per-pod (or per-node) to distribute SD load
@GregHanson I was using different "header match" prefix conditions to make sure that all rules end up in the final config. But @ayj is right about the real issue here, which is the total computation cost O(#services × #pods × #route rules), and the load is not spread between RDS pods.
@kyessenov what do you imagine for the test environment, Google Cloud? Is the plan to use Heapster for metric recording during tests?
If we want to measure perf of the components, a single node minikube should be OK. If we want to see how real world apps behave, then a multi-node cluster is preferable. If we can do OK with a single node cluster for big apps, then that is sufficient and it's unnecessary to test on a multi-node setup. We'll need to measure several dimensions, not all provided by heapster:
We can also stress test an app by increasing the network load, to find the limits and see how things start to fall apart.
In Amalgam8, we used a similar setup to here: https://www.simonholywell.com/post/2015/06/parallel-benchmark-many-urls-with-apachebench/ in our request-handling benchmarks - perhaps a similar setup can be used?
Current plan for alpha is some combination of (1), (2), and (3). (1) and (2) should be configurable via manager flags (and eventually ConfigMap) and k8s replicas. I'm working on a simple version of (3) which performs an in-process cache of the most recent discovery responses. Out-of-process cache ((4) above) could also be explored if basic alpha targets can't be met, but might require plumbing through certs for mTLS between proxy and manager (edit: or adding a sidecar proxy in front of the discovery service).
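A minimal sketch of option (3), an in-process cache of the most recent discovery responses, might look like the following. The cache key, TTL, and interface are assumptions for illustration, not the actual manager implementation.

```python
import time

class DiscoveryCache:
    """Cache the most recent response per discovery request so that
    repeated polls from many proxies don't rebuild the same config."""

    def __init__(self, ttl_seconds=1.0):
        self.ttl = ttl_seconds
        self._entries = {}  # request key -> (timestamp, response)

    def get(self, key, compute):
        now = time.monotonic()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]          # fresh: serve cached response
        response = compute()       # stale/missing: rebuild (expensive path)
        self._entries[key] = (now, response)
        return response

# 100 identical polls within the TTL trigger only one recompute:
calls = []
cache = DiscoveryCache(ttl_seconds=60)
for _ in range(100):
    cache.get("/v1/routes/80/cluster/node",
              lambda: calls.append(1) or "routes-config")
```

Steady-state CPU then scales with the number of distinct configs rather than the number of polling proxies, which matches the caching observation in the SDS benchmark below in this thread.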
7) Or is it the same as 6?: The root cause of this issue seems to be that envoy polls istio-manager every 1s. Increasing the value will offer just temporary relief, same as caching. Can this be fixed by re-architecting, such as having the istio-manager push the config changes to the istio-proxy, and envoy poll istio-proxy (if polling is the only way)?
2) Horizontal scaling: this is something we definitely need to do, sooner or later, as long as a single instance can support a decent number of apps (waiting for the numbers). It is not correct to add more replicas or to force an istio installation per namespace to hide scaling issues.
Yes, root cause is pull vs. push. The gRPC streaming API addresses that fundamental problem but isn't ready yet. The current assumption is that some combination of (1), (2), (3) will buy us sufficient time to avoid having to implement a more complex workaround and then reimplement it yet again against gRPC. Hopefully the proposed benchmarks for alpha will validate whether this assumption is true or not. And if it isn't, we have better justification for re-architecting configuration distribution as you suggested.
I'm curious why you say it is not correct to add more replicas? The DNS addon with kube-dns-autoscaler seems to work that way.
I was saying it is correct to add replicas to scale, this is a fundamental way of scaling, but it is not correct to add replicas to hide bugs or a flawed design. For instance, we would not want to add an istio-manager instance for each app deployed, or for each namespace, as it has been suggested.
I don't believe anyone suggested having istio-manager per-app or per-namespace in this thread, but maybe I'm missing something? The current plan is specifically to keep discovery centralized with some stop-gap optimizations until push-based gRPC API is available (i.e. a better architecture option).
istio-manager discovery performance is sufficient for e2e integration and smoke tests. Moving to backlog for now until we complete benchmarking for alpha.
I spent a bit of time load testing SDS. CDS/RDS peak CPU% will likely be higher due to more complex request processing, but steady-state CPU% should be similar due to caching.
What is 500m? Btw, thanks for benchmarking this!! Would you be able to try the following setup: 10s for SDS, 60s for CDS and RDS? On Tue, May 2, 2017 at 11:49 PM Jason Young notifications@github.com wrote:
I spent a bit of time load testing SDS. CDS/RDS peak CPU% will likely be higher due to more complex request processing, but steady-state CPU% should be similar due to caching.
Setup
- GKE 50 node cluster with machine type n1-standard-2
- 25 services/deployments with single port (HTTP)
- Increased discovery refresh delay to 5 seconds (1-to-10s effective polling frequency)
- Increased manager CPU request to 500m
Benchmark
- Scale deployment replicas from 0 to 20 (0-5 pods per core, 0-500 pods total)
- Observe CPU usage both while the deployment is scaling up/down and in steady state
- Observe SDS failures via envoy stats
Observations
- Kubernetes would not schedule 500 pods with the default 100m container resource request. Reducing the default to 50m helped. This should be okay for this particular benchmark since individual proxy container load is minimal (<20m) and we aren't benchmarking QPS/latency.
- Steady state CPU% was <500m CPU with 500 pods.
- CPU% temporarily spiked to >1500m CPU when all deployments were concurrently scaled from 0 to 20 replicas each. This might not be an issue in practice unless operators frequently deploy 500+ pods all at once and expect immediate service availability.
- No SDS failures observed when scaling from 0-to-500 pods.
What is 500m?
50% of a CPU core using Kubernetes resource terminology (see Meaning of CPU).
Would you be able to try the following setup: 10s for SDS, 60s for CDS and RDS?
Sure. 60s would be up to 120s of propagation time. We'd need to test 30s if we want max 60s.
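The 120s figure follows from delays compounding across sequential polls: a change must first show up in one discovery response before a dependent poll can pick it up. A small sketch of that worst-case arithmetic (the two-stage model is an assumption for illustration):

```python
def worst_case_propagation(first_poll_delay_s, second_poll_delay_s):
    # Worst case: the change lands just after a poll, so we wait a full
    # first interval; the dependent poll then also just missed it and
    # waits another full interval. The delays add.
    return first_poll_delay_s + second_poll_delay_s

# 60s refresh delay -> up to 120s end-to-end; 30s -> max 60s.
```

This is why halving the refresh delay to 30s is needed if the target is a 60s propagation bound.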
Related to #1237
We will have to tackle this seriously in 0.3, changed it to a P0.
subscribe
Issue moved to istio/istio #1485 via ZenHub
with about 10 rules, it's ~50% CPU; at 30 rules, proxies start dropping requests.