Just retested with 1.1.0-rc.0 and it seems like this is fixed.
Reopening again: I didn't realize that after the installation of 1.1.0-rc.0 pilot didn't start due to the resource requests. Memory consumption seems to rise more slowly now, but I easily got to >100MB, and that with a single ServiceEntry!
What's even worse: if I keep deleting/adding the ServiceEntry, the envoy process within the istio-proxy container ultimately gets restarted.
This can be verified by running the following script:
for i in $(seq 1 100); do
  echo "round $i: add serviceentry"
  kubectl apply -f se_test.yml
  sleep 10
  echo "delete serviceentry"
  kubectl -n dep-se-test delete serviceentries.networking.istio.io wmt-subnets
  kubectl -n dep-se-test exec -it testcurl-85c4c6bd47-xcw97 -c istio-proxy -- ps -e -o pid,ppid,rss,comm --sort -rss
  echo ""
done
This happened in 1.1.0-rc.0.
@silentdai could you take a look at this? I'm not sure if it's in the mixer filter or envoy itself. I think it's in envoy.
CC @costinm @PiotrSikora
Could you check and/or upload a config dump from the proxy, before and after? I'm wondering if the add/remove dance results in new entries only being added to the proxy, but never removed:
kubectl exec -it pod -c istio-proxy -- curl -s 127.0.0.1:15000/config_dump > /tmp/config_dump
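For example, a minimal sketch of that comparison (same placeholder pod name, se_test.yml as in the reproduction script above):
kubectl exec -it pod -c istio-proxy -- curl -s 127.0.0.1:15000/config_dump > /tmp/config_dump_before
kubectl apply -f se_test.yml && sleep 10
kubectl -n dep-se-test delete serviceentries.networking.istio.io wmt-subnets && sleep 10
kubectl exec -it pod -c istio-proxy -- curl -s 127.0.0.1:15000/config_dump > /tmp/config_dump_after
diff /tmp/config_dump_before /tmp/config_dump_after | head -50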
Also, Envoy uses tcmalloc, so maybe some of this memory is freed, but not released back to the operating system:
kubectl exec -it pod -c istio-proxy -- curl -s 127.0.0.1:15000/stats | grep server.memory
Thanks!
@robertpanzer Thank you for the detailed report! I will take a look today. @PiotrSikora Thanks for the comment!
My experiment shows that RSS and server.memory usage were increasing but somehow ceased. https://gist.github.com/silentdai/0e887c3c0452a848cc0e788d24bbab73 I don't have a theory so far. Will continue the investigation tomorrow.
This is what I collected 10 hours after the previous comment. https://gist.github.com/silentdai/97f80b3ae47543b668b2c4ea2e0993f7#file-gistfile1-txt-L35-L36
server.memory_allocated: 7097904
server.memory_heap_size: 22020096
Envoy returned the allocated memory to the malloc system (18505040 -> 7097904) while the heap size remained the same (22020096 -> 22020096), as tcmalloc doesn't return memory to the OS.
So far the conclusion is that Envoy doesn't leak on my side. Maybe I didn't fully reproduce the issue.
@robertpanzer how frequent were you applying service entry changes?
@silentdai I deleted and created the ServiceEntry at an interval of maybe 5 seconds and reached >100MB RSS after maybe 10 or 20 creates. This was certainly only to reproduce the bug.
I figured earlier that my services started becoming unresponsive due to the high memory consumption, and this was the smallest and fastest reproducer that I could find. What I want to say is that my approach doesn't directly reflect a realistic deployment history, but the problem also occurred at a slower pace on my local dev machine where I only have 2 or 3 services running, so for real clusters I would expect this to be even worse.
@robertpanzer You mentioned you were using Istio 1.1 snapshot.6. Could you verify that we deployed the same Envoy version with the command below?
$ kubectl exec -it <PODNAME> -c istio-proxy -- curl -s 127.0.0.1:15000/server_info | grep version
"version": "ea5e8afb79fc1864c244d397192e06513b918f49/1.10.0-dev/Clean/RELEASE/BoringSSL"
I find that Envoy doesn't provide a heap-profiling admin entry, while a CPU-profiling method is there.
With the future patch we could do something like the below.
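A hypothetical flow, assuming the patch adds a /heapprofiler admin endpoint mirroring the existing /cpuprofiler handler (the endpoint name and parameters here are my assumption, not a merged API):
# enable heap profiling, exercise the config churn, then disable and inspect
kubectl exec -it <PODNAME> -c istio-proxy -- curl -s -X POST '127.0.0.1:15000/heapprofiler?enable=y'
# ... apply/delete the ServiceEntry a few times ...
kubectl exec -it <PODNAME> -c istio-proxy -- curl -s -X POST '127.0.0.1:15000/heapprofiler?enable=n'
# the dumped profile could then be analyzed with pprof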
@PiotrSikora Is it because no one has attempted to work on it? Is there a substitute heap-profiling method? If not, I can provide a simple patch tomorrow so that we can enable the above flow.
I just retested it with 1.1.0-rc.0 and got this info:
"version": "fc273e117a6ed875c7cb9b3c3251c8d982bc7196/1.10.0-dev/Clean/RELEASE/BoringSSL"
After 21 create and delete cycles the memory consumption of the istio-proxy jumped to >100MB:
/usr/local/bin/pilot-agent 18472
/usr/local/bin/envoy -c /et 106716
Without any further configuration it seems to stay stable there. I am not sure whether it takes the memory request of 128Mi into account.
Good to know Envoy is not deteriorating beyond 128Mi in this case. And now we are on the same page: somehow the peak usage is as large as 128Mi.
BTW could you run "curl -L localhost:15000/stats | grep memory"?
We can learn whether the 106Mi is fully utilized or merely reserved by tcmalloc. It's not critical to me at this moment. However, as the investigation goes on, fighting memory reserved by tcmalloc and locating how the 106Mi is actually allocated would diverge, IMHO.
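For example (same placeholder pod name as above): if server.memory_allocated stays far below server.memory_heap_size, the gap is memory tcmalloc has reserved but not returned to the OS, rather than live allocations:
kubectl exec -it <PODNAME> -c istio-proxy -- curl -s localhost:15000/stats | grep -E 'server.memory_(allocated|heap_size)'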
Dropping to P1 to cover the remaining investigation, since it looks like 1.1 rc0 does not have a memory leak.
I just retried the sequence after a fresh reinstall of 1.1.0-rc.0, and the problem actually looks worse than I thought. The memory footprint again jumps very fast, up to 30MB per re-apply. Plus the istio-proxy gets killed as soon as it hits the requested resources of 128Mi.
2019-02-27T14:43:29.730724Z warn Epoch 0 terminated with an error: signal: killed
2019-02-27T14:43:29.730762Z warn Aborted all epochs
2019-02-27T14:43:29.730796Z info Epoch 0: set retry delay to 400ms, budget to 8
2019-02-27T14:43:30.008196Z info Envoy proxy is NOT ready: failed retrieving Envoy stats: Get http://127.0.0.1:15000/stats?usedonly: dial tcp 127.0.0.1:15000: connect: connection refused
2019-02-27T14:43:30.131888Z info Reconciling retry (budget 8)
2019-02-27T14:43:30.132095Z info Epoch 0 starting
2019-02-27T14:43:30.141430Z info Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster testcurl.sr-
The curl command gave me this:
server.memory_allocated: 79505608
server.memory_heap_size: 83886080
I also checked that the version of the istio-proxy is correct; it still says:
"version": "fc273e117a6ed875c7cb9b3c3251c8d982bc7196/1.10.0-dev/Clean/RELEASE/BoringSSL",
In my last test I managed to bring the memory size to 90 MB after about 4 or 5 create/delete cycles.
@robertpanzer But is this a leak where memory grows without bound, or does it stabilize? If the latter, are you saying that memory usage is too high?
In my tests today the memory usage didn't exceed 128MB, but it didn't really stabilize either. Instead the envoy process got killed every few create/delete cycles of my ServiceEntry. I wonder what that means for accepting and handling requests? I think this is not a hot restart that could potentially continue in-flight requests, is it?
@silentdai @PiotrSikora I think we need to set this back to P0. We need to understand the relationship between what @robertpanzer is trying to do and Envoy's memory usage. It's possible that we need to recommend an increased memory limit, but there should be a good explanation for why this is necessary.
We're only one step away from merging this PR in upstream Envoy and backporting it to istio/envoy. Once we have that, it should be easy to determine which path eats up the memory.
@silentdai very cool work on adding the heap profiler to envoy.
@robertpanzer my guess is we'll end up resolving this in an early 1.1.x release. We probably will not pick up a new version of Envoy until immediately after 1.1 ships.
EDIT: a service entry port does not always map to a listener; it could map to a route. The progress:
@silentdai I don't think that this nuance is important. I just think that it is not good that the istio-proxy gets killed on config updates. There were no requests processed by the proxy at all in my tests, but if I'm not wrong, in-flight requests would get lost, as this doesn't seem to be a hot restart. Having said that, it would probably be nicer if the process restart were a hot restart, since that could keep in-flight requests alive, but afaict from the logs these look different.
Should be fixed by https://github.com/envoyproxy/envoy/issues/6252 in istio 1.1.0 rc4
Turning the endpoint on and off is not supposed to cause any listener flipping. However, before the fix, unstable marshaling caused the Envoy proxy to wrongly detect a config change in a listener, triggering an expensive listener drain.
The drain can last minutes. When the listener config is pushed frequently, we can expect many draining listeners to be hanging around.
Each of the 4 factors is linear in the heap usage.
So if you are flipping 5 times within a draining window, I would expect a 5 * 2 * 1 MiB * 13 = 130 MiB heap spike. Also, CPU usage is high because spawning listeners is CPU-intensive.
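If this hypothesis is right, the draining should be visible in the admin stats; a spike in draining listeners during repeated applies would confirm it (placeholder pod name):
kubectl exec -it <PODNAME> -c istio-proxy -- curl -s 127.0.0.1:15000/stats | grep -E 'listener_manager.total_listeners_(active|draining)'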
@robertpanzer Please retest with Istio 1.1 rc4. My understanding is that the core fix is in istio-pilot, although you encountered the issue in Envoy. However, I highly suggest you upgrade the whole Istio system.
1.1 rc4 will be created March 12th.
I can still reproduce the restarts of the istio-proxy as long as I delete/create within the drain time window, which is indeed 45 seconds. If I wait for more than 1 minute between the delete and the next create, memory seems to be reused.
What indeed got better with rc.4 is that if I make small updates to the ServiceEntry, only the changed listeners seem to get drained and recreated.
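For reference, this is a sketch of the spacing that avoided the restarts for me (the sleep values are just my choice, comfortably beyond the 45s drain window):
for i in $(seq 1 20); do
  kubectl apply -f se_test.yml
  sleep 70
  kubectl -n dep-se-test delete serviceentries.networking.istio.io wmt-subnets
  sleep 70
done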
The behavior you describe is expected. Since rc4 affects fewer listeners than before, the memory spike should be greatly reduced. Is rc4 acceptable from your perspective?
@silentdai Can you explain though why the Envoy process restarts instead of the pod? In other words, what sort of xDS updates would cause Envoy to restart?
I cannot reproduce an envoy process restart without a pod restart. Maybe "envoy process restart" was actually referring to a pod restart?
My observation is worse: if the envoy process dies, for example killed by killall envoy, the envoy process no longer restarts. The istio-proxy container is alive while the pod is not serving any app traffic. By app traffic I mean the book-review HTTP requests in the demo.
The reason is that pilot-agent is the main process of istio-proxy.
@costinm ^^ What is the expected consequence if envoy dies silently?
I just tested again, and for me only the istio-proxy process dies; the container, and therefore pilot-agent, stays alive. I also checked the pod status, and the container restartCount is still 0.
In the logs of istio-proxy I see this when the istio-proxy is killed:
2019-03-15T09:03:56.220651Z warn Epoch 0 terminated with an error: signal: killed
2019-03-15T09:03:56.220715Z warn Aborted all epochs
2019-03-15T09:03:56.220839Z info Epoch 0: set retry delay to 200ms, budget to 9
2019-03-15T09:03:56.421053Z info Reconciling retry (budget 9)
2019-03-15T09:03:56.421200Z info Epoch 0 starting
2019-03-15T09:03:56.422899Z info Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster testcurl.sr-test --service-node sidecar~10.1.3.155~testcurl-7c7454f99c-l6hhp.sr-test~sr-test.svc.cluster.local --max-obj-name-len 189 --allow-unknown-fields -l warning --concurrency 2]
For me the restarted envoy is serving traffic again. I guess though that in-flight requests will get lost.
In my console I found these logs that could give a hint why Envoy is killed. (It's Docker for Mac, so probably not representative of a real cluster implementation, but what is a real cluster implementation anyway?)
default 10:35:16.272078 +0100 com.docker.hyperkit [ 6004.061715] envoy invoked oom-killer: gfp_mask=0x24000c0(GFP_KERNEL), nodemask=0, order=0, oom_score_adj=968
default 10:35:16.274248 +0100 com.docker.hyperkit [ 6004.064176] envoy cpuset=224bf68c6d39ea20a1b315ba740fecde15d5433cb79bb6611e37a73598be7d20 mems_allowed=0
default 10:35:16.275401 +0100 com.docker.hyperkit [ 6004.065933] CPU: 2 PID: 28933 Comm: envoy Not tainted 4.9.125-linuxkit #1
default 10:35:16.276221 +0100 com.docker.hyperkit [ 6004.067075] Hardware name: BHYVE, BIOS 1.00 03/14/2014
default 10:35:16.277467 +0100 com.docker.hyperkit [ 6004.067893] 0000000000000000 ffffffffb1420789 ffffa2048bd73de8 ffff942d049fc240
default 10:35:16.278788 +0100 com.docker.hyperkit [ 6004.069137] ffffffffb11f8d5e 0000000000000000 00000000000003c8 ffffa2048bd73de8
default 10:35:16.280023 +0100 com.docker.hyperkit [ 6004.070466] ffff942d090fa2a0 0000000000000202 ffffffffb17e71ce ffff942d049fc240
default 10:35:16.280434 +0100 com.docker.hyperkit [ 6004.071699] Call Trace:
default 10:35:16.281261 +0100 com.docker.hyperkit [ 6004.072109] [<ffffffffb1420789>] ? dump_stack+0x5a/0x6f
default 10:35:16.282109 +0100 com.docker.hyperkit [ 6004.072933] [<ffffffffb11f8d5e>] ? dump_header+0x78/0x1ed
default 10:35:16.283152 +0100 com.docker.hyperkit [ 6004.073780] [<ffffffffb17e71ce>] ? _raw_spin_unlock_irqrestore+0x16/0x18
default 10:35:16.284064 +0100 com.docker.hyperkit [ 6004.074823] [<ffffffffb11a45b5>] ? oom_kill_process+0x83/0x324
default 10:35:16.284948 +0100 com.docker.hyperkit [ 6004.075735] [<ffffffffb11a4bdc>] ? out_of_memory+0x239/0x267
default 10:35:16.285944 +0100 com.docker.hyperkit [ 6004.076619] [<ffffffffb11f22ac>] ? mem_cgroup_out_of_memory+0x4b/0x79
default 10:35:16.286991 +0100 com.docker.hyperkit [ 6004.077615] [<ffffffffb11f646d>] ? mem_cgroup_oom_synchronize+0x26b/0x294
default 10:35:16.288019 +0100 com.docker.hyperkit [ 6004.078662] [<ffffffffb11f25f4>] ? mem_cgroup_is_descendant+0x48/0x48
default 10:35:16.288993 +0100 com.docker.hyperkit [ 6004.079707] [<ffffffffb11a4c37>] ? pagefault_out_of_memory+0x2d/0x6f
default 10:35:16.289877 +0100 com.docker.hyperkit [ 6004.080665] [<ffffffffb10467af>] ? __do_page_fault+0x3c6/0x45f
default 10:35:16.290654 +0100 com.docker.hyperkit [ 6004.081547] [<ffffffffb17e8808>] ? page_fault+0x28/0x30
default 10:35:16.294938 +0100 com.docker.hyperkit [ 6004.082395] Task in /kubepods/kubepods/burstable/pod6e836375-4700-11e9-8430-025000000001/224bf68c6d39ea20a1b315ba740fecde15d5433cb79bb6611e37a73598be7d20 killed as a result of limit of /kubepods/kubepods/burstable/pod6e836375-4700-11e9-8430-025000000001/224bf68c6d39ea20a1b315ba740fecde15d5433cb79bb6611e37a73598be7d20
default 10:35:16.295813 +0100 com.docker.hyperkit [ 6004.086630] memory: usage 131060kB, limit 131072kB, failcnt 0
default 10:35:16.296751 +0100 com.docker.hyperkit [ 6004.087481] memory+swap: usage 131072kB, limit 131072kB, failcnt 132
default 10:35:16.297682 +0100 com.docker.hyperkit [ 6004.088419] kmem: usage 1220kB, limit 9007199254740988kB, failcnt 0
default 10:35:16.302508 +0100 com.docker.hyperkit [ 6004.089350] Memory cgroup stats for /kubepods/kubepods/burstable/pod6e836375-4700-11e9-8430-025000000001/224bf68c6d39ea20a1b315ba740fecde15d5433cb79bb6611e37a73598be7d20: cache:4780KB rss:125060KB rss_huge:0KB mapped_file:4772KB dirty:0KB writeback:0KB swap:12KB inactive_anon:46272KB active_anon:83564KB inactive_file:0KB active_file:0KB unevictable:0KB
default 10:35:16.303741 +0100 com.docker.hyperkit [ 6004.094175] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
default 10:35:16.305198 +0100 com.docker.hyperkit [ 6004.095539] [27056] 1337 27056 8118 4465 20 5 3 968 pilot-agent
default 10:35:16.306448 +0100 com.docker.hyperkit [ 6004.096867] [28933] 1337 28933 60704 34827 89 3 0 968 envoy
default 10:35:16.307737 +0100 com.docker.hyperkit [ 6004.098117] Memory cgroup out of memory: Kill process 28933 (envoy) score 2010 or sacrifice child
default 10:35:16.309314 +0100 com.docker.hyperkit [ 6004.099494] Killed process 28933 (envoy) total-vm:242816kB, anon-rss:119924kB, file-rss:14612kB, shmem-rss:4772kB
My interpretation is that the operating system simply kills the process because its cgroup memory limit is exceeded.
And indeed the configuration seems to support this thesis:
istio-proxy@testcurl-7c7454f99c-l6hhp:/$ cat /sys/fs/cgroup/memory/memory.limit_in_bytes
134217728
istio-proxy@testcurl-7c7454f99c-l6hhp:/$ cat /sys/fs/cgroup/memory/memory.oom_control
oom_kill_disable 0
under_oom 0
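If the limit is the bottleneck, raising the sidecar's memory limit should buy headroom. A sketch assuming the 1.1 helm chart's global.proxy.resources values (verify the knob against your chart version):
helm upgrade istio ~/istio/istio-1.1.0-rc.0/install/kubernetes/helm/istio \
  --reuse-values \
  --set global.proxy.resources.limits.memory=256Mi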
Pilot Agent is probably not killed because killing envoy is sufficient to free enough memory. The OOM killer seems to be selective:
From https://lwn.net/Articles/761118/
To simplify a bit: when the OOM killer is invoked, it tries to pick the process whose demise will free the most memory while causing the least misery for users of the system.
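The kernel's choice can be inspected per process; a quick check from inside the istio-proxy container (comparing the same fields the oom-killer log above prints):
for pid in $(pgrep 'envoy|pilot-agent'); do
  echo "$pid $(cat /proc/$pid/comm) score=$(cat /proc/$pid/oom_score) adj=$(cat /proc/$pid/oom_score_adj)"
done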
I can imagine though that other Kubernetes/Container runtime implementations work differently and result in the behavior observed by @silentdai
@robertpanzer Let's use "envoy" to refer to the process and "istio-proxy" to refer to the container. With your detailed description I can understand it this time :)
So we have established that it's the OOM killer that terminates the envoy process. We won't try to reproduce other logic causing envoy to die.
I was just wondering how the process was killed, and it looked to me like I was not alone with this :-D To find out what happens, I kept doing the repeated create and delete.
I agree that this should not happen too often, and it's not blocking me anymore.
Actionable items here, then close this issue:
Sorry for hijacking this issue, but we actually use dynamic ServiceEntry creation/deletion in production and have been bitten by this:
We could drop this check, or remove the service doing the check from the service mesh, but going forward we planned on using ServiceEntries heavily by providing APIs for whitelisting outbound traffic from these instances. Is this something we should plan to implement differently?
@jschlichtholz Sorry for the late reply. It highly depends on how frequently your collection of service endpoints changes. However, I would always suggest providing a pool whose members never change (or change infrequently) and not adding/removing members based on health-check responses; see the Envoy philosophy here. A sketch of this pattern follows.
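A minimal sketch: keep the ServiceEntry pool static and let the proxy passively eject unhealthy hosts via outlier detection instead of editing the ServiceEntry (hypothetical host name; fields per the v1alpha3 API):
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: external-pool
spec:
  host: example.internal
  trafficPolicy:
    outlierDetection:
      consecutiveErrors: 5
      interval: 30s
      baseEjectionTime: 60s
EOF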
🚧 This issue or pull request has been closed due to not having had activity in the last 105 days. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.
Created by the issue and PR lifecycle manager.
Describe the bug
I have a minimal setup of Istio 1.1.0-snapshot.6 with a single Deployment. When I repeatedly create and delete the same ServiceEntry, the memory consumption of the istio-proxy next to my service only goes up, reaching >200MB.
Expected behavior
When ServiceEntries are deleted, the memory that was claimed before should be freed again.
Steps to reproduce the bug
Install Istio 1.1.0-snapshot.6 and install the following deployment:
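(The original manifest isn't reproduced in this thread; a minimal hypothetical stand-in, matching the testcurl/sr-test names that appear above, might look like this:)
kubectl create namespace sr-test
kubectl label namespace sr-test istio-injection=enabled
kubectl apply -n sr-test -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: testcurl
spec:
  replicas: 1
  selector:
    matchLabels:
      app: testcurl
  template:
    metadata:
      labels:
        app: testcurl
    spec:
      containers:
      - name: testcurl
        image: curlimages/curl
        # keep the pod alive so we can exec into it
        command: ["/bin/sh", "-c", "while true; do sleep 3600; done"]
EOF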
Now the memory consumption of the Istio-proxy shows this:
Now create a file with this ServiceEntry:
(I admit that this might not be the smallest ServiceEntry, but I experience the same behavior at a smaller scale with smaller, more conventional, ServiceEntries.)
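(The original manifest isn't reproduced here either; a hypothetical stand-in with the same shape, using placeholder host and subnet values, might look like this:)
kubectl apply -n sr-test -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: wmt-subnets
spec:
  hosts:
  - example.internal   # placeholder host
  addresses:           # the real manifest listed many more subnets
  - 10.10.0.0/16
  - 10.20.0.0/16
  location: MESH_EXTERNAL
  ports:
  - number: 443
    name: tls
    protocol: TLS
  resolution: NONE
EOF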
Now repeatedly install and remove this ServiceEntry. After some installs the memory consumption of the istio-proxy jumps by ~25MB, and this memory is not returned anymore. I don't send any requests, so the proxy basically has nothing to do from my perspective.
While removing and adding the ServiceEntry I see these memory consumptions:
Even after ~20 minutes the memory consumption is the same.
Version
Kubernetes 1.10.11 on Docker for Mac. Istio 1.1.0-snapshot.6.
Installation
helm install ~/istio/istio-1.1.0-snapshot.6/install/kubernetes/helm/istio-init --name istio-init --namespace istio-system -f ~/istio/istio-1.1.0-snapshot.6/install/kubernetes/helm/istio/values-istio-auth-mcp.yaml --set global.mtls.enabled=true --set global.tracing.enabled=true --set pilot.traceSampling=100.0 --set tracing.enabled=true --set grafana.enabled=true --set global.useMCP=false
helm install ~/istio/istio-1.1.0-snapshot.6/install/kubernetes/helm/istio --name istio --namespace istio-system -f ~/istio/istio-1.1.0-snapshot.6/install/kubernetes/helm/istio/values-istio-auth-mcp.yaml --set global.mtls.enabled=true --set global.tracing.enabled=true --set pilot.traceSampling=100.0 --set tracing.enabled=true --set grafana.enabled=true --set global.useMCP=false
Environment
Running locally on Docker for Mac.
Cluster state
The script for collecting the cluster state fails on the docker-compose container, so I can't create whatever is expected. However, here is what I could gather manually:
clusterinfo.zip
Also the logs of galley, pilot and the proxy:
logs.zip