Open soar opened 6 years ago
I have this issue as well. vpnkit.exe claims memory in proportion to the amount of network activity into and out of the docker containers, and never releases it.
Edit: the latest version 18.03.0-ce-win58 (16761) may have resolved the issue. Fingers crossed.
I updated Docker to 18.03.0-ce-win58 (16761) two hours ago, and the problem is still here:
1 day later it's sitting at 1.2G, from light traffic.
Several of us here at work are having this exact issue.
Thanks for your reports.
In order to make progress with this issue I need some reproduction steps. Could you provide a docker-compose.yml (or similar) and instructions to reproduce the problem?
After trying a bit, it seems opening an HTTPS connection to a server on our internal network triggers the bug. The same doesn't apply to external, public servers (i.e. docker.com) nor other Docker instances.
@laarmen thanks for the update. Could you trigger the bug and then upload a diagnostic report? I'd like to take a look at the logs.
See https://github.com/laarmen/VpnKitPoC for the code. How can I do the diagnostic report thing?
On Windows there should be a whale-shaped icon in the system tray. After right clicking on it there should be a menu item called something like "Diagnose and Feedback". Clicking on this should take you to a dialog where diagnostics are uploaded and assigned a unique id. If you quote the id in the ticket then I can download the logs and take a look.
(Sorry I couldn't give more precise instructions but I don't have a Windows machine to hand)
On Fri, Mar 30, 2018 at 2:19 PM, Simon Chopin notifications@github.com wrote:
See https://github.com/laarmen/VpnKitPoC for the code. How can I do the diagnostic report thing?
I'm not entirely sure this was an instance of the bug as I was still under the 300MB bar of RAM used by vpnkit, but it was consistently climbing. I'll upload another report if I get to the "eat-my-RAM" levels later.
ID: 5E3BFA7A-FF8F-4077-8583-773FF79518CC/2018-03-30_18-33-45
In case that's useful, I just stopped all the docker containers on my workstation, waited a few minutes, and the vpnkit process sits at 700MB. I uploaded a second diagnosis, see 5E3BFA7A-FF8F-4077-8583-773FF79518CC/2018-03-30_19-09-21
This time on my home computer and network, same code except that the target (on local network) is using plain http (no SSL), the memory grew to 1.5GB.
ID: D25DA2F3-2F67-42BA-A292-78A39BCBAEC4/2018-03-30_20-34-58
We are having the same issue. It happens to us in under a day (although we are using an app that generates a lot of network traffic), so we currently have to bounce Docker once per day.
Same thing for me - I'm running a single Node.js process that downloads files from the web over HTTP (text and binary) - some 25-30K files, ~1GB in volume, about 100KB/s. VPNKit process consumes all available RAM within hours (I've had it consume up to 9GB of RAM, even though the overall limit for Docker itself is 2GB).
Same here. Win 10 x64, docker version:
Client:
Version: 18.03.0-ce
API version: 1.37
Go version: go1.9.4
Git commit: 0520e24
Built: Wed Mar 21 23:06:28 2018
OS/Arch: windows/amd64
Experimental: false
Orchestrator: swarm
Server:
Engine:
Version: 18.03.0-ce
API version: 1.37 (minimum version 1.24)
Go version: go1.9.4
Git commit: 0520e24
Built: Wed Mar 21 23:21:06 2018
OS/Arch: windows/amd64
Experimental: true
vpnkit.exe currently at 7GB, constantly climbing:
0 containers running. Uploaded a diagnostic, id: 29542F91-6441-4210-934F-DB948F4EF0EF/2018-04-04_16-34-22
UPDATE 24/04/2018: As suggested below, adding vpnkit commit sha:
bazzilic@CSLRF21 $ & "C:\Program Files\Docker\Docker\resources\vpnkit.exe" --debug --ethernet foo
vpnkit.exe: [INFO] Setting handler to ignore all SIGPIPE signals
vpnkit.exe: [INFO] Version is 7c425f691978cb4a708ccc295dd331eae5cebc85
Here's a simple docker-compose.yaml that can reproduce the issue. If you watch memory usage on vpnkit.exe when this is running, it climbs by almost 1M every time the wget runs.
version: '3'
services:
  eat-memory:
    image: busybox
    entrypoint: sh
    command:
      - -c
      - |
        while true; do
          echo Getting docker.com...
          wget -qO/dev/null https://www.docker.com
          sleep 5
        done
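To put numbers on the climb while the compose file above is running, the process memory can be sampled from a script instead of watched in Task Manager. This is a minimal sketch, assuming Windows `tasklist /FO CSV /NH` output with an English-locale memory column such as "1,234,567 K". The helper names `parse_tasklist_mem_kb` and `sample_vpnkit_kb` are made up for illustration and are not part of Docker or vpnkit.

```python
import csv
import io
import subprocess


def parse_tasklist_mem_kb(csv_text, image_name="vpnkit.exe"):
    """Sum the memory (in KB) of all rows matching image_name.

    Assumes rows shaped like tasklist's CSV output:
    "Image Name","PID","Session Name","Session#","Mem Usage"
    where Mem Usage looks like "1,234,567 K" (English locale).
    """
    total_kb = 0
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip informational lines and rows for other processes.
        if len(row) < 5 or row[0].lower() != image_name.lower():
            continue
        # Keep only the digits, dropping thousands separators and the unit.
        digits = "".join(ch for ch in row[4] if ch.isdigit())
        if digits:
            total_kb += int(digits)
    return total_kb


def sample_vpnkit_kb():
    """Run tasklist once (Windows only) and return vpnkit.exe memory in KB."""
    out = subprocess.run(
        ["tasklist", "/FI", "IMAGENAME eq vpnkit.exe", "/FO", "CSV", "/NH"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_tasklist_mem_kb(out)
```

Calling `sample_vpnkit_kb()` in a loop with a `time.sleep` between samples and printing the result gives a simple log that can be pasted into a report alongside a diagnostic id.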
In my case it climbs even if there’s no activity related to docker at all. At least, nothing explicit.
Even with all windows and linux containers stopped, the memory usage is constantly around 1.5 GB on my machine...
I have this same issue occurring with 3 containers that are doing a large amount of WAN activity. If left to run for over a week, this will consume all the available RAM and leave the system in an unstable state. My only workaround is restarting Docker regularly.
Cross-linking a forum thread where many folks report the same vpnkit memory issue: https://forums.docker.com/t/vpnkit-uses-all-free-memory/48558/12 I suspect the behavior appeared for me with the 16762 build. I never noticed it before, but I wasn't looking until it exhausted my memory with one simple nginx container.
I am having the same issue.
Docker Version 18.03.0-ce-win59 (16762)
Windows Server 2016 with 32GB of memory
Docker is limited to 10GB of memory, and vpnkit.exe uses up to 16GB in 24 hours. It either crashes or I have to restart Docker.
I hate to post a me too but, me too:
Docker Version 18.03.0-ce-mac60 (23751)
Channel: stable 6ddfc0f1d3
OSX 10.11.6 (16GB RAM)
Running one talkative (outgoing HTTP requests only) app via docker compose
"me too" Left a couple (mostly idle) containers running over the weekend came back to 4GB used by vpnkit and a cranky system as that's what I had left..
Client:
Version: 18.03.0-ce
API version: 1.37
Go version: go1.9.4
Git commit: 0520e24
Built: Wed Mar 21 23:06:28 2018
OS/Arch: windows/amd64
Experimental: false
Orchestrator: swarm
Server:
Engine:
Version: 18.03.0-ce
API version: 1.37 (minimum version 1.12)
Go version: go1.9.4
Git commit: 0520e24
Built: Wed Mar 21 23:14:32 2018
OS/Arch: linux/amd64
Experimental: false
We've rolled back to 17.12.0-ce-win47 2018-01-12 and are no longer seeing this issue.
Had an idea to compare versions so that we can isolate the vpnkit version that started failing.
PS C:\Program Files\Docker\Docker\Resources> .\vpnkit.exe --version
%%VERSION%%
PS C:\Program Files\Docker\Docker\Resources>
Sigh.
.\vpnkit.exe --debug --ethernet foo will spit out a git sha:
PS C:\Program Files\Docker\Docker\Resources> .\vpnkit.exe --debug --ethernet xxx
vpnkit.exe: [INFO] Setting handler to ignore all SIGPIPE signals
vpnkit.exe: [INFO] Version is eb91fd8319abdfcaf87a1839e46b7ce0577b68fc
...
That's for the current Version 18.04.0-ce-rc2-win61 (17070). It corresponds to the most recent commit here.
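For anyone collecting these across several machines, the sha can be pulled out of the debug output with a small script. This is a sketch that assumes the log line keeps the "[INFO] Version is <40 hex characters>" shape shown above; the function name `extract_vpnkit_sha` is hypothetical.

```python
import re

# Matches the version line printed by vpnkit in the output quoted above.
VERSION_RE = re.compile(r"\[INFO\] Version is ([0-9a-f]{40})")


def extract_vpnkit_sha(log_text):
    """Return the first 40-character git sha in vpnkit debug output, or None."""
    match = VERSION_RE.search(log_text)
    return match.group(1) if match else None
```

Feeding it the two log lines above returns the sha, while the unhelpful `--version` output (`%%VERSION%%`) yields `None`.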
@tsasioglu What does your 17.12.0-ce-win47 report?
Is there an easy way to roll back to 17.12.0-ce-win? As it is, 18 is completely unusable for me. I have to restart every 90 min because vpnkit uses +90% of my memory
@imarotte Yes, you can download 17.12.0-ce-win47 here: https://docs.docker.com/docker-for-windows/release-notes/#docker-community-edition-17120-ce-win47-2018-01-12. I rolled back two of our staff to that version yesterday. The steps I took were:
C:\Users\Public\Documents\Hyper-V\Virtual hard disks\MobyLinuxVM.vhdx and restore after the reinstall)
docker cmdlet wasn't working in Powershell
That is the simplest way, although you may end up losing your containers if the VM image is incompatible.
Thanks @imarotte for linking the issue from the docker tracker. When I've been watching mine, it's been rapidly climbing to around 1GB, on a 4GB machine (lol) then backing off, then reclimbing. There's a GIF watching memory usage in Task Man attached to the link in docker/for-win 1932. I have had the machine go completely unresponsive twice in the last week, though, which is highly abnormal, and could be due to disk thrashing caused by memory exhaustion. I haven't yet bumped my Docker version back, because I was hoping that this would get fixed rapidly, and I have enough problems with going forward through Docker versions, that I don't want to see what kind of Hellgate I can open by trying to go backwards. It definitely occurred when upgrading to 18.03, though. I don't know what version I had before, specifically, as I hadn't been paying attention, because I didn't have problems :-)
I do run a nginx in docker that redirects traffic to a few other containers as well as services that run natively on the bare metal. As an aside, I do intend to put more RAM in the box, but so far, it hasn't really presented a problem to me, except that I can't force a docker restart without logging out or rebooting first.
Just picked up the update to 18.03.1, and after about 15 minutes of runtime, vpnkit is hanging out at 13.7 to 14.0MB. So I'll keep an eye on it, but it seems to be fixed?
18.03.1 on Win 10 1803 after about 12 hours
I'm using Docker for Mac 18.03.1-ce-mac65 (24312) and hit the same issue after running vpnkit for ~5 hours:
Docker version:
Client:
Version: 18.03.1-ce
API version: 1.37
Go version: go1.9.5
Git commit: 9ee9f40
Built: Thu Apr 26 07:13:02 2018
OS/Arch: darwin/amd64
Experimental: false
Orchestrator: swarm
Server:
Engine:
Version: 18.03.1-ce
API version: 1.37 (minimum version 1.12)
Go version: go1.9.5
Git commit: 9ee9f40
Built: Thu Apr 26 07:22:38 2018
OS/Arch: linux/amd64
Experimental: false
Docker info:
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 28
Server Version: 18.03.1-ce
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.87-linuxkit-aufs
Operating System: Docker for Mac
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 11.71GiB
Name: linuxkit-025000000001
ID: PWN5:BICM:VGBT:GP5H:DCER:EUBH:V2NB:JQSR:VU52:PNWG:XNQZ:VXPV
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
File Descriptors: 38
Goroutines: 54
System Time: 2018-05-02T22:08:32.337136965Z
EventsListeners: 2
HTTP Proxy: docker.for.mac.http.internal:3128
HTTPS Proxy: docker.for.mac.http.internal:3129
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Downgrading to 17.12 seems to fix it. Happy to provide any diagnostics.
Interesting: Here's a Diagnostic created from the Docker for Win UI:
574C77DB-7887-4242-889C-DFAE2C1F25FF/2018-05-02_21-22-45
... when I clicked Upload a Diagnostic, the usage started rapidly falling from 800MB down to 200MB, and then started rising again when it was done.
I uploaded a diagnostic: FB03AA8B-F86C-4E7A-87B8-3A05EA2E2E0D/2018-05-08_10-54-58
vpnkit was using 22, almost 23 GB of memory after running over the weekend.
saw someone was asking for docker-compose..
version: '3'
services:
  mail:
    restart: always
    image: tvial/docker-mailserver:latest
    hostname: mail
    domainname: myemaildomain.com
    container_name: mail
    # network_mode: "host"
    ports:
      - "25:25"
      - "143:143"
      - "587:587"
      - "993:993"
      - "110:110"
      - "995:995"
      - "4190:4190"
    volumes:
      - maildata:/var/mail
      - mailstate:/var/mail-state
      - ./mail/config:/tmp/docker-mailserver/
      - ./mail/config/sasl_passwd:/etc/postfix/sasl_passwd
      - ./mail/config/sasl_passwd.db:/etc/postfix/sasl_passwd.db
      - ./mail/config/eric.dovecot.sieve:/var/mail/myemaildomain.com/eric/.dovecot.sieve
    environment:
      - ENABLE_SPAMASSASSIN=1
      - ENABLE_CLAMAV=1
      - ENABLE_FAIL2BAN=0
      - ENABLE_POSTGREY=1
      - ONE_DIR=1
      - DMS_DEBUG=0
      - ENABLE_POP3=1
      - ENABLE_MANAGESIEVE=1
    cap_add:
      - NET_ADMIN
      - SYS_PTRACE
  mysql4:
    build: mysql4/mysql4
    restart: always
    ports:
      - "3306:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=mysqlrootpassword
  nginx:
    image: nginx:latest
    restart: always
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d
      - d:\www:/usr/share/nginx/html
      - certs:/etc/letsencrypt
      - certs-data:/data/letsencrypt
    ports:
      - "80:80"
      - "443:443"
    environment:
      - NGINX_HOST=myhostname.com
      - NGINX_PORT=80
  # let's build a phpbb
  mariadb:
    restart: always
    image: 'bitnami/mariadb:latest'
    environment:
      - ALLOW_EMPTY_PASSWORD=yes
    volumes:
      - 'd:\docker\phpbb\mariadb:/bitnami'
    networks:
      - phpbb
  phpbb:
    restart: always
    image: 'bitnami/phpbb:latest'
    depends_on:
      - mariadb
    ports:
      - '9999:80'
      - '4443:443'
    volumes:
      - 'd:\docker\phpbb\phpbb2:/bitnami'
      - 'd:\docker\phpbb\ext:/opt/bitnami/phpbb/ext'
      - 'd:\docker\phpbb\styles:/opt/bitnami/phpbb/styles'
    environment:
      - ALLOW_EMPTY_PASSWORD=yes
      - VIRTUAL_HOST=myhostname.com
      - LETSENCRYPT_HOST=myhostname.com
      - LETSENCRYPT_EMAIL=emailaddr@hostname.host
    networks:
      - phpbb
networks:
  phpbb:
volumes:
  maildata:
    driver: local
  mailstate:
    driver: local
  certs:
    driver: local
  certs-data:
    driver: local
Do vpnkit devs have access to the uploaded Docker diagnostics? My diagnostics: 76AE1FF2-C84F-4D7A-9737-76A5F58F3D18/2018-05-09_09-10-29
Developers, "by default, a container has no resource constraints and can use as much of a given resource as the host's kernel scheduler allows". Through Docker, "you can control how much memory, CPU, even block IO a container can use". To limit a container's resources, you have to set "runtime configuration flags of the docker run command". To obtain additional information on setting CPU limits, and the features required to support Linux capabilities, visit 'Limit a container's resources', a Docker guide at https://docs.docker.com/config/containers/resource_constraints/.
Correct me if I am wrong, but to me it seems that vpnkit is a tool that aids the connection between vms/containers and the internet/vpns. This bug seems to have nothing to do with how much memory is used in containers.
If I am wrong however, vpnkit does use more memory than allotted in the advanced settings in docker.
Don't think you're wrong, Kanro, vpnkit seems to be well outside of the bounds of the docker containers themselves. I'm a bit surprised how long this particular problem has been out here, now, it's been a few weeks now, and it makes 18.03+ unusable for many people, and it seems that it's quite reproducible.
Unfortunately I've still not been able to reproduce this locally. If anyone has a self-contained example which causes the problem and which they are able to share, then I'd love to see it.
In the meantime, I've made a list of all the suspect builds of vpnkit in issue #385. If you have a good local repro which you can't easily share and have some time to help, could you try some of these earlier builds? I've added some instructions to the issue for the Mac (but Windows should be similar). The idea is to start with a known-good Docker 17.12 and then to swap out the vpnkit binary with later ones. If we can identify where the memory leak was introduced then I'll be able to track it down much more easily.
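Trying the builds one after another works, but with a long list a binary search needs far fewer swaps. The following is only a sketch of that idea: it assumes the builds are ordered oldest to newest, that the first one is known good and the last one known bad, and that the leak stays once introduced. `first_bad_build` and the `leaks` callback are hypothetical names; in practice `leaks` would be a manual step (swap in the binary, run the repro, watch memory).

```python
def first_bad_build(builds, leaks):
    """Return the earliest build for which leaks(build) is True.

    builds: list ordered oldest to newest; builds[0] is assumed good
            and builds[-1] is assumed bad.
    leaks:  callable taking a build and returning True if it leaks.
    """
    if len(builds) < 2:
        raise ValueError("need at least one known-good and one known-bad build")
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] is good, builds[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if leaks(builds[mid]):
            hi = mid  # leak already present here, look earlier
        else:
            lo = mid  # still clean here, look later
    return builds[hi]
```

With N candidate builds this needs roughly log2(N) test runs instead of N, which matters when each run means waiting to see whether memory climbs.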
Thanks again for your reports!
I would probably start, if there's not a "this can't work" problem, by using the 17.12 vpnkit on 18.03, to validate that the 17.12 vpnkit actually does work with 18.03 without the memory leak -- that would eliminate a change in docker as the likely trigger of a problem that was previously unknown. Then move the vpnkit.exe forward, until it breaks. I suppose I could sit down and do this over some time. How do I grab just the vpnkit release binaries? I normally use the docker installer.
Possible fix - un-install / install
I recently had a failure of docker (wouldn't restart after boot) and couldn't find a way to repair and opted to uninstall and re-install a new fresh copy of Docker CE:
I have not seen vpnkit grow beyond a few hundred MB since re-installation (ranging only from 222M to 232M). The prior install had been in place and upgrading for about 12 to 18 months (ranged from 200M to over 9G on my 16G machine). I have been running this new installation for about a week and have not noticed any further memory issues.
200+MB doesn't sound normal. Is 200+MB normal?
@ericblade I don't know :-) The process uses a GC to free memory so I'd expect it to go up and down a bit. 200 MB sounds a little high, but if it's stable then that might be tolerable. If it's leaking then it'll keep getting bigger and bigger. I think it's safe to say that if it gets over 1G there must be a leak somewhere.
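One rough way to tell "goes up and down a bit" from "keeps getting bigger" in a series of memory samples is to compare the low points after each drop rather than the peaks. The sketch below is an illustration only, not how vpnkit or Docker measure anything: the names `troughs` and `looks_like_leak` and the 50 MB tolerance are assumptions, and the 1 GB ceiling simply encodes the rule of thumb from the comment above.

```python
def troughs(samples_mb):
    """Return local minima: lower than the previous sample, not higher than the next."""
    return [
        cur
        for prev, cur, nxt in zip(samples_mb, samples_mb[1:], samples_mb[2:])
        if cur < prev and cur <= nxt
    ]


def looks_like_leak(samples_mb, tolerance_mb=50.0, ceiling_mb=1024.0):
    """Heuristic: flag a leak if memory passes the ceiling or the low points keep rising."""
    if not samples_mb:
        return False
    if max(samples_mb) > ceiling_mb:
        return True  # "over 1G" rule of thumb
    lows = troughs(samples_mb)
    if len(lows) < 2:
        # No full up/down cycle observed; fall back to net growth.
        return samples_mb[-1] - samples_mb[0] > tolerance_mb
    return lows[-1] - lows[0] > tolerance_mb
```

A sawtooth that keeps returning to about the same floor is not flagged, while one whose floor rises by more than the tolerance is.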
Well, on https://github.com/docker/for-win/issues/1932 a person said that they were seeing consistent 154MB on it on docker 17. What I'm seeing it do is climb to basically absorb all available memory, then drop significantly, then start climbing again. Occasionally, it seems to take the machine permanently out to lunch requiring a reboot, though. Thing is, until it started going crazy, I'd never paid any attention to it, so I have no idea what is "normal". I don't even know what the process does. I just know that it's really not normal to watch a process grow several MB per second, apparently only bounded by the amount of memory (including VRAM) in the system.
Perhaps as tcederquist says, it is something involved in an upgrade process. I don't really want to go messing around with a mostly functional installation, and risk breaking something, if I'm not actually helping to solve the problem... so I don't really want to go and do an uninstall and reinstall on it.. but perhaps to repro, you could try installing 17, run my docker-compose file (you can probably substitute a mysql4 image that's out there, instead of my custom one), then upgrade to 18.03 and see what happens? (maybe also you could then tell me then why my mail server refuses to follow the "restart: always" directive ;-) )
FYI as far as "typical" usage I'm currently seeing 0.8M with a peak of 1.6M today. That's on Windows 10 Enterprise (1803 Build 17134.1), Docker version 18.03.1-ce, build 9ee9f40. When I first experienced the issue in question, vpnkit was similarly growing to consume all available RAM. That got "better" with the last update I did (sorry, I don't have the exact version), meaning it was still using a lot of memory but in the hundreds of MB instead of GBs, and after the update that brought me to this version I haven't seen memory climb enough to register on my radar at all.
There's enough infrastructure involved with this installation that it's not entirely surprising that an "upgrade" might leave unfriendly components entangled. It follows that a full reinstall has helped some people. I didn't have to go that far, but further upgrades have fixed the problem for me, and those were paired with a few major Windows patches, so maybe I accomplished the same "house cleaning" with those?
So, if sub 2MB is "typical", then that would indicate that there might well be a problem in previous versions as well, a problem that somehow became unbounded in the more recent versions, causing us all to notice, whereas previously it was just sucking up 200MB, and who'd even notice that in an age where machines having 16GB and 32GB is normal. :-D
Just noticed vpnkit.exe soared to 1.2G over the last few hours. It had been stable for several days after the new installation, even dropping to under 100M. I'm not sure what event caused it; I have not used the containers, so it wasn't triggered by anything I am aware of. It appeared while the machine was completely idle. It's still short of the 6-8G of the old install, but it is much larger than justified. It is currently stable at this size but not recovering either. Edit: 8 hours later it was over 1.6G and still drifting slowly upward, and another hour later it dropped to 75M. This process is busy doing something with memory. Nothing is going through my containers that I'm aware of, and CPU usage for the process is negligible (but it is doing something: 0.1% CPU).
OK, so, had an interesting thing happen today. First off, I upgraded the machine that is having this problem, from 4GB RAM to a whole 6GB RAM (leave me alone, this machine is full of hand-me-down parts :-D ).
I noticed that the webserver (nginx container) was not responding to any requests to one particular service, after some time. I didn't check the other services, figuring that I probably needed to restart either the machine or the nginx container.
I logged into the machine, and to my surprise, the machine wasn't dead, nor was nginx. I pulled up Task Man, and vpnkit was using 1.2GB and climbing (it had previously stopped climbing at 700MB).
Now, the nginx container is forwarding incoming requests to several services, most of which run also in docker, but a couple that are running in Node on Windows. And that's where this gets interesting.
SO, seeing that the other services all seemed to be working, I went to restart the service that wasn't working -- a Windows node.js process. When I terminated the node.js process, VPNKit immediately dropped from 1.2+GB of usage to 212MB.
In the time that it took me to type this message, it has reclimbed to 518MB, and seems to be hanging on right around there.
@ericblade that's an interesting observation, thanks.
It's possible there's a memory leak in an error path -- I'll try to run some more tests locally to see if I can provoke it. So far I have managed to make the memory usage of vpnkit increase, but when I trigger a GC (in latest edge 18.05.0-ce-mac66 on Mac you can do this by kill -USR1 <vpnkit pid>) it also drops to about 200 MiB. Perhaps the leak is being caused by waiting longer between garbage collects. One of the (innocuous-looking, completely-unrelated?) changes between 17.12 and 18.03 was a fix to the standard library to avoid too many garbage collects... perhaps we now have too few?
The latest edge build on Mac and Windows will trigger a full GC every 30 minutes, and log the results-- it's worth upgrading to that to see if it improves at all.
This happened after updating to the 18.* branch. At the moment I have build 18.03.0-ce-rc4-win57 (16511), and my vpnkit.exe steals gigabytes of RAM in 2-3 hours with 2 containers running. Like this:
Or even like this:
I think it should never consume about 8 GBs of my RAM.