moby / vpnkit

A toolkit for embedding VPN capabilities in your application
Apache License 2.0

vpnkit.exe eating my RAM #371

Open soar opened 6 years ago

soar commented 6 years ago

This started after updating to the 18.* branch. I'm currently on build 18.03.0-ce-rc4-win57 (16511), and my vpnkit.exe eats gigabytes of RAM within 2-3 hours with just 2 containers running.

Like this:

[screenshot: 2018-03-25 18-00-20]

Or even like this:

[screenshot: 2018-03-24 16-10-41]

I don't think it should ever consume around 8 GB of my RAM.

bartoncasey commented 6 years ago

I have this issue as well. vpnkit.exe claims memory in proportion to the amount of network activity into and out of the docker containers, and never releases it.
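
To put a number on "claims memory and never releases it", one option is to write down vpnkit's memory at regular intervals and fit a straight line through the readings. The helper below (`growth_rate_mb_per_hour`) is a hypothetical sketch, not part of Docker or vpnkit, and the readings in the demo are made-up illustrative values rather than measurements from this thread.

```python
from typing import Sequence, Tuple


def growth_rate_mb_per_hour(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of memory (MB) against time (seconds), in MB/hour.

    samples: (elapsed_seconds, memory_mb) pairs, e.g. noted from Task Manager.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one point in time")
    return cov / var * 3600.0  # MB per second -> MB per hour


if __name__ == "__main__":
    # Illustrative hourly readings only, not data from this issue.
    readings = [(0, 300.0), (3600, 700.0), (7200, 1100.0), (10800, 1500.0)]
    print(f"{growth_rate_mb_per_hour(readings):.0f} MB/hour")
```

A steadily positive slope across several hours of mostly idle traffic is the pattern people describe here; a slope near zero means memory is at least not trending upward.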

Edit: the latest version 18.03.0-ce-win58 (16761) may have resolved the issue. Fingers crossed.

soar commented 6 years ago

I updated Docker to 18.03.0-ce-win58 (16761) two hours ago, and the problem is still here:

[screenshot]

bartoncasey commented 6 years ago

One day later it's sitting at 1.2 GB, from light traffic.

laarmen commented 6 years ago

Several of us here at work are having this exact issue.

djs55 commented 6 years ago

Thanks for your reports.

In order to make progress with this issue I need some reproduction steps. Could you provide a docker-compose.yml (or similar) and instructions to reproduce the problem?

laarmen commented 6 years ago

After some experimenting, it seems that opening an HTTPS connection to a server on our internal network triggers the bug. The same doesn't happen with external, public servers (e.g. docker.com) or with other Docker instances.

djs55 commented 6 years ago

@laarmen thanks for the update. Could you trigger the bug and then upload a diagnostic report? I'd like to take a look at the logs.

laarmen commented 6 years ago

See https://github.com/laarmen/VpnKitPoC for the code. How do I create a diagnostic report?

djs55 commented 6 years ago

On Windows there should be a whale-shaped icon in the system tray. Right-clicking it should show a menu item called something like "Diagnose and Feedback". Clicking this should take you to a dialog where diagnostics are uploaded and assigned a unique ID. If you quote the ID in this ticket, I can download the logs and take a look.

(Sorry I can't give more precise instructions, but I don't have a Windows machine to hand.)

laarmen commented 6 years ago

I'm not entirely sure this was an instance of the bug, since vpnkit was still using less than 300 MB of RAM, but it was climbing steadily. I'll upload another report if it reaches "eat-my-RAM" levels later.

ID: 5E3BFA7A-FF8F-4077-8583-773FF79518CC/2018-03-30_18-33-45

laarmen commented 6 years ago

In case it's useful: I just stopped all the Docker containers on my workstation and waited a few minutes, and the vpnkit process is still sitting at 700 MB. I uploaded a second diagnostic report: 5E3BFA7A-FF8F-4077-8583-773FF79518CC/2018-03-30_19-09-21

laarmen commented 6 years ago

This time on my home computer and network, with the same code except that the target (on the local network) uses plain HTTP (no SSL), the memory grew to 1.5 GB.

ID: D25DA2F3-2F67-42BA-A292-78A39BCBAEC4/2018-03-30_20-34-58

cnuernber commented 6 years ago

We are having the same issue. It happens to us in under a day (although we are running an app that generates a lot of network traffic), so we currently have to restart Docker once per day.

[screenshot]

krukid commented 6 years ago

Same thing for me. I'm running a single Node.js process that downloads files from the web over HTTP (text and binary): some 25-30K files, about 1 GB in total, at about 100 KB/s. The vpnkit process consumes all available RAM within hours (I've seen it reach 9 GB, even though the overall memory limit for Docker itself is 2 GB).
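
A back-of-the-envelope check on those figures, assuming decimal units and taking the midpoint of "25-30K files": the whole transfer takes under 3 hours, and the 9 GB peak is roughly nine times the total volume downloaded. The numbers are the approximate ones quoted in the comment, not new measurements.

```python
# Approximate figures from the comment above; decimal units (1 GB = 10**9 bytes) assumed.
TOTAL_BYTES = 1 * 10**9          # "~1GB in volume"
RATE_BYTES_PER_S = 100 * 10**3   # "about 100KB/s"
FILE_COUNT = 27_500              # midpoint of "some 25-30K files"
PEAK_VPNKIT_BYTES = 9 * 10**9    # "up to 9GB of RAM"

transfer_s = TOTAL_BYTES / RATE_BYTES_PER_S
avg_file_kb = TOTAL_BYTES / FILE_COUNT / 10**3
ratio = PEAK_VPNKIT_BYTES / TOTAL_BYTES

print(f"transfer time ~{transfer_s:.0f} s (~{transfer_s / 3600:.1f} h)")
print(f"average file ~{avg_file_kb:.0f} KB")
print(f"peak vpnkit memory ~{ratio:.0f}x the total payload")
```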

bazzilic commented 6 years ago

Same here. Win 10 x64, docker version:

Client:
 Version:       18.03.0-ce
 API version:   1.37
 Go version:    go1.9.4
 Git commit:    0520e24
 Built: Wed Mar 21 23:06:28 2018
 OS/Arch:       windows/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:      18.03.0-ce
  API version:  1.37 (minimum version 1.24)
  Go version:   go1.9.4
  Git commit:   0520e24
  Built:        Wed Mar 21 23:21:06 2018
  OS/Arch:      windows/amd64
  Experimental: true

vpnkit.exe currently at 7GB, constantly climbing:

[screenshot]

0 containers running. Uploaded a diagnostic, id: 29542F91-6441-4210-934F-DB948F4EF0EF/2018-04-04_16-34-22

UPDATE 24/04/2018: As suggested below, adding vpnkit commit sha:

bazzilic@CSLRF21 $ & "C:\Program Files\Docker\Docker\resources\vpnkit.exe" --debug --ethernet foo
vpnkit.exe: [INFO] Setting handler to ignore all SIGPIPE signals
vpnkit.exe: [INFO] Version is 7c425f691978cb4a708ccc295dd331eae5cebc85

dsmaher commented 6 years ago

Here's a simple docker-compose.yaml that can reproduce the issue. If you watch vpnkit.exe's memory usage while this is running, it climbs by almost 1 MB every time the wget runs.

version: '3'
services:
  eat-memory:
    image: busybox
    entrypoint: sh
    command:
      - -c
      - |
        while true; do
          echo Getting docker.com...
          wget -qO/dev/null https://www.docker.com
          sleep 5
        done
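
Taking the "almost 1 MB per wget" figure at face value, the loop above gives a rough upper bound on vpnkit's growth rate: each iteration sleeps 5 seconds, so there are at most 720 requests per hour. This is a sketch under that assumption; it ignores the time wget itself takes, which only lowers the real rate.

```python
# Upper-bound estimate for the busybox loop above.
# Assumption: ~1 MB of vpnkit growth per wget, as reported in the comment.
SLEEP_S = 5
MB_PER_REQUEST = 1.0

requests_per_hour = 3600 // SLEEP_S          # ignores wget's own run time
mb_per_hour = requests_per_hour * MB_PER_REQUEST
gb_per_day = mb_per_hour * 24 / 1000

print(f"<= {requests_per_hour} requests/h, <= {mb_per_hour:.0f} MB/h, "
      f"<= {gb_per_day:.1f} GB/day")
```

At that rate a leak would reach several GB within a working day, which is in the same range as the reports in this thread.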

bazzilic commented 6 years ago

In my case it climbs even when there's no Docker-related activity at all. At least, nothing explicit.

pbering commented 6 years ago

Even with all Windows and Linux containers stopped, memory usage is constantly around 1.5 GB on my machine...

logich commented 6 years ago

I have the same issue with 3 containers that do a large amount of WAN traffic. If left running for a week, this consumes all the available RAM and leaves the system in an unstable state. My only workaround is restarting Docker regularly.

tcederquist commented 6 years ago

Cross-linking a forum thread where many people report the same vpnkit memory issue: https://forums.docker.com/t/vpnkit-uses-all-free-memory/48558/12

For me, I suspect the behavior appeared with the 16762 build. I never noticed it before, but I wasn't looking until it exhausted my memory running one simple nginx container.

arnie311 commented 6 years ago

I am having the same issue. Docker version 18.03.0-ce-win59 (16762) on Windows Server 2016 with 32 GB of memory. I limit Docker to 10 GB of memory, yet vpnkit.exe uses up to 16 GB within 24 hours. [screenshot: snag_289fdd8a] Either it crashes or I have to restart Docker.

jtownley commented 6 years ago

I hate to post a "me too", but me too: Docker version 18.03.0-ce-mac60 (23751), channel: stable, 6ddfc0f1d3, OS X 10.11.6 (16 GB RAM), running one talkative app (outgoing HTTP requests only) via docker compose.

omnipitous commented 6 years ago

"Me too." I left a couple of (mostly idle) containers running over the weekend and came back to vpnkit using 4 GB, which was all I had left, and a cranky system.

Client:
 Version:       18.03.0-ce
 API version:   1.37
 Go version:    go1.9.4
 Git commit:    0520e24
 Built:         Wed Mar 21 23:06:28 2018
 OS/Arch:       windows/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:      18.03.0-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   0520e24
  Built:        Wed Mar 21 23:14:32 2018
  OS/Arch:      linux/amd64
  Experimental: false

tsasioglu commented 6 years ago

We've rolled back to 17.12.0-ce-win47 2018-01-12 and are no longer seeing this issue.

bartoncasey commented 6 years ago

Had an idea to compare versions so that we can isolate the vpnkit version that started failing.

PS C:\Program Files\Docker\Docker\Resources> .\vpnkit.exe --version
%%VERSION%%
PS C:\Program Files\Docker\Docker\Resources>

Sigh.

bartoncasey commented 6 years ago

Running .\vpnkit.exe --debug --ethernet foo will print a git SHA:

PS C:\Program Files\Docker\Docker\Resources> .\vpnkit.exe --debug --ethernet xxx
vpnkit.exe: [INFO] Setting handler to ignore all SIGPIPE signals
vpnkit.exe: [INFO] Version is eb91fd8319abdfcaf87a1839e46b7ce0577b68fc
...

That's for the current Version 18.04.0-ce-rc2-win61 (17070). It corresponds to the most recent commit here.

@tsasioglu What does your 17.12.0-ce-win47 report?

imarotte commented 6 years ago

Is there an easy way to roll back to 17.12.0-ce-win? As it is, 18 is completely unusable for me. I have to restart every 90 minutes because vpnkit uses 90%+ of my memory.

apm963 commented 6 years ago

@imarotte Yes, you can download 17.12.0-ce-win47 here: https://docs.docker.com/docker-for-windows/release-notes/#docker-community-edition-17120-ce-win47-2018-01-12. I rolled back two of our staff to that version yesterday. The steps I took were:

That is the simplest way, although you may end up losing your containers if the VM image is incompatible.

ericblade commented 6 years ago

Thanks @imarotte for linking the issue from the Docker tracker. When I've watched mine, it climbs rapidly to around 1 GB on a 4 GB machine (lol), backs off, then climbs again. There's a GIF of memory usage in Task Manager attached to docker/for-win issue 1932. The machine has gone completely unresponsive twice in the last week, though, which is highly abnormal and could be due to disk thrashing caused by memory exhaustion. I haven't rolled my Docker version back yet, because I was hoping this would be fixed quickly, and I have enough problems going forward through Docker versions that I don't want to see what kind of Hellgate I can open by going backwards. It definitely started when I upgraded to 18.03, though. I don't know which version I had before, as I hadn't been paying attention, because I didn't have problems :-)

I run an nginx container in Docker that redirects traffic to a few other containers as well as to services that run natively on the bare metal. As an aside, I do intend to put more RAM in the box, but so far that hasn't really been a problem for me, except that I can't force a Docker restart without logging out or rebooting first.

ericblade commented 6 years ago

I just picked up the update to 18.03.1, and after about 15 minutes of runtime vpnkit is sitting at 13.7 to 14.0 MB... so I'll keep an eye on it, but it seems to be fixed?

ericblade commented 6 years ago

[screenshot]

18.03.1 on Win 10 1803 after about 12 hours

yixunx commented 6 years ago

I'm using Docker for Mac 18.03.1-ce-mac65 (24312) and hit the same issue after running vpnkit for ~5 hours: [screenshot]

Docker version:

Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.5
 Git commit:   9ee9f40
 Built:        Thu Apr 26 07:13:02 2018
 OS/Arch:      darwin/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.5
  Git commit:   9ee9f40
  Built:        Thu Apr 26 07:22:38 2018
  OS/Arch:      linux/amd64
  Experimental: false

Docker info:

Containers: 2
 Running: 2
 Paused: 0
 Stopped: 0
Images: 28
Server Version: 18.03.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 773c489c9c1b21a6d78b5c538cd395416ec50f88
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.87-linuxkit-aufs
Operating System: Docker for Mac
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 11.71GiB
Name: linuxkit-025000000001
ID: PWN5:BICM:VGBT:GP5H:DCER:EUBH:V2NB:JQSR:VU52:PNWG:XNQZ:VXPV
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 38
 Goroutines: 54
 System Time: 2018-05-02T22:08:32.337136965Z
 EventsListeners: 2
HTTP Proxy: docker.for.mac.http.internal:3128
HTTPS Proxy: docker.for.mac.http.internal:3129
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Downgrading to 17.12 seems to fix it. Happy to provide any diagnostics.

ericblade commented 6 years ago

Interesting: here's a diagnostic created from the Docker for Windows UI:

574C77DB-7887-4242-889C-DFAE2C1F25FF/2018-05-02_21-22-45

When I clicked "Upload a Diagnostic", usage started falling rapidly from 800 MB down to 200 MB, then started rising again once the upload was done.

Parktimo commented 6 years ago

I uploaded a diagnostic: FB03AA8B-F86C-4E7A-87B8-3A05EA2E2E0D/2018-05-08_10-54-58

vpnkit was using between 22 and 23 GB of memory after running over the weekend.

ericblade commented 6 years ago

I saw someone was asking for a docker-compose file:

version: '3'
services:
  mail:
    restart: always
    image: tvial/docker-mailserver:latest
    hostname: mail
    domainname: myemaildomain.com
    container_name: mail
    # network_mode: "host"
    ports:
      - "25:25"
      - "143:143"
      - "587:587"
      - "993:993"
      - "110:110"
      - "995:995"
      - "4190:4190"
    volumes:
      - maildata:/var/mail
      - mailstate:/var/mail-state
      - ./mail/config:/tmp/docker-mailserver/
      - ./mail/config/sasl_passwd:/etc/postfix/sasl_passwd
      - ./mail/config/sasl_passwd.db:/etc/postfix/sasl_passwd.db
      - ./mail/config/eric.dovecot.sieve:/var/mail/myemaildomain.com/eric/.dovecot.sieve
    environment:
      - ENABLE_SPAMASSASSIN=1
      - ENABLE_CLAMAV=1
      - ENABLE_FAIL2BAN=0
      - ENABLE_POSTGREY=1
      - ONE_DIR=1
      - DMS_DEBUG=0
      - ENABLE_POP3=1
      - ENABLE_MANAGESIEVE=1
    cap_add:
      - NET_ADMIN
      - SYS_PTRACE

  mysql4:
    build: mysql4/mysql4
    restart: always
    ports:
      - "3306:3306"
    environment:
      - MYSQL_ROOT_PASSWORD=mysqlrootpassword

  nginx:
    image: nginx:latest
    restart: always
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d
      - d:\www:/usr/share/nginx/html
      - certs:/etc/letsencrypt
      - certs-data:/data/letsencrypt
    ports:
      - "80:80"
      - "443:443"
    environment:
      - NGINX_HOST=myhostname.com
      - NGINX_PORT=80

  # let's build a phpbb
  mariadb:
    restart: always
    image: 'bitnami/mariadb:latest'
    environment:
      - ALLOW_EMPTY_PASSWORD=yes
    volumes:
      - 'd:\docker\phpbb\mariadb:/bitnami'
    networks:
      - phpbb
  phpbb:
    restart: always
    image: 'bitnami/phpbb:latest'
    depends_on:
      - mariadb
    ports:
      - '9999:80'
      - '4443:443'
    volumes:
      - 'd:\docker\phpbb\phpbb2:/bitnami'
      - 'd:\docker\phpbb\ext:/opt/bitnami/phpbb/ext'
      - 'd:\docker\phpbb\styles:/opt/bitnami/phpbb/styles'
    environment:
      - ALLOW_EMPTY_PASSWORD=yes
      - VIRTUAL_HOST=myhostname.com
      - LETSENCRYPT_HOST=myhostname.com
      - LETSENCRYPT_EMAIL=emailaddr@hostname.host
    networks:
      - phpbb

networks:
  phpbb:

volumes:
  maildata:
    driver: local
  mailstate:
    driver: local
  certs:
    driver: local
  certs-data:
    driver: local

earthgrazer commented 6 years ago

Do vpnkit devs have access to the uploaded Docker diagnostics? My diagnostics: 76AE1FF2-C84F-4D7A-9737-76A5F58F3D18/2018-05-09_09-10-29

YRM64 commented 6 years ago

Developers, "by default, a container has no resource constraints and can use as much of a given resource as the host's kernel scheduler allows". Through Docker, "you can control how much memory, CPU, even block IO a container can use". To limit a container's resources, you have to set "runtime configuration flags of the docker run command". To obtain additional information on setting CPU limits, and the features required to support Linux capabilities, visit 'Limit a container's resources', a Docker guide at https://docs.docker.com/config/containers/resource_constraints/.
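
For reference, the flags that guide describes include `--memory` and `--cpus` on `docker run`. The sketch below just assembles such a command line; the helper name and the default values are arbitrary choices for illustration. Note that the following comments point out that vpnkit runs outside the containers, so per-container limits like these would not be expected to cap vpnkit.exe itself.

```python
import shlex
from typing import List


def docker_run_with_limits(image: str, memory: str = "512m", cpus: str = "1.0") -> List[str]:
    """Build a `docker run` argv that caps one container's memory and CPU.

    --memory and --cpus are standard `docker run` flags; the defaults here
    are arbitrary example values.
    """
    return ["docker", "run", "--rm", f"--memory={memory}", f"--cpus={cpus}", image]


if __name__ == "__main__":
    # Prints the command; pass the list to subprocess.run() to actually execute it.
    print(shlex.join(docker_run_with_limits("busybox")))
```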

Kanro-Code commented 6 years ago

Correct me if I am wrong, but it seems to me that vpnkit is a tool that handles the connection between VMs/containers and the internet/VPNs. This bug seems to have nothing to do with how much memory is used inside containers.

And even if I am wrong, vpnkit does use more memory than is allotted in Docker's advanced settings.

ericblade commented 6 years ago

I don't think you're wrong, Kanro: vpnkit seems to sit well outside the bounds of the Docker containers themselves. I'm a bit surprised how long this problem has been around; it's been a few weeks now, it makes 18.03+ unusable for many people, and it seems quite reproducible.

djs55 commented 6 years ago

Unfortunately I've still not been able to reproduce this locally. If anyone has a self-contained example which causes the problem and which they are able to share, then I'd love to see it.

In the meantime, I've listed all the suspect builds of vpnkit in issue #385. If you have a good local repro which you can't easily share and have some time to help, could you try some of these earlier builds? I've added some instructions to the issue for the Mac (but Windows should be similar). The idea is to start with a known-good Docker 17.12 and then swap out the vpnkit binary for later ones. If we can identify where the memory leak was introduced, I'll be able to track it down much more easily.
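
Since each test means swapping in a vpnkit binary and waiting for memory to climb, a binary search over the suspect builds keeps the number of runs to roughly log2 of the number of builds, rather than stepping forward one build at a time. A sketch, assuming the builds are in chronological order, the first is known-good (e.g. the one shipped with 17.12), the last is known-bad, and every build after the first bad one also leaks. `first_bad_build`, the `leaks` callback, and the build labels are hypothetical; in practice `leaks` is a manual test run.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad_build(builds: Sequence[T], leaks: Callable[[T], bool]) -> T:
    """Binary search for the first leaking build.

    Assumes builds are in chronological order, builds[0] is known-good,
    builds[-1] is known-bad, and every build after the first bad one leaks.
    leaks(build) stands for one test run: swap in that vpnkit binary,
    run the repro, and watch memory.
    """
    if len(builds) < 2:
        raise ValueError("need a known-good and a known-bad build")
    lo, hi = 0, len(builds) - 1  # invariant: builds[lo] is good, builds[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if leaks(builds[mid]):
            hi = mid
        else:
            lo = mid
    return builds[hi]


if __name__ == "__main__":
    # Hypothetical build labels; pretend the leak starts at build-6.
    suspects = [f"build-{i}" for i in range(10)]
    print(first_bad_build(suspects, lambda b: int(b.split("-")[1]) >= 6))
```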

Thanks again for your reports!

ericblade commented 6 years ago

I would probably start (unless there's a "this can't work" problem) by running the 17.12 vpnkit on 18.03, to validate that the 17.12 vpnkit actually works with 18.03 without the memory leak. That would rule out a change in Docker as the trigger of a previously unknown problem. Then move vpnkit.exe forward until it breaks. I suppose I could sit down and do this over some time. How do I grab just the vpnkit release binaries? I normally use the Docker installer.

tcederquist commented 6 years ago

Possible fix: uninstall / reinstall

I recently had a Docker failure (it wouldn't restart after boot), couldn't find a way to repair it, and opted to uninstall and install a fresh copy of Docker CE.

I have not seen vpnkit grow beyond a few hundred MB since reinstalling (ranging only from 222 MB to 232 MB). The prior install had been in place, and upgraded, for about 12 to 18 months (ranging from 200 MB to over 9 GB on my 16 GB machine). I have been running the new installation for about a week and have not noticed any further memory issues.

ericblade commented 6 years ago

200+MB doesn't sound normal. Is 200+MB normal?

djs55 commented 6 years ago

@ericblade I don't know :-) The process uses a GC to free memory, so I'd expect it to go up and down a bit. 200 MB sounds a little high, but if it's stable then that might be tolerable. If it's leaking, it'll keep getting bigger and bigger. I think it's safe to say that if it gets over 1 GB there must be a leak somewhere.
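
The distinction drawn here (a garbage-collected process goes up and down, a leak keeps growing) can be turned into a rough check on a series of memory readings: look at the low points right after each drop and flag a leak if those floors keep rising, or if usage ever passes 1 GB. The 1 GB ceiling comes from the comment above; the "at least three rising troughs" rule, the helper names, and the sample series are assumptions for illustration.

```python
from typing import List, Sequence


def troughs(samples: Sequence[float]) -> List[float]:
    """Local minima: readings lower than the one before and no higher than the one after."""
    return [
        samples[i]
        for i in range(1, len(samples) - 1)
        if samples[i] < samples[i - 1] and samples[i] <= samples[i + 1]
    ]


def looks_like_leak(samples: Sequence[float], ceiling_mb: float = 1024.0,
                    min_troughs: int = 3) -> bool:
    """Rough heuristic over memory readings in MB.

    Flags a leak if usage ever exceeds ceiling_mb, or if there are at least
    min_troughs post-GC low points and each is higher than the previous one.
    """
    if not samples:
        return False
    if max(samples) > ceiling_mb:
        return True
    lows = troughs(samples)
    return len(lows) >= min_troughs and all(b > a for a, b in zip(lows, lows[1:]))


if __name__ == "__main__":
    # Illustrative series only, not measurements from this issue.
    sawtooth = [200, 260, 320, 210, 270, 330, 205, 265, 325, 210]
    rising_floor = [200, 300, 250, 350, 300, 400, 350, 450, 400, 500]
    print(looks_like_leak(sawtooth), looks_like_leak(rising_floor))
```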

ericblade commented 6 years ago

Well, on https://github.com/docker/for-win/issues/1932 a person said that they were seeing consistent 154MB on it on docker 17. What I'm seeing it do is climb to basically absorb all available memory, then drop significantly, then start climbing again. Occasionally, it seems to take the machine permanently out to lunch requiring a reboot, though. Thing is, until it started going crazy, I'd never paid any attention to it, so I have no idea what is "normal". I don't even know what the process does. I just know that it's really not normal to watch a process grow several MB per second, apparently only bounded by the amount of memory (including VRAM) in the system.

Perhaps, as tcederquist says, it's something involved in the upgrade process. I don't really want to go messing around with a mostly functional installation and risk breaking something if I'm not actually helping to solve the problem, so I don't really want to uninstall and reinstall... but perhaps, to repro, you could try installing 17, run my docker-compose file (you can probably substitute a mysql4 image that's out there for my custom one), then upgrade to 18.03 and see what happens? (Maybe you could then also tell me why my mail server refuses to follow the "restart: always" directive ;-) )

omnipitous commented 6 years ago

FYI, as far as "typical" usage goes, I'm currently seeing 0.8 MB with a peak of 1.6 MB today. That's on Windows 10 Enterprise (1803 Build 17134.1), Docker version 18.03.1-ce, build 9ee9f40. When I first hit the issue in question, vpnkit was similarly growing to consume all available RAM. That got "better" with my last update (sorry, I don't have the exact version): it was still using a lot of memory, but hundreds of MB instead of GBs. Since the update that brought me to this version, I haven't seen memory climb enough to register on my radar at all.

There's enough infrastructure involved in this installation that it's not entirely surprising an "upgrade" might leave unfriendly components entangled, so it follows that a full reinstall has helped some people. I didn't have to go that far; further upgrades fixed the problem for me. Those coincided with a few major Windows patches, so maybe I accomplished the same "house cleaning" with those?

ericblade commented 6 years ago

So, if sub-2 MB is "typical", that would indicate there might well be a problem in previous versions as well, one that somehow became unbounded in the more recent versions and made us all notice, whereas previously it was just sucking up 200 MB, and who'd even notice that in an age where machines with 16 GB and 32 GB are normal. :-D

tcederquist commented 6 years ago

Just noticed vpnkit.exe soared to 1.2 GB over the last few hours. It had been stable for several days after the new installation, even dropping below 100 MB. I'm not sure what caused it; I haven't used the containers, so it wasn't triggered by any event I'm aware of. It appeared while the machine was completely idle. It's still short of the 6-8 GB of the old installation, but it is much larger than justified. It is currently stable at this size but not recovering either. Edit: 8 hours later it was over 1.6 GB and still drifting slowly upward, and an hour after that it dropped to 75 MB. This process is busy doing something with memory. Nothing is going through my containers that I'm aware of, and the process's CPU usage is negligible (but it is doing something: 0.1% CPU).

ericblade commented 6 years ago

OK, so, had an interesting thing happen today. First off, I upgraded the machine that is having this problem, from 4GB RAM to a whole 6GB RAM (leave me alone, this machine is full of hand-me-down parts :-D ).

I noticed that the webserver (nginx container) was not responding to any requests to one particular service, after some time. I didn't check the other services, figuring that I probably needed to restart either the machine or the nginx container.

I logged into the machine and, to my surprise, neither the machine nor nginx was dead. I pulled up Task Manager, and vpnkit was using 1.2 GB and climbing (it had previously stopped climbing at 700 MB).

Now, the nginx container forwards incoming requests to several services, most of which also run in Docker, but a couple of which run in Node on Windows. And that's where this gets interesting.

So, seeing that the other services all seemed to be working, I went to restart the one that wasn't: a Windows node.js process. When I terminated the node.js process, vpnkit immediately dropped from 1.2+ GB of usage to 212 MB.

In the time that it took me to type this message, it has reclimbed to 518MB, and seems to be hanging on right around there.

djs55 commented 6 years ago

@ericblade that's an interesting observation, thanks.

It's possible there's a memory leak in an error path -- I'll try to run some more tests locally to see if I can provoke it. So far I have managed to make the memory usage of vpnkit increase, but when I trigger a GC (in latest edge 18.05.0-ce-mac66 on Mac you can do this by kill -USR1 <vpnkit pid>) it also drops to about 200 MiB. Perhaps the leak is being caused by waiting longer between garbage collects. One of the (innocuous-looking, completely-unrelated?) changes between 17.12 and 18.03 was a fix to the standard library to avoid too many garbage collects... perhaps we now have too few?

The latest edge build on Mac and Windows triggers a full GC every 30 minutes and logs the results; it's worth upgrading to it to see whether that improves things at all.
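
Again as a Python analogue rather than vpnkit's actual OCaml change: a background thread that forces a full collection on a fixed interval (30 minutes by default, matching the comment) and logs each result. `start_periodic_gc` is a hypothetical name; it returns an Event that stops the loop when set.

```python
import gc
import threading
import time
from typing import Callable


def start_periodic_gc(interval_s: float = 30 * 60,
                      log: Callable[[str], None] = print) -> threading.Event:
    """Run gc.collect() every interval_s seconds on a daemon thread and log each result.

    Returns an Event; set it to stop the loop.
    """
    stop = threading.Event()

    def _loop() -> None:
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_s):
            started = time.monotonic()
            collected = gc.collect()
            log(f"periodic GC: collected {collected} objects "
                f"in {time.monotonic() - started:.3f}s")

    threading.Thread(target=_loop, name="periodic-gc", daemon=True).start()
    return stop


if __name__ == "__main__":
    stop = start_periodic_gc(interval_s=0.1)  # short interval just for the demo
    time.sleep(0.35)
    stop.set()
```

Logging each run, as the edge build does, makes it possible to line up GC events with the memory readings reported in this thread.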