qdm12 / gluetun

VPN client in a thin Docker container for multiple VPN providers, written in Go, and using OpenVPN or Wireguard, DNS over TLS, with a few proxy servers built-in.
https://hub.docker.com/r/qmcgaw/gluetun
MIT License
8.04k stars, 371 forks

Bug: Server keeps restarting #2066

Closed: clpir3s closed this issue 9 months ago

clpir3s commented 10 months ago

Is this urgent?

No

Host OS

ubuntu

CPU arch

aarch64

VPN service provider

Surfshark

What are you using to run the container

docker-compose

What is the version of Gluetun

v3.36.0

What's the problem 🤔

Hi all,

I have set up multiple servers (in order to have many public IPs), and I'm not sure whether this is related to the number of servers I'm using.

Can you tell me if I'm missing some configuration?

Share your logs (at least 10 lines)

fr-proxyserver-4  | 2024-01-22T17:10:32Z INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
fr-proxyserver-1  | 2024-01-22T17:10:32Z INFO [healthcheck] unhealthy: dialing: dial tcp4: lookup cloudflare.com: i/o timeout
fr-proxyserver-1  | 2024-01-22T17:10:33Z INFO [healthcheck] healthy!
fr-proxyserver-3  | 2024-01-22T17:10:38Z INFO [healthcheck] program has been unhealthy for 41s: restarting VPN (see https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md)
fr-proxyserver-3  | 2024-01-22T17:10:38Z INFO [vpn] stopping
fr-proxyserver-3  | 2024-01-22T17:10:38Z INFO [vpn] starting
fr-proxyserver-3  | 2024-01-22T17:10:38Z INFO [firewall] allowing VPN connection...
fr-proxyserver-3  | 2024-01-22T17:10:38Z INFO [wireguard] Using available kernelspace implementation
fr-proxyserver-3  | 2024-01-22T17:10:38Z INFO [wireguard] Connecting to 45.134.79.166:51820
fr-proxyserver-3  | 2024-01-22T17:10:38Z INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
fr-proxyserver-4  | 2024-01-22T17:10:42Z ERROR [ip getter] Get "https://ipinfo.io/": dial tcp: lookup ipinfo.io on 1.1.1.1:53: read udp 10.14.0.2:47797->1.1.1.1:53: i/o timeout - retrying in 5s
fr-proxyserver-4  | 2024-01-22T17:10:43Z INFO [healthcheck] program has been unhealthy for 11s: restarting VPN (see https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md)
fr-proxyserver-4  | 2024-01-22T17:10:43Z INFO [vpn] stopping
fr-proxyserver-4  | 2024-01-22T17:10:43Z INFO [vpn] starting
fr-proxyserver-4  | 2024-01-22T17:10:43Z INFO [firewall] allowing VPN connection...
fr-proxyserver-4  | 2024-01-22T17:10:43Z INFO [wireguard] Using available kernelspace implementation
fr-proxyserver-4  | 2024-01-22T17:10:43Z INFO [wireguard] Connecting to 85.204.70.105:51820
fr-proxyserver-4  | 2024-01-22T17:10:43Z INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
fr-proxyserver-5  | 2024-01-22T17:10:48Z INFO [healthcheck] unhealthy: dialing: dial tcp4: lookup cloudflare.com: i/o timeout
fr-proxyserver-5  | 2024-01-22T17:10:56Z INFO [healthcheck] program has been unhealthy for 6s: restarting VPN (see https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md)
fr-proxyserver-5  | 2024-01-22T17:10:56Z INFO [vpn] stopping
fr-proxyserver-5  | 2024-01-22T17:10:56Z INFO [vpn] starting
fr-proxyserver-5  | 2024-01-22T17:10:56Z INFO [firewall] allowing VPN connection...
fr-proxyserver-5  | 2024-01-22T17:10:56Z INFO [wireguard] Using available kernelspace implementation
fr-proxyserver-5  | 2024-01-22T17:10:56Z INFO [wireguard] Connecting to 85.204.70.105:51820
fr-proxyserver-5  | 2024-01-22T17:10:56Z INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
fr-proxyserver-4  | 2024-01-22T17:10:57Z ERROR [ip getter] Get "https://ipinfo.io/": dial tcp: lookup ipinfo.io on 1.1.1.1:53: read udp 10.14.0.2:52811->1.1.1.1:53: i/o timeout - retrying in 10s
fr-proxyserver-4  | 2024-01-22T17:10:59Z INFO [healthcheck] program has been unhealthy for 16s: restarting VPN (see https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md)
fr-proxyserver-4  | 2024-01-22T17:10:59Z INFO [vpn] stopping
fr-proxyserver-4  | 2024-01-22T17:11:00Z INFO [vpn] starting
fr-proxyserver-4  | 2024-01-22T17:11:00Z INFO [firewall] allowing VPN connection...
fr-proxyserver-4  | 2024-01-22T17:11:00Z INFO [wireguard] Using available kernelspace implementation
fr-proxyserver-4  | 2024-01-22T17:11:00Z INFO [wireguard] Connecting to 146.70.18.91:51820
fr-proxyserver-4  | 2024-01-22T17:11:00Z INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
fr-proxyserver-3  | 2024-01-22T17:11:04Z ERROR [ip getter] Get "https://ipinfo.io/": dial tcp: lookup ipinfo.io on 1.1.1.1:53: read udp 10.14.0.2:54817->1.1.1.1:53: i/o timeout - retrying in 2m40s
fr-proxyserver-5  | 2024-01-22T17:11:06Z ERROR [ip getter] Get "https://ipinfo.io/": dial tcp: lookup ipinfo.io on 1.1.1.1:53: read udp 10.14.0.2:35512->1.1.1.1:53: i/o timeout - retrying in 5s
fr-proxyserver-5  | 2024-01-22T17:11:07Z INFO [healthcheck] program has been unhealthy for 11s: restarting VPN (see https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md)
fr-proxyserver-5  | 2024-01-22T17:11:07Z INFO [vpn] stopping
fr-proxyserver-5  | 2024-01-22T17:11:08Z INFO [vpn] starting
fr-proxyserver-5  | 2024-01-22T17:11:08Z INFO [firewall] allowing VPN connection...
fr-proxyserver-5  | 2024-01-22T17:11:08Z INFO [wireguard] Using available kernelspace implementation
fr-proxyserver-5  | 2024-01-22T17:11:08Z INFO [wireguard] Connecting to 85.204.70.93:51820
fr-proxyserver-5  | 2024-01-22T17:11:08Z INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
fr-proxyserver-4  | 2024-01-22T17:11:17Z ERROR [ip getter] Get "https://ipinfo.io/": dial tcp: lookup ipinfo.io on 1.1.1.1:53: read udp 10.14.0.2:37226->1.1.1.1:53: i/o timeout - retrying in 20s
fr-proxyserver-5  | 2024-01-22T17:11:21Z ERROR [ip getter] Get "https://ipinfo.io/": dial tcp: lookup ipinfo.io on 1.1.1.1:53: read udp 10.14.0.2:40054->1.1.1.1:53: i/o timeout - retrying in 10s
fr-proxyserver-4  | 2024-01-22T17:11:23Z INFO [healthcheck] program has been unhealthy for 21s: restarting VPN (see https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md)
fr-proxyserver-4  | 2024-01-22T17:11:23Z INFO [vpn] stopping
fr-proxyserver-4  | 2024-01-22T17:11:23Z INFO [vpn] starting
fr-proxyserver-4  | 2024-01-22T17:11:23Z INFO [firewall] allowing VPN connection...
fr-proxyserver-4  | 2024-01-22T17:11:23Z INFO [wireguard] Using available kernelspace implementation
fr-proxyserver-4  | 2024-01-22T17:11:23Z INFO [wireguard] Connecting to 143.244.57.81:51820
fr-proxyserver-4  | 2024-01-22T17:11:23Z INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
fr-proxyserver-5  | 2024-01-22T17:11:24Z INFO [healthcheck] program has been unhealthy for 16s: restarting VPN (see https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md)
fr-proxyserver-5  | 2024-01-22T17:11:24Z INFO [vpn] stopping
fr-proxyserver-5  | 2024-01-22T17:11:24Z INFO [vpn] starting
fr-proxyserver-5  | 2024-01-22T17:11:24Z INFO [firewall] allowing VPN connection...
fr-proxyserver-5  | 2024-01-22T17:11:24Z INFO [wireguard] Using available kernelspace implementation
fr-proxyserver-5  | 2024-01-22T17:11:24Z INFO [wireguard] Connecting to 194.110.113.227:51820
fr-proxyserver-5  | 2024-01-22T17:11:24Z INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
fr-proxyserver-3  | 2024-01-22T17:11:25Z INFO [healthcheck] program has been unhealthy for 46s: restarting VPN (see https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md)
fr-proxyserver-3  | 2024-01-22T17:11:25Z INFO [vpn] stopping
fr-proxyserver-3  | 2024-01-22T17:11:25Z INFO [vpn] starting
fr-proxyserver-3  | 2024-01-22T17:11:25Z INFO [firewall] allowing VPN connection...
fr-proxyserver-3  | 2024-01-22T17:11:25Z INFO [wireguard] Using available kernelspace implementation
fr-proxyserver-3  | 2024-01-22T17:11:25Z INFO [wireguard] Connecting to 146.70.194.227:51820
fr-proxyserver-3  | 2024-01-22T17:11:25Z INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
fr-proxyserver-5  | 2024-01-22T17:11:41Z ERROR [ip getter] Get "https://ipinfo.io/": dial tcp: lookup ipinfo.io on 1.1.1.1:53: read udp 10.14.0.2:58582->1.1.1.1:53: i/o timeout - retrying in 20s
fr-proxyserver-5  | 2024-01-22T17:11:47Z INFO [healthcheck] program has been unhealthy for 21s: restarting VPN (see https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md)
fr-proxyserver-5  | 2024-01-22T17:11:47Z INFO [vpn] stopping
fr-proxyserver-5  | 2024-01-22T17:11:47Z INFO [vpn] starting
fr-proxyserver-5  | 2024-01-22T17:11:47Z INFO [firewall] allowing VPN connection...
fr-proxyserver-5  | 2024-01-22T17:11:47Z INFO [wireguard] Using available kernelspace implementation
fr-proxyserver-5  | 2024-01-22T17:11:47Z INFO [wireguard] Connecting to 185.166.84.137:51820
fr-proxyserver-5  | 2024-01-22T17:11:47Z INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
fr-proxyserver-4  | 2024-01-22T17:11:47Z ERROR [ip getter] Get "https://ipinfo.io/": dial tcp: lookup ipinfo.io on 1.1.1.1:53: read udp 10.14.0.2:42960->1.1.1.1:53: i/o timeout - retrying in 40s
fr-proxyserver-4  | 2024-01-22T17:11:50Z INFO [healthcheck] program has been unhealthy for 26s: restarting VPN (see https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md)
fr-proxyserver-4  | 2024-01-22T17:11:50Z INFO [vpn] stopping
fr-proxyserver-4  | 2024-01-22T17:11:50Z INFO [vpn] starting
fr-proxyserver-4  | 2024-01-22T17:11:50Z INFO [firewall] allowing VPN connection...
fr-proxyserver-4  | 2024-01-22T17:11:50Z INFO [wireguard] Using available kernelspace implementation
fr-proxyserver-4  | 2024-01-22T17:11:50Z INFO [wireguard] Connecting to 194.110.113.227:51820
fr-proxyserver-4  | 2024-01-22T17:11:50Z INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.

Share your configuration

base.yml
version: "3.9"
services:
  vpn:
    image: qmcgaw/gluetun:v3.36.0
    restart: always
    devices:
      - /dev/net/tun:/dev/net/tun
    cap_add:
      - NET_ADMIN
      - SYS_MODULE
    #ports:
    #  - 8888:8888/tcp # HTTP proxy
    volumes:
      - /lib/modules:/lib/modules:ro
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    environment:
      - VPN_SERVICE_PROVIDER=surfshark
      - VPN_TYPE=wireguard
      - WIREGUARD_PRIVATE_KEY=**********
      - WIREGUARD_ADDRESSES=10.14.0.2/16
    # System
      - PUID=1030
      - PGID=101
      - TZ=Europe/Lisbon
    # HTTPproxy
      - HTTPPROXY=on
      - HTTPPROXY_LOG=on
    # HTTP proxy listening address
      - HTTPPROXY_LISTENING_ADDRESS=:8888
      - HTTPPROXY_STEALTH=on
    # DNS over TLS
      - DOT=off
      - DOT_IPV6=on
    # Other
      - PUBLICIP_PERIOD=720h

  fr-proxyserver:
    extends:
      file: ./base.yml
      service: vpn   
    environment:
      - SERVER_COUNTRIES=France

fr.yml (only the IP and name differ between the services)
version: "3.9"

services:
  #          
  fr-proxyserver-1:
    extends:
      file: ./../base.yml
      service: fr-proxyserver
    container_name: fr-proxyserver-1  
    networks:
      vpn:
        ipv4_address: 10.0.1.200

mkozikowsk commented 10 months ago

I am having a similar problem https://github.com/qdm12/gluetun/issues/2072

gjrtimmer commented 9 months ago

Same problem, but I have an idea why. Investigating...

gjrtimmer commented 9 months ago

I found the issue. I'm posting my entire line of reasoning, because everybody's network and setup are different, so a bare solution might not be enough.

Here is what you can do and how I solved it.

The log shows that WireGuard keeps trying to connect, but it never succeeds. You never see in your log that it is connected, nor what your new public IP is. You also see timeout errors for the ipinfo lookup.

So, logically, the ipinfo timeout is expected: there is no established VPN connection.

BTW: I have a similar setup; I'm trying to run it as a sidecar on a Nomad cluster. This cluster also runs Consul and Vault. What's important to know about my setup is that I have a custom dnsmasq running on every node, reachable on the node's IP, so that my entire network can resolve *.service.consul services. As a result, my /etc/resolv.conf has nameserver 192.168.0.{NODE_IP} instead of 127.0.0.7.

Given that side note, and given that my logs are identical to the ones in this issue apart from the VPN provider, I suspected either a DNS issue or a firewall issue within the gluetun container that left it unable to resolve the IP of the VPN node.

Steps:

  1. Turn on firewall debugging: FIREWALL_DEBUG=on
  2. Add the following options (reason: this matches your setup, since you turned off DOT):
    • DOT=off
    • DNS_KEEP_NAMESERVER=on
  3. These two options give you a basic DNS setup like any other container on your system, except that gluetun still applies its more advanced internal firewall rules (killswitch, etc.).
  4. With DOT turned off and your original DNS nameserver kept, spin up the container and open a shell in it with docker exec.
  5. Once you are in the shell, it does not matter whether the VPN is connected or not: because you kept your original DNS nameserver, the container should be able to resolve addresses.
  6. Try ping google.com and see whether it works.
  7. In my case it did not work, which meant there was a DNS/firewall problem.
  8. If ping does not return anything for you either, you might have the same issue.
  9. Because your setup is different from mine, I can only suggest what might solve it:
    • Add iptables rules to /iptables/post-rules.txt, depending on what your actual problem is.
    • What fixed it for me was adding FIREWALL_OUTBOUND_SUBNETS=192.168.0.0/24 to trust my entire local network, which solved the DNS and firewall problem. After recreating the container with this option, the VPN connected in about 3 seconds.

Apologies for the lengthy read. Because everybody's network setup can be so different, I wrote all of this to guide readers through the logical steps. Just take it one step at a time and eliminate issues until you find the root cause.

@clpir3s @mkozikowsk Hope this helps

To summarize, it looks like your container has some configuration that causes the firewall to block the outbound connection to the VPN provider's server.

N367B commented 9 months ago

I just fixed it by adding the following environment variable: FIREWALL_OUTBOUND_SUBNETS=0.0.0.0/0. Does this cause any security issues?

mkozikowsk commented 9 months ago

@gjrtimmer, @N367B In my case, it seems that DNS is also not working. Adding the env variable FIREWALL_OUTBOUND_SUBNETS=192.168.0.0/24, or even 0.0.0.0/0, doesn't fix the issue.

In my container, adding this causes another error:

torrent-gluetun-1  | 2024-01-25T13:53:42Z INFO [wireguard] Connecting to 37.19.221.156:51820
torrent-gluetun-1  | 2024-01-25T13:53:42Z INFO [wireguard] Wireguard is up
torrent-gluetun-1  | 2024-01-25T13:53:42Z INFO [healthcheck] healthy!
torrent-gluetun-1  | 2024-01-25T13:53:42Z INFO [dns over tls] downloading DNS over TLS cryptographic files
torrent-gluetun-1  | 2024-01-25T13:53:44Z INFO [dns over tls] downloading hostnames and IP block lists
torrent-gluetun-1  | 2024-01-25T13:53:49Z INFO [dns over tls] init module 0: validator
torrent-gluetun-1  | 2024-01-25T13:53:49Z INFO [dns over tls] init module 1: iterator
torrent-gluetun-1  | 2024-01-25T13:53:49Z INFO [dns over tls] start of service (unbound 1.17.1).
torrent-gluetun-1  | 2024-01-25T13:53:50Z INFO [healthcheck] unhealthy: dialing: dial tcp4: lookup cloudflare.com: i/o timeout
torrent-gluetun-1  | 2024-01-25T13:53:58Z INFO [healthcheck] program has been unhealthy for 6s: restarting VPN (see https://github.com/qdm12/gluetun/wiki/Healthcheck)
torrent-gluetun-1  | 2024-01-25T13:53:58Z ERROR [vpn] cannot get version information: Get "https://api.github.com/repos/qdm12/gluetun/releases": context canceled

clpir3s commented 9 months ago

Thanks guys,

@gjrtimmer, for me it looks like a "temporary" issue (load on the servers), because sometimes it works fine. I have seen this issue happen at times when it is also difficult to connect with the official Surfshark app, on both PC and mobile.

It looks like the provider has trouble assigning a server to connect to. But it's strange, because @mkozikowsk is using another provider.

Anyway, I first tried FIREWALL_OUTBOUND_SUBNETS=192.168.0.0/24 and then FIREWALL_OUTBOUND_SUBNETS=0.0.0.0/0.

With FIREWALL_OUTBOUND_SUBNETS=0.0.0.0/0 it seems more stable, and the container is created and ready faster.

I'll keep checking whether FIREWALL_OUTBOUND_SUBNETS=0.0.0.0/0 solves it. If I see this error again, I will test on mobile at the same time, because right now it connects fast there too.

clpir3s commented 9 months ago

Update,

@N367B I checked, and when I use FIREWALL_OUTBOUND_SUBNETS=0.0.0.0/0 I'm not in the tunnel: I'm seeing my ISP's IP on the other side instead of a VPN IP.

Also, the logs show: 2024-01-25T16:45:44Z INFO [ip getter] Public IP address is ()

N367B commented 9 months ago

It's probably that, but at least it's not blocking traffic. Hoping for a real solution soon.

cahuizar commented 9 months ago

I also started running into this issue with a similar wireguard setup. Everything was working fine for the last few months. Hoping a fix can be made 🙏

clpir3s commented 9 months ago

Hi guys,

I made some changes to my configuration, and it has been stable for more than 1 day.

What I did:

After that, it was giving me this error: [healthcheck] unhealthy: cannot dial: dial tcp4 cloudflare:443: i/o timeout

So I added this config: HEALTH_TARGET_ADDRESS=1.1.1.1:443 (see https://github.com/qdm12/gluetun/discussions/929)
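
The healthcheck errors in the original logs read "lookup cloudflare.com: i/o timeout", meaning the failure happened while resolving the hostname, before any TCP dial. A literal IP target such as 1.1.1.1:443 skips that lookup. The Go sketch below shows the distinction; needsDNS is a hypothetical helper. Note that this only removes the DNS step: if the tunnel itself is down, dialing 1.1.1.1:443 still fails.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
)

// needsDNS reports whether dialing target (host:port) requires a DNS
// lookup first, i.e. whether host is a name rather than a literal IP.
func needsDNS(target string) (bool, error) {
	host, _, err := net.SplitHostPort(target)
	if err != nil {
		return false, err
	}
	_, err = netip.ParseAddr(host)
	return err != nil, nil
}

func main() {
	for _, target := range []string{"cloudflare.com:443", "1.1.1.1:443"} {
		dns, err := needsDNS(target)
		if err != nil {
			fmt.Println(target, "invalid:", err)
			continue
		}
		fmt.Printf("%s needs DNS: %v\n", target, dns)
	}
}
```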

leo15dev commented 9 months ago

try

- DOT=off
- DNS_ADDRESS=8.8.8.8

Sometimes it's a Cloudflare DNS problem, try using a different DNS and you might be able to solve it.

qdm12 commented 9 months ago

Hello everyone... so a lot of funny things going on here...

The main point is in the logs: "Typically i/o timeout errors indicate the Wireguard connection is not working." And then the other log line: "program has been unhealthy for 41s: restarting VPN (see https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md)". I suggest you read https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md

I just fixed it by adding the following environment variable: FIREWALL_OUTBOUND_SUBNETS=0.0.0.0/0. Does this cause any security issues?

Yes. This basically lets your traffic leave outside the VPN entirely, so obviously nothing will fail: if the VPN doesn't work, the traffic simply doesn't go through it. Don't do it. I'll probably codify this.

Sometimes it's a Cloudflare DNS problem, try using a different DNS and you might be able to solve it.

No, it's the VPN server. You can always check from a machine that is not on the VPN that Cloudflare DNS works.

DOT=off

As a reminder, this leaks your DNS traffic to the VPN provider, which I personally would rather not trust.

Hoping a fix can be made / Hoping for a real solution soon.

Read https://github.com/qdm12/gluetun-wiki/blob/main/faq/healthcheck.md If nothing works, change VPN provider 😞

Closing this since this is going in all directions, without useful information (i.e. what provider are you using? What server hostname? etc.)

qdm12 commented 9 months ago

6b9c775055d9a1712f60d1980f1d9c542456bc9b prevents using public subnets in FIREWALL_OUTBOUND_SUBNETS. Better to have a safe, non-working Gluetun than an unsafe, leaking "working" Gluetun.
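
The commit is only described by its message here, so the following is not its code. It is a minimal Go sketch of what such a validation could look like, assuming "public" means "not entirely inside the RFC 1918 IPv4 ranges or the IPv6 unique local range fc00::/7"; the exact set of ranges gluetun accepts may differ. validateOutboundSubnets and isPrivatePrefix are hypothetical names.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// privateRanges lists the RFC 1918 IPv4 ranges and the IPv6 unique
// local range. Illustrative only.
var privateRanges = []netip.Prefix{
	netip.MustParsePrefix("10.0.0.0/8"),
	netip.MustParsePrefix("172.16.0.0/12"),
	netip.MustParsePrefix("192.168.0.0/16"),
	netip.MustParsePrefix("fc00::/7"),
}

// isPrivatePrefix reports whether p lies entirely inside one private
// range: the range must contain p's address and be at least as wide.
func isPrivatePrefix(p netip.Prefix) bool {
	for _, r := range privateRanges {
		if r.Bits() <= p.Bits() && r.Contains(p.Addr()) {
			return true
		}
	}
	return false
}

// validateOutboundSubnets parses a comma-separated subnet list and
// rejects malformed or public entries.
func validateOutboundSubnets(value string) error {
	for _, field := range strings.Split(value, ",") {
		p, err := netip.ParsePrefix(strings.TrimSpace(field))
		if err != nil {
			return fmt.Errorf("parsing subnet %q: %w", field, err)
		}
		if !isPrivatePrefix(p) {
			return fmt.Errorf("subnet %s is not private", p)
		}
	}
	return nil
}

func main() {
	for _, v := range []string{"192.168.0.0/24", "0.0.0.0/0", "192.168.0.0./24"} {
		fmt.Printf("%-16s -> %v\n", v, validateOutboundSubnets(v))
	}
}
```

The width check matters: 10.0.0.0/7 starts at a private address but also covers the public 11.0.0.0/8, so it is rejected. A malformed entry with a stray dot, such as 192.168.0.0./24, fails at parsing.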

mkozikowsk commented 9 months ago

In my case it was not a problem with the gluetun configuration, but the firewall on my router :|