Bug: health check flips between unhealthy and healthy with no logs

giorgiooriani commented 6 months ago

Is this urgent?

No

Host OS

Synology DSM 7.2

CPU arch

x86_64

VPN service provider

Windscribe

What are you using to run the container

docker-compose

What is the version of Gluetun

Running version latest built on 2024-05-04T16:22:29.394Z (commit ef6874f)

What's the problem 🤔

Mentioned this in issue #2154 but thought it would be better to open a new issue.

The container keeps flipping between healthy and unhealthy when I am downloading a torrent at a high bandwidth (80-100MiB/s). The logs don't seem to update at all beyond the initial healthcheck which shows healthy, even in debug mode so I can't figure out why.

It doesn't seem to be an interruption of the VPN since in the background the torrent downloads fine, except for when the VPN is restarted due to the healthcheck. The torrent is interrupted for a few seconds while vpn restarts, and then works perfectly fine, over and over. I don't know why the healthcheck is flagging as unhealthy when it seems to be working as intended. For example if I am seeding, even for hours at lower bandwidths (1-2 MiB/s), it doesn't seem to make the container unhealthy.

My internet connection is fine and I don't have a firewall setup, so I am not sure what is causing this behaviour.

Share your logs (at least 10 lines)

========================================
========================================
=============== gluetun ================
========================================
=========== Made with ❤️ by ============
======= https://github.com/qdm12 =======
========================================
========================================
Running version latest built on 2024-05-04T16:22:29.394Z (commit ef6874f)
🔧 Need help? https://github.com/qdm12/gluetun/discussions/new
🐛 Bug? https://github.com/qdm12/gluetun/issues/new
✨ New feature? https://github.com/qdm12/gluetun/issues/new
☕ Discussion? https://github.com/qdm12/gluetun/discussions/new
💻 Email? quentin.mcgaw@gmail.com
💰 Help me? https://www.paypal.me/qmcgaw https://github.com/sponsors/qdm12
2024-05-07T13:00:09+02:00 INFO [routing] default route found: interface eth0, gateway 10.0.5.1, assigned IP 10.0.5.15 and family v4
2024-05-07T13:00:09+02:00 INFO [routing] local ethernet link found: eth0
2024-05-07T13:00:09+02:00 INFO [routing] local ipnet found: 10.0.5.0/24
2024-05-07T13:00:10+02:00 INFO [firewall] enabling...
2024-05-07T13:00:10+02:00 DEBUG [firewall] iptables-legacy --policy INPUT DROP
2024-05-07T13:00:10+02:00 DEBUG [firewall] iptables-legacy --policy OUTPUT DROP
2024-05-07T13:00:10+02:00 DEBUG [firewall] iptables-legacy --policy FORWARD DROP
2024-05-07T13:00:10+02:00 DEBUG [firewall] ip6tables-legacy --policy INPUT DROP
2024-05-07T13:00:10+02:00 DEBUG [firewall] ip6tables-legacy --policy OUTPUT DROP
2024-05-07T13:00:10+02:00 DEBUG [firewall] ip6tables-legacy --policy FORWARD DROP
2024-05-07T13:00:10+02:00 DEBUG [firewall] iptables-legacy --append INPUT -i lo -j ACCEPT
2024-05-07T13:00:10+02:00 DEBUG [firewall] ip6tables-legacy --append INPUT -i lo -j ACCEPT
2024-05-07T13:00:10+02:00 DEBUG [firewall] iptables-legacy --append OUTPUT -o lo -j ACCEPT
2024-05-07T13:00:10+02:00 DEBUG [firewall] ip6tables-legacy --append OUTPUT -o lo -j ACCEPT
2024-05-07T13:00:10+02:00 DEBUG [firewall] iptables-legacy --append OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
2024-05-07T13:00:10+02:00 DEBUG [firewall] ip6tables-legacy --append OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
2024-05-07T13:00:10+02:00 DEBUG [firewall] iptables-legacy --append INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
2024-05-07T13:00:10+02:00 DEBUG [firewall] ip6tables-legacy --append INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
2024-05-07T13:00:10+02:00 DEBUG [firewall] iptables-legacy --append OUTPUT -o eth0 -s 10.0.5.15 -d 10.0.5.0/24 -j ACCEPT
2024-05-07T13:00:10+02:00 DEBUG [firewall] ip6tables-legacy --append OUTPUT -o eth0 -d ff02::1:ff/104 -j ACCEPT
2024-05-07T13:00:10+02:00 DEBUG [firewall] iptables-legacy --append INPUT -i eth0 -d 10.0.5.0/24 -j ACCEPT
2024-05-07T13:00:10+02:00 INFO [firewall] enabled successfully
2024-05-07T13:00:11+02:00 INFO [storage] merging by most recent 19425 hardcoded servers and 19471 servers read from /gluetun/servers.json
2024-05-07T13:00:11+02:00 INFO [storage] Using nordvpn servers from file which are 44 days more recent
2024-05-07T13:00:11+02:00 INFO [storage] Using windscribe servers from file which are 124 days more recent
2024-05-07T13:00:11+02:00 DEBUG [netlink] IPv6 is not supported after searching 2 routes
2024-05-07T13:00:11+02:00 INFO Alpine version: 3.19.1
2024-05-07T13:00:12+02:00 INFO OpenVPN 2.5 version: 2.5.8
2024-05-07T13:00:12+02:00 INFO OpenVPN 2.6 version: 2.6.8
2024-05-07T13:00:13+02:00 INFO Unbound version: 1.19.3
2024-05-07T13:00:13+02:00 INFO IPtables version: v1.8.10
2024-05-07T13:00:13+02:00 INFO Settings summary:
├── VPN settings:
|   ├── VPN provider settings:
|   |   ├── Name: windscribe
|   |   └── Server selection settings:
|   |       ├── VPN type: wireguard
|   |       ├── Regions: Switzerland
|   |       └── Wireguard selection settings:
|   └── Wireguard settings:
|       ├── Private key: OMJ...H0=
|       ├── Pre-shared key: kVB...u4=
|       ├── Interface addresses:
|       |   └── 100.109.214.246/32
|       ├── Allowed IPs:
|       |   ├── 0.0.0.0/0
|       |   └── ::/0
|       └── Network interface: tun0
|           └── MTU: 1400
├── DNS settings:
|   ├── Keep existing nameserver(s): no
|   ├── DNS server address to use: 127.0.0.1
|   └── DNS over TLS settings:
|       ├── Enabled: yes
|       ├── Update period: every 24h0m0s
|       ├── Unbound settings:
|       |   ├── Authoritative servers:
|       |   |   └── cloudflare
|       |   ├── Caching: yes
|       |   ├── IPv6: no
|       |   ├── Verbosity level: 1
|       |   ├── Verbosity details level: 0
|       |   ├── Validation log level: 0
|       |   ├── System user: root
|       |   └── Allowed networks:
|       |       ├── 0.0.0.0/0
|       |       └── ::/0
|       └── DNS filtering settings:
|           ├── Block malicious: yes
|           ├── Block ads: no
|           ├── Block surveillance: no
|           └── Blocked IP networks:
|               ├── 127.0.0.1/8
|               ├── 10.0.0.0/8
|               ├── 172.16.0.0/12
|               ├── 192.168.0.0/16
|               ├── 169.254.0.0/16
|               ├── ::1/128
|               ├── fc00::/7
|               ├── fe80::/10
|               ├── ::ffff:127.0.0.1/104
|               ├── ::ffff:10.0.0.0/104
|               ├── ::ffff:169.254.0.0/112
|               ├── ::ffff:172.16.0.0/108
|               └── ::ffff:192.168.0.0/112
├── Firewall settings:
|   ├── Enabled: yes
|   └── Outbound subnets:
|       ├── 172.20.0.0/16
|       ├── 10.0.1.0/24
|       └── 10.0.5.0/24
├── Log settings:
|   └── Log level: debug
├── Health settings:
|   ├── Server listening address: 127.0.0.1:9999
|   ├── Target address: cloudflare.com:443
|   ├── Duration to wait after success: 5s
|   ├── Read header timeout: 100ms
|   ├── Read timeout: 500ms
|   └── VPN wait durations:
|       ├── Initial duration: 6s
|       └── Additional duration: 5s
├── Shadowsocks server settings:
|   └── Enabled: no
├── HTTP proxy settings:
|   └── Enabled: no
├── Control server settings:
|   ├── Listening address: :8000
|   └── Logging: yes
├── OS Alpine settings:
|   ├── Process UID: 1032
|   ├── Process GID: 65537
|   └── Timezone: Europe/Rome
├── Public IP settings:
|   ├── Fetching: every 12h0m0s
|   ├── IP file path: /tmp/gluetun/ip
|   └── Public IP data API: ipinfo
└── Version settings:
    └── Enabled: yes
2024-05-07T13:00:13+02:00 INFO [routing] default route found: interface eth0, gateway 10.0.5.1, assigned IP 10.0.5.15 and family v4
2024-05-07T13:00:13+02:00 DEBUG [routing] ip rule add from 10.0.5.15/32 lookup 200 pref 100
2024-05-07T13:00:13+02:00 INFO [routing] adding route for 0.0.0.0/0
2024-05-07T13:00:13+02:00 DEBUG [routing] ip route replace 0.0.0.0/0 via 10.0.5.1 dev eth0 table 200
2024-05-07T13:00:13+02:00 INFO [firewall] setting allowed subnets...
2024-05-07T13:00:13+02:00 DEBUG [firewall] iptables-legacy --append OUTPUT -o eth0 -s 10.0.5.15 -d 172.20.0.0/16 -j ACCEPT
2024-05-07T13:00:13+02:00 DEBUG [firewall] iptables-legacy --append OUTPUT -o eth0 -s 10.0.5.15 -d 10.0.1.0/24 -j ACCEPT
2024-05-07T13:00:13+02:00 DEBUG [firewall] iptables-legacy --append OUTPUT -o eth0 -s 10.0.5.15 -d 10.0.5.0/24 -j ACCEPT
2024-05-07T13:00:13+02:00 INFO [routing] default route found: interface eth0, gateway 10.0.5.1, assigned IP 10.0.5.15 and family v4
2024-05-07T13:00:13+02:00 INFO [routing] adding route for 172.20.0.0/16
2024-05-07T13:00:13+02:00 DEBUG [routing] ip route replace 172.20.0.0/16 via 10.0.5.1 dev eth0 table 199
2024-05-07T13:00:13+02:00 DEBUG [routing] ip rule add to 172.20.0.0/16 lookup 199 pref 99
2024-05-07T13:00:13+02:00 INFO [routing] adding route for 10.0.1.0/24
2024-05-07T13:00:13+02:00 DEBUG [routing] ip route replace 10.0.1.0/24 via 10.0.5.1 dev eth0 table 199
2024-05-07T13:00:13+02:00 DEBUG [routing] ip rule add to 10.0.1.0/24 lookup 199 pref 99
2024-05-07T13:00:13+02:00 INFO [routing] adding route for 10.0.5.0/24
2024-05-07T13:00:13+02:00 DEBUG [routing] ip route replace 10.0.5.0/24 via 10.0.5.1 dev eth0 table 199
2024-05-07T13:00:13+02:00 DEBUG [routing] ip rule add to 10.0.5.0/24 lookup 199 pref 99
2024-05-07T13:00:13+02:00 DEBUG [routing] ip rule add to 10.0.5.0/24 lookup 254 pref 98
2024-05-07T13:00:13+02:00 INFO [dns] using plaintext DNS at address 1.1.1.1
2024-05-07T13:00:13+02:00 INFO [http server] http server listening on [::]:8000
2024-05-07T13:00:13+02:00 INFO [healthcheck] listening on 127.0.0.1:9999
2024-05-07T13:00:13+02:00 DEBUG [wireguard] Wireguard server public key: G7LkwWk08Ase/Wi9mnOW77brNBC0vTCemvy1IW1nlV4=
2024-05-07T13:00:13+02:00 DEBUG [wireguard] Wireguard client private key: OMJ...H0=
2024-05-07T13:00:13+02:00 DEBUG [wireguard] Wireguard pre-shared key: kVB...u4=
2024-05-07T13:00:13+02:00 INFO [firewall] allowing VPN connection...
2024-05-07T13:00:13+02:00 DEBUG [firewall] iptables-legacy --append OUTPUT -d 169.150.197.163 -o eth0 -p udp -m udp --dport 1194 -j ACCEPT
2024-05-07T13:00:13+02:00 DEBUG [firewall] iptables-legacy --append OUTPUT -o tun0 -j ACCEPT
2024-05-07T13:00:13+02:00 DEBUG [firewall] ip6tables-legacy --append OUTPUT -o tun0 -j ACCEPT
2024-05-07T13:00:13+02:00 INFO [wireguard] Using available kernelspace implementation
2024-05-07T13:00:13+02:00 INFO [wireguard] Connecting to 169.150.197.163:1194
2024-05-07T13:00:13+02:00 INFO [wireguard] Wireguard setup is complete. Note Wireguard is a silent protocol and it may or may not work, without giving any error message. Typically i/o timeout errors indicate the Wireguard connection is not working.
2024-05-07T13:00:13+02:00 INFO [dns] downloading DNS over TLS cryptographic files
2024-05-07T13:00:14+02:00 INFO [healthcheck] healthy!
2024-05-07T13:00:15+02:00 INFO [dns] downloading hostnames and IP block lists
2024-05-07T13:00:22+02:00 DEBUG [healthcheck] unhealthy: dialing: dial tcp4: lookup cloudflare.com: i/o timeout
2024-05-07T13:00:22+02:00 INFO [dns] init module 0: validator
2024-05-07T13:00:22+02:00 INFO [dns] init module 1: iterator
2024-05-07T13:00:22+02:00 INFO [dns] start of service (unbound 1.19.3).
2024-05-07T13:00:22+02:00 INFO [dns] generate keytag query _ta-4a5c-4f66. NULL IN
2024-05-07T13:00:22+02:00 INFO [dns] ready
2024-05-07T13:00:22+02:00 INFO [vpn] You are running on the bleeding edge of latest!
2024-05-07T13:00:23+02:00 INFO [ip getter] Public IP address is 169.150.197.169 (Switzerland, Zurich, Zürich)
2024-05-07T13:00:23+02:00 INFO [healthcheck] healthy!

Share your configuration

gluetun:
    image: qmcgaw/gluetun:latest
    container_name: gluetun
    cap_add:
      - NET_ADMIN
    networks:  
      - media_bridge
    ports:
      - 8888:8888/tcp # HTTP proxy
      - 8388:8388/tcp # Shadowsocks
      - 8388:8388/udp # Shadowsocks
      - 8080:8080 # port for qbittorrent
    devices:
      - /dev/net/tun:/dev/net/tun
    volumes:
      - /volume1/configs/gluetun:/gluetun
    environment:
      - VPN_SERVICE_PROVIDER=windscribe
      - VPN_TYPE=wireguard
      - WIREGUARD_PRIVATE_KEY=[KEY]
      - WIREGUARD_ADDRESSES=[ADDRESS]
      - WIREGUARD_PRESHARED_KEY=[KEY]
      - SERVER_REGIONS=Switzerland
      - TZ=Europe/Rome
      - PUID=1032
      - PGID=65537
      - FIREWALL_OUTBOUND_SUBNETS=172.20.0.0/16,10.0.1.0/24,10.0.5.0/24
      - LOG_LEVEL=debug
    sysctls:
      - net.ipv6.conf.all.disable_ipv6=1
    restart: unless-stopped

github-actions[bot] commented 6 months ago

@qdm12 is more or less the only maintainer of this project and works on it in his free time. Please:

do not ask for updates, be patient
:+1: the issue to show your support instead of commenting @qdm12 usually checks issues at least once a week, if this is a new urgent bug, revert to an older tagged container image

qdm12 commented 6 months ago

TLDR: pull the latest image and check if it reports as unhealthy still?

The Gluetun Docker healthcheck queries the long running Gluetun instance for its health status every 5 seconds. The dial timeout was previously fixed at 3 seconds, so during high bandwidth usage, it's possible the dialing fails within 3 seconds, and so the health is reported as unhealthy, see debug log line:

DEBUG [healthcheck] unhealthy: dialing: dial tcp4: lookup cloudflare.com: i/o timeout

This is logged at the debug level, since it's not really a big deal and can happen from time to time. It doesn't affect the VPN internal healing either.

Now since 6042a9e3c24d21ef900d15a880c382884f342613, the tcp dialing timeouts change, increasing with every failure, from 2 seconds, to 4, 6, 8, and 10 seconds, instead of 3s always, which should allow it to work out during bandwidth stress for example with a 10 seconds timeout. I pushed another commit 1e25372189c59feea6448681bd17254e8029fc78 to increase the retries from 1 to 3. That means it would only report the container as unhealthy after it failed 4 times (4 x 5 seconds = 20 seconds), which means the internal healthcheck can try with the timeouts of 2s, 4s, 6s and 8s (20s total), which should fix this issue I hope. If not let me know, and I can change the internal timeouts from 2/4/6/8/10 to 2/4/10 for example, but I don't think it's necessary 🤔

giorgiooriani commented 6 months ago

Tested, container still flags as unhealthy even though the torrent downloaded fine in the background.

doodlebro commented 6 months ago

Anyone have a more reliable alternative to gluetun?

aidan-gibson commented 3 months ago

Same issue

qdm12 / gluetun