Closed: frepke closed this issue 6 months ago
This can happen occasionally, see https://github.com/qdm12/gluetun/wiki/Healthcheck#internal-healthcheck
We TCP dial cloudflare.com:443 and sometimes this can fail, which is fine.
Does this happen every time, or is it a one-off issue?
It has happened every time for the last 10 times I've checked; I wouldn't create an issue for a one-off problem :)
Does it happen consistently on latest but not v3.29.0?
I checked it a few times on v3.29.0; the error exists there as well.
For now, though, the logs on latest look a bit different: healthy and unhealthy messages still appear, but there's no VPN stopping/starting anymore.
It's most likely because the nameserver is changed to 127.0.0.1 to use Unbound, but Unbound hasn't finished setting up yet (hence the connection refused on port 53).
I'm working on #137 now, let's see if it indirectly solves it. I'll message here once it's done.
Seems I have the same issue here.
2022-06-19T20:26:36Z INFO [healthcheck] unhealthy: cannot dial: dial tcp4: i/o timeout
2022-06-19T20:26:37Z INFO [healthcheck] healthy!
2022-06-19T20:26:50Z INFO [healthcheck] unhealthy: cannot dial: dial tcp4: i/o timeout
2022-06-19T20:26:55Z INFO [healthcheck] healthy!
2022-06-19T20:27:03Z INFO [healthcheck] unhealthy: cannot dial: dial tcp4: i/o timeout
2022-06-19T20:27:08Z INFO [healthcheck] healthy!
2022-06-19T20:27:32Z INFO [healthcheck] unhealthy: cannot dial: dial tcp4: i/o timeout
2022-06-19T20:27:33Z INFO [healthcheck] healthy!
2022-06-19T20:27:46Z INFO [healthcheck] unhealthy: cannot dial: dial tcp4: i/o timeout
2022-06-19T20:27:47Z INFO [healthcheck] healthy!
2022-06-19T20:27:55Z INFO [healthcheck] unhealthy: cannot dial: dial tcp4: i/o timeout
2022-06-19T20:27:56Z INFO [healthcheck] healthy!
2022-06-19T20:28:09Z INFO [healthcheck] unhealthy: cannot dial: dial tcp4: i/o timeout
2022-06-19T20:28:10Z INFO [healthcheck] healthy!
2022-06-19T20:28:48Z INFO [healthcheck] unhealthy: cannot dial: dial tcp4: i/o timeout
2022-06-19T20:28:49Z INFO [healthcheck] healthy!
2022-06-19T20:29:07Z INFO [healthcheck] unhealthy: cannot dial: dial tcp4: i/o timeout
2022-06-19T20:29:15Z INFO [healthcheck] program has been unhealthy for 6s: restarting VPN
2022-06-19T20:29:15Z INFO [vpn] stopping
2022-06-19T20:29:15Z INFO [vpn] starting
2022-06-19T20:29:15Z INFO [firewall] allowing VPN connection...
2022-06-19T20:29:15Z INFO [wireguard] Using available kernelspace implementation
2022-06-19T20:29:15Z INFO [wireguard] Connecting to 62.210.204.161:51820
2022-06-19T20:29:15Z INFO [wireguard] Wireguard is up
2022-06-19T20:29:15Z INFO [healthcheck] healthy!
2022-06-19T20:29:15Z INFO [ip getter] Public IP address is 62.210.204.161 (France, Île-de-France, Paris)
2022-06-19T20:29:49Z INFO [healthcheck] unhealthy: cannot dial: dial tcp4: i/o timeout
2022-06-19T20:29:54Z INFO [healthcheck] healthy!
Hi all, for me, changing both HEALTH_TARGET_ADDRESS and DNS_ADDRESS to 1.1.1.1 solved the issue.
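For anyone wanting to try the same workaround, it would look roughly like this in a compose file. This is only a sketch of the two environment variables mentioned above; the rest of the gluetun service (image, cap_add, provider settings) is omitted, and the :443 port on the health target is my assumption, since the default target is cloudflare.com:443:

services:
  gluetun:
    environment:
      # dial a plain IP instead of cloudflare.com for the internal healthcheck
      - HEALTH_TARGET_ADDRESS=1.1.1.1:443
      # point resolv.conf at Cloudflare directly instead of the internal DNS on 127.0.0.1
      - DNS_ADDRESS=1.1.1.1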
@antro31 that's just a workaround, and it means you don't test if the DNS server is working or not.
Can one of you try using BLOCK_MALICIOUS=off, does it give the same consistent unhealthy behavior? For my part, running Mullvad with OpenVPN or Wireguard works fine and it's not unhealthy at start:
2022-06-27T21:51:48Z INFO [openvpn] Initialization Sequence Completed
2022-06-27T21:51:48Z INFO [dns over tls] downloading DNS over TLS cryptographic files
2022-06-27T21:51:48Z INFO [healthcheck] healthy!
2022-06-27T21:51:49Z INFO [dns over tls] downloading hostnames and IP block lists
2022-06-27T21:51:51Z INFO [dns over tls] init module 0: validator
2022-06-27T21:51:51Z INFO [dns over tls] init module 1: iterator
2022-06-27T21:51:51Z INFO [dns over tls] start of service (unbound 1.15.0).
2022-06-27T21:51:51Z INFO [dns over tls] generate keytag query _ta-4a5c-4f66. NULL IN
2022-06-27T21:51:51Z INFO [dns over tls] generate keytag query _ta-4a5c-4f66. NULL IN
2022-06-27T21:51:51Z INFO [dns over tls] ready
2022-06-27T21:51:51Z INFO [ip getter] Public IP address is 198.54.132.55 (United States, Illinois, Chicago)
2022-06-27T21:51:52Z INFO [vpn] There is a new release v3.29.0 (v3.29.0) created 46 days ago
I checked it a few times with BLOCK_MALICIOUS off; the VPN restarts are now gone.
@qdm12 I've been having similar issues recently: Gluetun starting/stopping the VPN due to an unhealthy ping. The issue is that the dependent containers seem to lose all connectivity until they are themselves restarted. That wasn't an issue before the healthcheck mechanism was introduced.
Maybe the VPN could be restarted only if multiple checks fail over the course of a minute or so? I don't really know if gluetun could signal other containers to automatically restart if it has to kick and restart the VPN.
For now I have turned off BLOCK_MALICIOUS, as well as SURVEILLANCE and ADS which I had turned on; let's see if that addresses the issue at least temporarily.
Thanks for the tool though, it's great and very useful!
@romainguinot you can make the durations larger: https://github.com/qdm12/gluetun/wiki/Health-options
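If it helps, a minimal compose fragment for that might look like the following; the variable names and defaults below are my reading of the linked Health options wiki page, so double-check them there before relying on this:

services:
  gluetun:
    environment:
      # wait longer before the first internal VPN restart (default 6s per the wiki)
      - HEALTH_VPN_DURATION_INITIAL=30s
      # add more time between successive restarts (default 5s per the wiki)
      - HEALTH_VPN_DURATION_ADDITION=10s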
The issue is that the dependent containers seem to lose all connectivity until they are themselves restarted.
Actually the point of the 'inner vpn restart' is so connected containers don't disconnect. Are you sure there isn't something restarting gluetun externally (as in, a container restart)? That would cause connected containers to disconnect.
I don't really know if gluetun could signal other containers to automatically restart if it has to kick and restart the VPN.
Subscribe to #641, it's still a work in progress (through another container, qmcgaw/deunhealth) and I'm lacking time, but I'm doing my best to finish this soon.
Actually the point of the 'inner vpn restart' is so connected containers don't disconnect. Are you sure there isn't something restarting gluetun externally (as in, a container restart)? That would cause connected containers to disconnect.
As far as I can tell, no, gluetun does not restart. But if there is an inner VPN restart, some containers are fine with it and some are not. I suspect that those with long-running connections may get "confused" by the VPN restart and lose connectivity, while those that only need periodic web access in short bursts aren't affected.
For now I have turned off BLOCK_MALICIOUS, as well as SURVEILLANCE and ADS, and will see how it goes. To mitigate this a bit, I have also scheduled a daily restart of the affected container that gets stuck when the inner VPN is restarted.
I don't really know if gluetun could signal other containers to automatically restart if it has to kick and restart the VPN.
Subscribe to #641, it's still a work in progress (through another container, qmcgaw/deunhealth) and I'm lacking time, but I'm doing my best to finish this soon.
I will subscribe. Take your time though, it's not a huge deal. Gluetun is really great, and it's really appreciated how quick and detailed your responses are.
Seems I have the same issue here. I have turned off BLOCK_MALICIOUS for now as you suggested and will see how it goes.
Good job with your gluetun project!
+1 from another user experiencing this issue.
Mine wasn't as repetitive as the examples above, but it would happen more often than not when setting the container up. It seemed to be more stable with some Surfshark endpoints than others, e.g. it hardly ever occurred when connecting to hostname sg-hk.prod.surfshark.com, but often for nl-sg.prod.surfshark.com. When it did happen, I would also never see the [ip getter] line with the public IP in the logs. That made me nervous, so I kept restarting until it was present.
I set BLOCK_MALICIOUS=no and this error still occurs, but significantly less frequently.
I'm also having frequent healthcheck failures and gluetun disconnections, which have been breaking the container behind it for a few days now. What does the block malicious option do, please? Thank you!
<removed by qdm12>
@romainguinot You are correct, long-running connections might fail. I had this case within Gluetun itself, with the HTTP client communicating with the Private Internet Access API. The solution for me was to close the idle connections of my HTTP client, but that's really a programming detail and not always possible for other containers. Once #641 is done, this should fix that problem though (restart all connected containers).
For other people complaining about frequent internal VPN restarts: BLOCK_MALICIOUS has zero effect on the healthcheck, unless you have a DNS error on port 53, which isn't the case here.
Forgot to reply @qdm12, sorry. For now, the scheduled daily restarts of the affected container seem to mitigate the issue. One day, if dependent containers could be restarted automatically, that would be great, but no rush.
I wish that on the Synology NAS or in Portainer you could easily mark containers as dependent on gluetun, so that they also wait for a healthy gluetun before starting up, but that's a minor inconvenience as this is only an issue when the whole NAS is restarted, which is clearly not very frequent.
@romainguinot
I use:
depends_on:
- gluetun
after all of the dependent containers in my Portainer stack, and it seems to do the trick. The only time I need to stop and restart the entire stack is when I do an on-demand update of all running containers using Watchtower.
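If your Compose version supports the long depends_on syntax, you can also wait for gluetun to be healthy rather than merely started, which is closer to what was wished for above. A sketch, assuming the gluetun image ships a Docker healthcheck, and with myapp as a placeholder for your dependent service:

services:
  myapp:
    # route this container's traffic through gluetun
    network_mode: "service:gluetun"
    depends_on:
      gluetun:
        # wait for gluetun's container healthcheck to pass, not just for it to start
        condition: service_healthy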
See #2154, there is some interesting information there, especially:
tldr: For me, UDP-based VPNs (both Wireguard and OpenVPN) experience this issue, but TCP-based OpenVPN works without connection restarts.
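For anyone wanting to test that tl;dr, switching gluetun's OpenVPN connection to TCP should just be an environment variable change. A sketch, assuming the OPENVPN_PROTOCOL variable as described in the gluetun wiki; not all providers or servers accept TCP, so check your provider's page first:

services:
  gluetun:
    environment:
      - VPN_TYPE=openvpn
      # assumed variable name per the gluetun wiki; the default protocol is udp
      - OPENVPN_PROTOCOL=tcp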
Closing this due to inactivity.
Closed issues are NOT monitored, so commenting here is likely not to be seen. If you think this is still unresolved and have more information to bring, please create another issue.
This is an automated comment set up because @qdm12 is the sole maintainer of this project, which became too popular to monitor closed issues.
Is this urgent?
No
Host OS
Debian Bullseye
CPU arch
x86_64
VPN service provider
Surfshark
What are you using to run the container
docker-compose
What is the version of Gluetun
Running version latest built on 2022-06-06T18:13:11.996Z (commit 5359257)
What's the problem?
Sequence complete, Healthy, then Unhealthy, Restarting VPN, Sequence complete, and afterwards Healthy again
Share your logs
Share your configuration