haugene / docker-transmission-openvpn

Docker container running Transmission torrent client with WebUI over an OpenVPN tunnel
GNU General Public License v3.0

Container Unhealthy, Loses Connection Frequently #1435

Closed dannyvfilms closed 3 years ago

dannyvfilms commented 3 years ago

I've had this for a while and I can't seem to get this container to reboot after it loses its connection. Finally caught it happening and captured a log. Notably, I do have the environment variable I've seen mentioned before enabled in Portainer, along with Restart Always. Unsure what else to do from here.

Log: https://pastebin.com/wspvcnsr

Environment: OPENVPN_OPTS = --inactive 3600 --ping 10 --ping-exit 60

DonkeyHotNew commented 3 years ago

Have you looked at my thread #1389? After I set DROP_DEFAULT_ROUTE = false, I haven't seen it go unhealthy or had any other issues.

haugene commented 3 years ago

Or check this comment: https://github.com/haugene/docker-transmission-openvpn/issues/684#issuecomment-454702880 I'd be interested to see the startup logs, to check whether Windscribe sends you options that override your ping-exit.

We're rebuilding the whole parsing and fetching of configs. Will try to add a script that sets some of these filters and other best practices by default so we don't have to do it per provider.

YujiShen commented 3 years ago

I'm also having this issue, and DROP_DEFAULT_ROUTE = false does not work for me. It has become more and more frequent since I upgraded to 3.0; before that I had no such issue for a long time. It went from once a day to multiple times a day, and now it happens multiple times an hour... Not sure whether it is a problem with Docker 2.5.0. I reverted to 2.1.4, and Ubuntu also gets unhealthy very quickly...

Here is the log after it became unhealthy:

Tue Nov  3 06:36:13 2020 SIGUSR1[hard,] received, process restarting
Tue Nov  3 06:36:18 2020 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Tue Nov  3 06:36:23 2020 RESOLVE: Cannot resolve host address: ca-montreal.privacy.network:1198 (Try again)
Tue Nov  3 06:36:28 2020 RESOLVE: Cannot resolve host address: ca-montreal.privacy.network:1198 (Try again)
Tue Nov  3 06:36:28 2020 Could not determine IPv4/IPv6 protocol
Tue Nov  3 06:36:28 2020 SIGUSR1[soft,init_instance] received, process restarting
Tue Nov  3 06:36:33 2020 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Tue Nov  3 06:36:38 2020 RESOLVE: Cannot resolve host address: ca-montreal.privacy.network:1198 (Try again)
Tue Nov  3 06:36:43 2020 RESOLVE: Cannot resolve host address: ca-montreal.privacy.network:1198 (Try again)
Tue Nov  3 06:36:43 2020 Could not determine IPv4/IPv6 protocol
Tue Nov  3 06:36:43 2020 SIGUSR1[soft,init_instance] received, process restarting
Tue Nov  3 06:36:48 2020 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Tue Nov  3 06:36:53 2020 RESOLVE: Cannot resolve host address: ca-montreal.privacy.network:1198 (Try again)
Tue Nov  3 06:36:58 2020 RESOLVE: Cannot resolve host address: ca-montreal.privacy.network:1198 (Try again)
Tue Nov  3 06:36:58 2020 Could not determine IPv4/IPv6 protocol
Tue Nov  3 06:36:58 2020 SIGUSR1[soft,init_instance] received, process restarting
Tue Nov  3 06:37:03 2020 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Tue Nov  3 06:37:08 2020 RESOLVE: Cannot resolve host address: ca-montreal.privacy.network:1198 (Try again)
Tue Nov  3 06:37:13 2020 RESOLVE: Cannot resolve host address: ca-montreal.privacy.network:1198 (Try again)
...

My config:

version: "3.3"
services:
  transmission-openvpn:
    container_name: transmission-openvpn
    image: haugene/transmission-openvpn:3.0.1
    cap_add:
      - NET_ADMIN
    restart: always
    dns:
      - 8.8.8.8
      - 8.8.4.4
    environment:
      - CREATE_TUN_DEVICE=true
      - OPENVPN_PROVIDER=PIA
      - OPENVPN_CONFIG=CA Montreal
      - OPENVPN_OPTS=--inactive 3600 --ping 10 --ping-exit 60
      - TRANSMISSION_UMASK=2
      - TRANSMISSION_CACHE_SIZE_MB=256
      - TRANSMISSION_RENAME_PARTIAL_FILES=true
      - TRANSMISSION_DOWNLOAD_QUEUE_ENABLED=false
      - TRANSMISSION_DOWNLOAD_QUEUE_SIZE=10
      - TRANSMISSION_RATIO_LIMIT_ENABLED=false
      - TRANSMISSION_UTP_ENABLED=true
      - TRANSMISSION_LPD_ENABLED=true
      - TRANSMISSION_INCOMPLETE_DIR_ENABLED=false
      - TRANSMISSION_START_ADDED_TORRENTS=false
      - TRANSMISSION_PREALLOCATION=1
      - TRANSMISSION_MAX_PEERS_GLOBAL=200000
      - TRANSMISSION_PEER_LIMIT_GLOBAL=200000
      - TRANSMISSION_PEER_LIMIT_PER_TORRENT=100
    logging:
      driver: json-file
      options:
        max-size: 10m

Here is the ip route:

0.0.0.0/1 via 10.57.112.1 dev tun0
default via 172.28.0.1 dev eth0
10.57.112.0/24 dev tun0 proto kernel scope link src 10.57.112.36
128.0.0.0/1 via 10.57.112.1 dev tun0
172.28.0.0/16 dev eth0 proto kernel scope link src 172.28.0.2
199.36.223.251 via 172.28.0.1 dev eth0

YujiShen commented 3 years ago

After 3.0.1 became unhealthy only about 10 minutes after a restart, I reverted to 2.13 and the container looks stable now. I remember using 2.13 for at least the two months before that, and it was pretty stable. Not sure whether a change in 2.14 caused the problem.

This user also used 2.13 as a workaround: https://github.com/haugene/docker-transmission-openvpn/issues/1450#issuecomment-721052382

haugene commented 3 years ago

With all those errors on

RESOLVE: Cannot resolve host address: ca-montreal.privacy.network:1198 (Try again)

I would try to add these:

OVERRIDE_DNS_1=8.8.8.8
OVERRIDE_DNS_2=8.8.4.4
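
In a compose file like the one posted earlier in this thread, these would go under environment. A minimal sketch, reusing the service name, image tag and provider settings from that config and omitting everything else; only the two OVERRIDE_DNS lines are new:

```yaml
services:
  transmission-openvpn:
    image: haugene/transmission-openvpn:3.0.1
    environment:
      - OPENVPN_PROVIDER=PIA
      - OPENVPN_CONFIG=CA Montreal
      # Additions suggested in this comment
      - OVERRIDE_DNS_1=8.8.8.8
      - OVERRIDE_DNS_2=8.8.4.4
```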

YujiShen commented 3 years ago

@haugene Thanks! Will try it. But isn't this the same as the dns part in my config?

haugene commented 3 years ago

I'm doing a bit of parallel posting here :sweat_smile: Read my comment here https://github.com/haugene/docker-transmission-openvpn/issues/1322#issuecomment-721415228 Long story short: I hope so. It is definitely a more "brute force" approach.

YujiShen commented 3 years ago

@haugene Thanks for the context! But the container still becomes unhealthy after several hours. The logs are below; the event_wait line is new this time:

the port has been bound to 46433  Tue Nov  3 23:54:44 UTC 2020
the port has been bound to 46433  Wed Nov  4 00:24:43 UTC 2020
the port has been bound to 46433  Wed Nov  4 00:54:42 UTC 2020
the port has been bound to 46433  Wed Nov  4 01:24:41 UTC 2020
the port has been bound to 46433  Wed Nov  4 01:54:40 UTC 2020
the port has been bound to 46433  Wed Nov  4 02:24:39 UTC 2020
the port has been bound to 46433  Wed Nov  4 02:54:38 UTC 2020
the port has been bound to 46433  Wed Nov  4 03:24:38 UTC 2020
the port has been bound to 46433  Wed Nov  4 03:54:37 UTC 2020
Wed Nov  4 03:59:25 2020 event_wait : Interrupted system call (code=4)
Wed Nov  4 03:59:25 2020 SIGUSR1[hard,] received, process restarting
Wed Nov  4 03:59:30 2020 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Wed Nov  4 03:59:35 2020 RESOLVE: Cannot resolve host address: ca-montreal.privacy.network:1198 (Try again)
Wed Nov  4 03:59:40 2020 RESOLVE: Cannot resolve host address: ca-montreal.privacy.network:1198 (Try again)
Wed Nov  4 03:59:40 2020 Could not determine IPv4/IPv6 protocol
Wed Nov  4 03:59:40 2020 SIGUSR1[soft,init_instance] received, process restarting
Wed Nov  4 03:59:45 2020 NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
Wed Nov  4 03:59:50 2020 RESOLVE: Cannot resolve host address: ca-montreal.privacy.network:1198 (Try again)
Wed Nov  4 03:59:55 2020 RESOLVE: Cannot resolve host address: ca-montreal.privacy.network:1198 (Try again)
Wed Nov  4 03:59:55 2020 Could not determine IPv4/IPv6 protocol
Wed Nov  4 03:59:55 2020 SIGUSR1[soft,init_instance] received, process restarting
...

2.13 seems to be the most stable version for me. I tried 2.14 a second time, but it also became unhealthy later. Digging a little deeper, I found that the only relevant change in https://github.com/haugene/docker-transmission-openvpn/compare/2.13...2.14 is scripts/healthcheck.sh. It sends SIGUSR1 when the network is down, the same SIGUSR1 seen in the log above. So I am not sure whether this script caused the problem.
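
One way to test that hypothesis might be to switch off the container healthcheck in compose and watch whether the SIGUSR1 restarts stop. A sketch only, assuming the image runs scripts/healthcheck.sh through Docker's HEALTHCHECK mechanism, which is not confirmed in this thread. The healthcheck/disable key is standard Compose syntax; the service name and image tag are taken from the config above:

```yaml
services:
  transmission-openvpn:
    image: haugene/transmission-openvpn:3.0.1
    # Assumption: healthcheck.sh is registered as the image's HEALTHCHECK.
    # Disabling it is for diagnosis only; the container will no longer
    # report healthy/unhealthy while this is set.
    healthcheck:
      disable: true
```

If the restarts continue with the healthcheck disabled, the script is probably not the trigger and the DNS failures in the log would be the next thing to look at.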

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] commented 3 years ago

Feel free to re-open this issue if you think it deserves another look.