Fatal error in down script: cp: can't create '/etc/resolv.conf': File exists

Lyncredible commented 3 years ago

Thank you for making this. It rocks!

My proxy instance went down yesterday and stayed unhealthy. The full log is at the bottom; the direct culprit appears to be:

cp -f /etc/resolv.conf{.backup,}
cp: can't create '/etc/resolv.conf': File exists

However, the main question here is: would the OpenVPN client restart if the down script had executed successfully? I would expect the system to heal itself after interruptions like this.

2021-08-30 14:23:15 VERIFY OK: depth=2, C=VG, O=Surfshark, CN=Surfshark Root CA
2021-08-30 14:23:15 VERIFY OK: depth=1, C=VG, O=Surfshark, CN=Surfshark Intermediate CA
2021-08-30 14:23:15 VERIFY KU OK
2021-08-30 14:23:15 Validating certificate extended key usage
2021-08-30 14:23:15 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
2021-08-30 14:23:15 VERIFY EKU OK
2021-08-30 14:23:15 VERIFY OK: depth=0, CN=us-sea-v020.prod.surfshark.com
2021-08-30 14:23:15 WARNING: 'link-mtu' is used inconsistently, local='link-mtu 1635', remote='link-mtu 1583'
2021-08-30 14:23:15 WARNING: 'auth' is used inconsistently, local='auth SHA512', remote='auth [null-digest]'
2021-08-30 14:23:15 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
2021-08-30 14:23:15 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
2021-08-30 14:23:15 Control Channel: TLSv1.2, cipher TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384, 2048 bit RSA
2021-08-30 15:23:09 VERIFY OK: depth=2, C=VG, O=Surfshark, CN=Surfshark Root CA
2021-08-30 15:23:09 VERIFY OK: depth=1, C=VG, O=Surfshark, CN=Surfshark Intermediate CA
2021-08-30 15:23:09 VERIFY KU OK
2021-08-30 15:23:09 Validating certificate extended key usage
2021-08-30 15:23:09 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
2021-08-30 15:23:09 VERIFY EKU OK
2021-08-30 15:23:09 VERIFY OK: depth=0, CN=us-sea-v020.prod.surfshark.com
2021-08-30 15:23:09 WARNING: 'link-mtu' is used inconsistently, local='link-mtu 1635', remote='link-mtu 1583'
2021-08-30 15:23:09 WARNING: 'auth' is used inconsistently, local='auth SHA512', remote='auth [null-digest]'
2021-08-30 15:23:09 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
2021-08-30 15:23:09 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
2021-08-30 15:23:09 Control Channel: TLSv1.2, cipher TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384, 2048 bit RSA
2021-08-30 15:25:23 event_wait : Interrupted system call (code=4)
2021-08-30 15:25:23 /scripts/down tun0 1500 1586 10.7.7.2 255.255.255.0 init
echo "down"
/usr/bin/killall sockd
down
killall: can't kill pid 14513: No such process
killall: can't kill pid 14514: No such process
killall: can't kill pid 15616: No such process
killall: can't kill pid 35714: No such process
rm /up
cp -f /etc/resolv.conf{.backup,}
cp: can't create '/etc/resolv.conf': File exists
2021-08-30 15:25:23 WARNING: Failed running command (--up/--down): external program exited with error status: 1
2021-08-30 15:25:23 Exiting due to fatal error

# Wait until VPN says we're up
while [ ! -f /up ]; do
  sleep 1
done

wolph commented 3 years ago

That's really strange... it's using cp -f, which means a forced copy, so it should overwrite the existing file.

I should note that I switched to GitHub hosting due to the recent Docker Hub changes, so the new URL should be: ghcr.io/wolph/wollen-socks:master (the :master is currently needed)

Can you give that a try instead?

Lyncredible commented 3 years ago

Thanks. I tried the new URL and it went down again after 24 hours:

2021-09-02 07:15:19 VERIFY OK: depth=2, C=VG, O=Surfshark, CN=Surfshark Root CA
2021-09-02 07:15:19 VERIFY OK: depth=1, C=VG, O=Surfshark, CN=Surfshark Intermediate CA
2021-09-02 07:15:19 VERIFY KU OK
2021-09-02 07:15:19 Validating certificate extended key usage
2021-09-02 07:15:19 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
2021-09-02 07:15:19 VERIFY EKU OK
2021-09-02 07:15:19 VERIFY OK: depth=0, CN=us-sea-v031.prod.surfshark.com
2021-09-02 07:15:19 WARNING: 'link-mtu' is used inconsistently, local='link-mtu 1635', remote='link-mtu 1583'
2021-09-02 07:15:19 WARNING: 'auth' is used inconsistently, local='auth SHA512', remote='auth [null-digest]'
2021-09-02 07:15:19 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
2021-09-02 07:15:19 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
2021-09-02 07:15:19 Control Channel: TLSv1.2, cipher TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384, peer certificate: 2048 bit RSA, signature: RSA-SHA256
2021-09-02 08:10:36 VERIFY OK: depth=2, C=VG, O=Surfshark, CN=Surfshark Root CA
2021-09-02 08:10:36 VERIFY OK: depth=1, C=VG, O=Surfshark, CN=Surfshark Intermediate CA
2021-09-02 08:10:36 VERIFY KU OK
2021-09-02 08:10:36 Validating certificate extended key usage
2021-09-02 08:10:36 ++ Certificate has EKU (str) TLS Web Server Authentication, expects TLS Web Server Authentication
2021-09-02 08:10:36 VERIFY EKU OK
2021-09-02 08:10:36 VERIFY OK: depth=0, CN=us-sea-v031.prod.surfshark.com
2021-09-02 08:10:36 WARNING: 'link-mtu' is used inconsistently, local='link-mtu 1635', remote='link-mtu 1583'
2021-09-02 08:10:36 WARNING: 'auth' is used inconsistently, local='auth SHA512', remote='auth [null-digest]'
2021-09-02 08:10:36 Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
2021-09-02 08:10:36 Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
2021-09-02 08:10:36 Control Channel: TLSv1.2, cipher TLSv1.2 ECDHE-RSA-AES256-GCM-SHA384, peer certificate: 2048 bit RSA, signature: RSA-SHA256
2021-09-02 08:40:17 event_wait : Interrupted system call (code=4)
2021-09-02 08:40:17 /scripts/down tun0 1500 1586 10.7.7.4 255.255.255.0 init
echo "down"
/usr/bin/killall sockd
down
killall: can't kill pid 13056: No such process
killall: can't kill pid 13288: No such process
rm /up
cp -f /etc/resolv.conf{.backup,}
cp: can't create '/etc/resolv.conf': File exists
2021-09-02 08:40:17 WARNING: Failed running command (--up/--down): external program exited with error status: 1
2021-09-02 08:40:17 Exiting due to fatal error

# Wait until VPN says we're up
while [ ! -f /up ]; do
  sleep 1
done

Lyncredible commented 3 years ago

It turns out there is a default timeout of 18 hours: https://github.com/WoLpH/wollen-socks/blob/bc1aa532e62ba0683f18f4e3c559ea81e35d97b0/Dockerfile#L5

Setting timeout to 0 causes the openvpn process to terminate right away. So I had to set timeout to a sufficiently large number of days.

Why is timeout necessary? Why not default to no timeout?

wolph commented 3 years ago

I've got the container running in Kubernetes, so I've set the timeout to make it restart automatically once in a while. I noticed some instability after it had been running for several days.

In any case, a timeout of 0 should keep it running forever, so if it doesn't, that's definitely a bug.
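
For illustration, one way to get the "timeout of 0 runs forever" behaviour without depending on how a particular timeout implementation treats a zero duration is to skip timeout entirely in that case. This is only a minimal sketch, not the code used in this repository: the function name run_with_limit is hypothetical, and it assumes the limit arrives as a plain string such as 0 or 18h.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the repository): run a command
# under an optional time limit. A limit of 0 or an empty value means
# "no limit", so timeout is not involved at all in that case.
run_with_limit() {
    limit="$1"
    shift
    case "$limit" in
        ''|0)
            # No limit requested: run the command directly.
            "$@"
            ;;
        *)
            # Any other value is handed to timeout unchanged.
            timeout "$limit" "$@"
            ;;
    esac
}
```

A call would look like run_with_limit "$LIMIT" some_command, where LIMIT and some_command are placeholders. Because the zero case never reaches timeout, the result does not depend on which timeout implementation the image ships.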

wolph commented 3 years ago

I've fixed the timeout=0 bug. The Alpine timeout command just behaves differently from the regular timeout commands.

As for the cp -f thing... looks like that's another bug in Alpine.

b43d4274c655# cp /etc/resolv.conf{.backup,}
cp: can't create '/etc/resolv.conf': File exists
b43d4274c655# cp -f /etc/resolv.conf{.backup,}
cp: can't create '/etc/resolv.conf': File exists
b43d4274c655# cp --help
BusyBox v1.31.1 () multi-call binary.

Usage: cp [OPTIONS] SOURCE... DEST

Copy SOURCE(s) to DEST

        -a      Same as -dpR
        -R,-r   Recurse
        -d,-P   Preserve symlinks (default if -R)
        -L      Follow all symlinks
        -H      Follow symlinks on command line
        -p      Preserve file attributes if possible
        -f      Overwrite
        -i      Prompt before overwrite
        -l,-s   Create (sym)links
        -T      Treat DEST as a normal file
        -u      Copy only newer files

Long story short... even though -f should overwrite, it does not.
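
A possible explanation, which is an assumption and not confirmed anywhere in this thread: container runtimes such as Docker bind-mount /etc/resolv.conf into the container, so the file can be rewritten in place but not removed and recreated. Under that assumption, a restore that writes through a shell redirection instead of cp might avoid the error. The sketch below is hypothetical (the name restore_in_place is made up here) and is not necessarily the fix that was applied.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the repository): restore DEST
# from SRC by rewriting DEST in place. The redirection opens the
# existing file and truncates it, so the file itself is never removed
# or replaced.
restore_in_place() {
    src="$1"
    dest="$2"
    # If there is no backup, there is nothing to restore. Returning 0
    # here is a deliberate choice so that a missing backup does not
    # make the calling script fail.
    [ -f "$src" ] || return 0
    cat "$src" > "$dest"
}
```

In a down script this would be called as restore_in_place /etc/resolv.conf.backup /etc/resolv.conf. The log above shows OpenVPN treating a non-zero exit status from the down script as fatal, which is why the sketch tolerates a missing backup instead of reporting an error.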

wolph commented 3 years ago

I've fixed the issues so you should be good to go now :)

Lyncredible commented 3 years ago

Thanks for the quick fix! I can confirm that setting timeout to 0 works now.
