wfg / docker-openvpn-client

OpenVPN client with killswitch and proxy servers; built on Alpine
MIT License

Hard coding "eth0" causes an issue when using podman #93

Open pablos-here opened 1 year ago

pablos-here commented 1 year ago

Hi,

As mentioned in #87, when the container is run via podman, the network device is tun0 rather than the hard coded eth0.

Perhaps entry.sh could include some smarts to determine the TAP device. Below is an example that I added to my local copy of entry.sh - see the TAP=... line. For context, I am including a few of the surrounding lines where I believe it should go.

I also globally replaced eth0 with $TAP throughout the script (^eth0^$TAP^).

default_gateway=$(ip -4 route | grep 'default via' | awk '{print $3}')

# Pick the first non-loopback interface (eth0 under docker, tap0 under rootless podman)
TAP=$(ip --brief a | grep -v '^lo' | awk '{print $1}' | head -n 1)
case "$KILL_SWITCH" in
pablos-here commented 1 year ago

As pointed out elsewhere, the optimal solution is to accept a TAP environment variable (via --env TAP=...) that can override eth0.

In place of eth0, one can use ${TAP:=eth0}, which resolves to eth0 if TAP is not set.
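A quick sketch of that expansion (the iface variable is just for illustration, not something in entry.sh today):

#!/bin/sh
# ${TAP:=eth0} assigns eth0 to TAP when TAP is unset or empty, then expands it.
iface="${TAP:=eth0}"
echo "using interface: $iface"

# TAP unset:      ./sketch.sh          -> using interface: eth0
# TAP exported:   TAP=tap0 ./sketch.sh -> using interface: tap0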

wfg commented 1 year ago

I'm not too familiar with the subtle differences between Podman and Docker. I assume this is rootless Podman? I'd like to play with it myself.

pablos-here commented 1 year ago

I'm late to the entire container game. :) ... better late than never.

I started learning about podman last week. As they say, it's a (near) drop-in replacement for docker. I do like that it's rootless.

Below is the raw shell script that I'm using during testing. You can see the similarities with docker; it'll be dead easy for you.

My plan is to eventually submit a PR with an .md documenting the podman differences. We can talk about how you'd prefer to see that: added to the existing README.md or in a separate podman section.

Thx!

#!/bin/bash

ENV_FILE="$HOME/.VPN/container.env"

# Host ports to publish the container's proxies on
LOCAL_HTTP_PROXY_PORT=8080
LOCAL_SOCKS_PROXY_PORT=1080

# Note: --privileged already grants all capabilities, which makes
# --cap-add=NET_ADMIN below redundant; one of the two could be dropped.
podman run \
  --detach \
  --privileged \
  --name=openvpn-client \
  --env-file="$ENV_FILE" \
  --env TAP="tap0" \
  --env RETRY=2 \
  --env MAX_RETRY=10 \
  --publish "$LOCAL_HTTP_PROXY_PORT":8080 \
  --publish "$LOCAL_SOCKS_PROXY_PORT":1080 \
  --cap-add=NET_ADMIN \
  --device=/dev/net/tun \
  --tz=local \
  --volume "$HOME/.VPN":/data/vpn \
  --volume "$HOME/.VPN":/run/secrets \
  ghcr.io/wfg/openvpn-client
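After the container starts, podman logs openvpn-client shows its startup output (podman's logs command works the same as docker's); the log excerpts further down in this thread come from there.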
cdeadlock commented 1 year ago

This is actually very dangerous: my result was complete removal of the default gateway and lost networking.

I don't know if this was because Debian 11 defaults to enp2s0-style device names, or because my hardware happens to have two network ports (enp3s0 is the second one). But yes, using a hardcoded eth0 seems risky on many systems.

It was very confusing because the first error message is about ghcr.io ("networking not available"), which makes you think it never even pulled the image. But it did pull the image, and since I was using the "subnets" setting I was still able to ssh into the machine for further troubleshooting.

At first I assumed the logical thing to do was to install this and test exactly how it behaves with no VPN credentials. I thought the kill switch would only affect traffic using the docker networking. That is not the case: the kill switch ended up in the HOST operating system, so it also affects everything outside docker networking.

The only reason I even suspected the kill switch was that I could still ping the indicated "subnet" but not 1.1.1.1. I could find no entries in nftables or iptables. Eventually I saw that "ip route" showed no default route. I could see from another machine on the same network what the gateway IP was, and the fix on my Debian 11 system was "ip route add default via 192.168.1.254".
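In short, the diagnosis and recovery on my system were (the gateway address is specific to my LAN):

ip route                                  # no "default via ..." line -> default route is gone
# Restore it; gateway address learned from another machine on the same LAN
ip route add default via 192.168.1.254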

I will investigate further whether the hardcoded "eth0" is the only problem going on here for this type of system.

pablos-here commented 1 year ago

Hi @cdeadlock,

I couldn't replicate your issue with the latest test bits. If you are using podman, then yes, the current bits will not work. You'll have to wait until I submit a PR and @wfg has time to review it.

If you are not using podman, then I would suggest that you open a new issue with steps to replicate the problem.

In my environment, I set up my VM with two NICs:

└─▬ $ ip --brief a
lo               UNKNOWN        127.0.0.1/8 ::1/128 
enp0s3           UP             10.0.2.15/24 fe80::a00:27ff:fe05:7b75/64 
enp0s8           UP             10.0.3.15/24 fe80::7679:52:d973:1b62/64 

With the new log output, we can see that the container has only one NIC. You can also see the log reporting that I remapped eth0 to tap0 (the device podman presents).

--- Container interfaces ---
lo               UNKNOWN        127.0.0.1/8 ::1/128 
tap0             UNKNOWN        10.0.2.100/24 fd00::b8f0:c9ff:fe9a:ab76/64 fe80::b8f0:c9ff:fe9a:ab76/64 

Script 'eth0' remapped to 'tap0'
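
The remap itself boils down to something like this sketch (the iface variable name is illustrative, not the exact code in my branch):

# Honor an explicit --env TAP override; otherwise fall back to eth0.
iface="${TAP:-eth0}"
if [ "$iface" != "eth0" ]; then
    echo "Script 'eth0' remapped to '$iface'"
else
    echo "Script 'eth0' not remapped."
fi
# ...every later reference uses "$iface" instead of a hardcoded eth0.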
pablos-here commented 1 year ago

Here's the output when it is not remapped. As I'm using podman, note the failure at the end.

...
--- Container interfaces ---
lo               UNKNOWN        127.0.0.1/8 ::1/128 
tap0             UNKNOWN        10.0.2.100/24 fd00::e4ee:1ff:fe36:e322/64 fe80::e4ee:1ff:fe36:e322/64 

Script 'eth0' not remapped.

If the container is failing to start and 'eth0' is not listed above,
remap 'eth0' in the script to the above value (not 'lo')

For example, if the non-'lo' interface is 'tap0', remap
'eth0' to 'tap0' by using the '--env' switch:

   --env TAP=tap0

---
HTTP proxy: enabled
SOCKS proxy: enabled
Listening on: 0.0.0.0
---
info: original configuration file: vpn/XXXX.ovpn
info: modified configuration file: /tmp/openvpn.qBrQZq7F.conf
info: kill switch is using iptables
-----
Cannot find device "eth0"