Closed Rycieos closed 3 years ago
Hi @Rycieos, thanks for the extensive report!
Could you try a few things for me?
Are you running with --network host? And could you try with --privileged instead of the --cap flags?
Could you try robbertkl/ipv6nat:0.4.2 instead of latest? I've recently upgraded go-iptables, so this could indeed be related to the issue you mentioned.
I am running with --network host, yes.
Trying --privileged instead has the same result.
I forgot to mention, I was running v0.4.1 I think before debugging, and updated to see if it resolved my issue. I just tested versions v0.4.1, v0.4.2, and v0.4.3 with both --privileged and --cap-adds, same results.
Trying --privileged instead has the same result
Just to be sure: in that case you're running the ipv6nat container with --network host AND --privileged?
And before the system upgrade, it was all working fine?
v0.4.1 and v0.4.2 have been around for quite a while already, but haven't seen an issue like this before. And if the issue started only after the system upgrade, that makes it only stranger, and doesn't seem related to go-iptables. I've also upgraded to newer versions of Alpine and Go for the v0.4.3 container, but this would be unrelated as well, since you have the same issue with v0.4.1 and v0.4.2.
The iptables upgrade on your system goes from 1.8.4-15.el8.x86_64 to 1.8.4-15.el8_3.3.x86_64, but I can't find what the changes are between those. Also, you mentioned doing the check yourself works just fine.
A few more things to try:
Can you try to list the iptables rules (iptables-save -t nat) and run the check commands (-C) from within the docker-ipv6nat container image? Since you can't attach to the running container (it keeps crashing), you could start a temporary container with something like docker run --rm -it --privileged --network host --entrypoint sh robbertkl/ipv6nat and run the commands from there.
Can you try running the docker-ipv6nat binary directly on the host (so not in a docker container)?
Can you try to list the iptables rules (iptables-save -t nat) and run the check commands (-C) from within the docker-ipv6nat container image? Since you can't attach to the running container (it keeps crashing), you could start temporary container with something like docker run --rm -it --entrypoint sh robbertkl/ipv6nat and run the commands from there.
Good idea!
Sanity check from host:
$ sudo iptables-save -t nat
# Generated by iptables-save v1.8.4 on Sat Jan 9 16:06:51 2021
*nat
...
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
...
Container:
$ docker run --rm -it --privileged --network host --entrypoint sh robbertkl/ipv6nat
/ # iptables-save -t nat
# Generated by iptables-save v1.8.4 on Sat Jan 9 21:06:41 2021
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
# Completed on Sat Jan 9 21:06:41 2021
Well that isn't good. Same thing if I do docker run --rm -it --cap-add NET_ADMIN --cap-add NET_RAW --network host --entrypoint sh robbertkl/ipv6nat instead.
Any ideas? I'll keep digging.
From within the container (docker run, again with --privileged --network host), could you run these 3 commands:
ls -l `which iptables`
xtables-legacy-multi iptables-save -t nat
xtables-nft-multi iptables-save -t nat
Sorry, forgot:
Before running the ls -l command, please run ./docker-ipv6nat-compat once.
$ docker run --rm -it --cap-add ALL --network host --entrypoint sh robbertkl/ipv6nat
/ # ls -l `which iptables`
lrwxrwxrwx 1 root root 20 Dec 28 22:49 /sbin/iptables -> xtables-legacy-multi
/ # xtables-legacy-multi iptables-save -t nat
# Generated by iptables-save v1.8.4 on Sat Jan 9 21:17:25 2021
*nat
:PREROUTING ACCEPT [837:70639]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [138:9289]
:POSTROUTING ACCEPT [942:73933]
COMMIT
# Completed on Sat Jan 9 21:17:25 2021
/ # xtables-nft-multi iptables-save -t nat
# Generated by iptables-save v1.8.4 on Sat Jan 9 21:17:29 2021
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/32 ! -o docker0 -j MASQUERADE
-A POSTROUTING -s 10.1.0.0/32 ! -o br-dce386402b8e -j MASQUERADE
...
-A POSTROUTING -s 10.1.0.2/32 -d 10.1.0.2/32 -p tcp -m tcp --dport 443 -j MASQUERADE
-A POSTROUTING -s 10.1.0.2/32 -d 10.1.0.2/32 -p tcp -m tcp --dport 80 -j MASQUERADE
...
-A OUTPUT ! -d 127.0.0.0/32 -m addrtype --dst-type LOCAL -j DOCKER
-A DOCKER -i docker0 -j RETURN
-A DOCKER -i br-dce386402b8e -j RETURN
...
-A DOCKER ! -i br-dce386402b8e -p tcp -m tcp --dport 443 -j DNAT --to-destination 10.1.0.2:443
-A DOCKER ! -i br-dce386402b8e -p tcp -m tcp --dport 80 -j DNAT --to-destination 10.1.0.2:80
...
COMMIT
# Completed on Sat Jan 9 21:17:29 2021
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
That is identical to what I see on the host. Does that mean the iptables update I got switched from the legacy backend to a newer backend?
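As an aside, iptables 1.8 reports which backend a binary uses right in its version string (e.g. "iptables v1.8.4 (nf_tables)" vs "iptables v1.8.4 (legacy)"), so telling the variants apart is just a matter of running iptables -V. A minimal Go sketch of that classification (hypothetical helper, not part of docker-ipv6nat):

```go
package main

import (
	"fmt"
	"strings"
)

// backendFromVersion classifies `iptables -V` output, which looks like
// "iptables v1.8.4 (nf_tables)" or "iptables v1.8.4 (legacy)".
// iptables before 1.8 prints no backend suffix at all.
func backendFromVersion(v string) string {
	switch {
	case strings.Contains(v, "nf_tables"):
		return "nft"
	case strings.Contains(v, "legacy"):
		return "legacy"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(backendFromVersion("iptables v1.8.4 (nf_tables)"))
	fmt.Println(backendFromVersion("iptables v1.8.4 (legacy)"))
}
```

Running iptables -V both on the host and inside the container is a quick way to confirm the two sides agree on a backend.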
And before the system upgrade, it was all working fine?
Correct.
Oops, I missed this before: Running on the host:
$ sudo iptables-save -t nat
# Generated by iptables-save v1.8.4 on Sat Jan 9 16:15:14 2021
...
# Completed on Sat Jan 9 16:15:14 2021
# Warning: iptables-legacy tables present, use iptables-legacy-save to see them
$ sudo iptables-legacy-save -t nat
sudo: iptables-legacy-save: command not found
That happens with iptables -L as well. I don't remember seeing that warning before, so something must have changed with my update.
Sorry, forgot:
Before running the ls -l command, please run ./docker-ipv6nat-compat once.
$ docker run --rm -it --cap-add NET_ADMIN --cap-add NET_RAW --network host --entrypoint sh robbertkl/ipv6nat
/ # ls -l `which iptables`
lrwxrwxrwx 1 root root 20 Dec 28 22:49 /sbin/iptables -> xtables-legacy-multi
/ # ./docker-ipv6nat-compat
2021/01/09 21:25:59 unable to detect hairpin mode (is the docker daemon running?)
/ # ls -l `which iptables`
lrwxrwxrwx 1 root root 17 Jan 9 21:25 /sbin/iptables -> xtables-nft-multi
Looks like that fixes it to point to the right one. I'm assuming the standard entry point must not be doing that, or it should be working for me.
It looks like your system has indeed switched backend, but docker-ipv6nat-compat should pick that up and symlink to the correct version.
Could you try the previous ls -l command again, but this time after you run ./docker-ipv6nat-compat once?
Also: do you normally run the ipv6nat container with the entrypoint set? Or does it use the default entrypoint?
Also: do you normally run the ipv6nat container with the entrypoint set? Or does it use the default entrypoint?
Default. Here is my full docker-compose file:
version: '2.3'
services:
  ipv6nat:
    image: robbertkl/ipv6nat:0.4.3
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    network_mode: host
    cap_drop:
      - ALL
    cap_add:
      - NET_RAW
      - NET_ADMIN
      - SYS_MODULE
Hmm, so it does switch over to xtables-nft-multi (docker-ipv6nat-compat is actually the standard entrypoint), but still doesn't work? That's very strange.
One thing that looks off in your output:
xtables-legacy-multi iptables-save -t nat shows no rules, but it does have counters behind each chain (the numbers in brackets).
xtables-nft-multi iptables-save -t nat shows Docker's NAT rules, but the counters are all 0.
I think I got it!
On host:
$ sudo iptables-save -t nat
# Generated by iptables-save v1.8.4 on Sat Jan 9 16:34:29 2021
*nat
...
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
In container (I hacked the entrypoint to not exit after it crashes):
± docker exec -it ipv6nat_ipv6nat_1 sh
/ # ls -l `which iptables`
lrwxrwxrwx 1 root root 17 Jan 9 21:32 /sbin/iptables -> xtables-nft-multi
/ # iptables-save -t nat
# Generated by iptables-save v1.8.4 on Sat Jan 9 21:33:13 2021
*nat
...
-A OUTPUT ! -d 127.0.0.0/32 -m addrtype --dst-type LOCAL -j DOCKER
It shows the address as 127.0.0.0/32, not 127.0.0.0/8!
Wow, that should definitely explain it!
Any idea why that would happen, and why only after a system upgrade? Very strange how iptables-save (both version 1.8.4) shows different things on the host and in the container for the same rule!
Actually, I should stop truncating output. It shows that for every single rule!
-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
turns into
-A POSTROUTING -s 172.17.0.0/32 ! -o docker0 -j MASQUERADE
Same thing is done on the filter table as well.
Agreed, very strange.
I'll try digging into why it warns about legacy tables, maybe there is a RedHat bug somewhere about this that could provide some clues.
Not sure, but perhaps the legacy tables were created by the iptables commands within the container before running docker-ipv6nat-compat (by default the image is set to legacy). I would recommend a reboot to see if the warning goes away.
The counters are still a bit strange, however. Is Docker's NAT working OK for IPv4 after the upgrade? Can you reach the published ports?
Not sure, but perhaps the legacy tables were created by the iptables commands within the container before running docker-ipv6nat-compat (by default the image is set to legacy). I would recommend a reboot to see if the warning goes away.
Yeah, I'll try a reboot.
The counters are still a bit strange, however. Is Docker's NAT working OK for IPv4 after the upgrade? Can you reach the published ports?
Yup, IPv4 works just fine. The counters look normal from the host, it's just the chain global policy counters that have 0s:
± sudo iptables -t nat -L -v
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
1567 92212 DOCKER all -- any any anywhere anywhere ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 MASQUERADE all -- any !docker0 172.17.0.0/16 anywhere
10396 629K MASQUERADE all -- any !br-dce386402b8e 10.1.0.0/16 anywhere
...
0 0 MASQUERADE tcp -- any any 10.1.0.2 10.1.0.2 tcp dpt:https
0 0 MASQUERADE tcp -- any any 10.1.0.2 10.1.0.2 tcp dpt:http
...
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
0 0 DOCKER all -- any any anywhere !127.0.0.0/8 ADDRTYPE match dst-type LOCAL
Chain DOCKER (2 references)
pkts bytes target prot opt in out source destination
0 0 RETURN all -- docker0 any anywhere anywhere
0 0 RETURN all -- br-dce386402b8e any anywhere anywhere
...
174 9100 DNAT tcp -- !br-dce386402b8e any anywhere anywhere tcp dpt:https to:10.1.0.2:443
14 648 DNAT tcp -- !br-dce386402b8e any anywhere anywhere tcp dpt:http to:10.1.0.2:80
...
# Warning: iptables-legacy tables present, use iptables-legacy to see them
The second reboot did clear up the warnings of legacy tables existing. But it did not fix the problem. iptables in the container still shows /32 for all IP ranges.
The second reboot did clear up the warnings of legacy tables existing.
But only until I exec'ed into the container and ran iptables-legacy-save (symlinked as iptables-save, which I forgot about), which created legacy tables! So that is where the legacy rules are coming from, and why I never saw the warning before. I can't seem to find a way to remove these rules without rebooting.
Yeah, that's what I thought created the legacy rules. But if you're exec'ing into the container, the container would have already executed the docker-ipv6nat-compat script and symlinked iptables-save to xtables-nft-multi, right? Or do you mean a manually started container?
Or do you mean a manually started container?
Yeah, I just did that in a manually started container after rebooting without thinking.
I'm still stumped as to how an iptables patch update could cause this. Especially since the version that is showing wrong output is in the container and wasn't changed.
It could still be that the update switched the backend from legacy to nft and something is not working properly in the "translation" to iptables output, which iptables-nft does. Can't confirm on my current system, as it's using legacy; will have to set up a new machine to test it.
Could you try installing iptables 1.8.6 from Alpine "edge" within the container:
apk upgrade iptables ip6tables --no-cache --repository=http://dl-cdn.alpinelinux.org/alpine/edge/main
No dice:
± docker run --rm -it --cap-add NET_ADMIN --cap-add NET_RAW --network host --entrypoint sh robbertkl/ipv6nat
/ # apk upgrade iptables ip6tables --no-cache --repository=http://dl-cdn.alpinelinux.org/alpine/edge/main
fetch http://dl-cdn.alpinelinux.org/alpine/edge/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/community/x86_64/APKINDEX.tar.gz
Upgrading critical system libraries and apk-tools:
(1/2) Upgrading musl (1.1.24-r10 -> 1.2.2_pre7-r0)
(2/2) Upgrading apk-tools (2.10.5-r1 -> 2.12.0-r4)
Executing busybox-1.31.1-r19.trigger
Continuing the upgrade transaction with new apk-tools:
fetch http://dl-cdn.alpinelinux.org/alpine/edge/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/community/x86_64/APKINDEX.tar.gz
(1/2) Upgrading iptables (1.8.4-r2 -> 1.8.6-r0)
(2/2) Upgrading ip6tables (1.8.4-r2 -> 1.8.6-r0)
Executing busybox-1.31.1-r19.trigger
OK: 8 MiB in 18 packages
/ # ./docker-ipv6nat-compat
2021/01/09 22:55:45 unable to detect hairpin mode (is the docker daemon running?)
/ # iptables-save -V
iptables-save v1.8.6 (nf_tables)
/ # iptables-save -t nat
# Generated by iptables-save v1.8.6 on Sat Jan 9 22:55:53 2021
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:DOCKER - [0:0]
-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A POSTROUTING -s 172.17.0.0/32 ! -o docker0 -j MASQUERADE
...
-A POSTROUTING -s 10.1.0.7/32 -d 10.1.0.7/32 -p tcp -m tcp --dport 443 -j MASQUERADE
-A POSTROUTING -s 10.1.0.7/32 -d 10.1.0.7/32 -p tcp -m tcp --dport 80 -j MASQUERADE
-A OUTPUT ! -d 127.0.0.0/32 -m addrtype --dst-type LOCAL -j DOCKER
I was just able to track down the rpm files of the previous version of iptables I had installed. I'm going to try a manual roll back and see if it fixes my problem.
Great, let me know. Still blows my mind how it can affect only the result within the container.
Also wondering how iptables on the host could have an effect in the first place: I don't think it's even used? Your system uses nftables and I think it's only iptables-nft in the container that talks directly to nftables. I don't think it's even using the iptables installed on the host.
Well that fixed it. 🤕
Feel free to close this issue if you want, since it seems to be a problem with an external package. Though if you think it is a compatibility issue, I would be happy to help you continue to debug it (though not on my home prod system).
Full fix detailed, in case anyone else has the exact same stuck package scenario:
# Get all "old" packages
$ wget http://mirror.centos.org/centos/8/BaseOS/x86_64/os/Packages/iptables-1.8.4-15.el8.x86_64.rpm
$ wget http://mirror.centos.org/centos/8/BaseOS/x86_64/os/Packages/iptables-ebtables-1.8.4-15.el8.x86_64.rpm
$ wget http://mirror.centos.org/centos/8/BaseOS/x86_64/os/Packages/iptables-libs-1.8.4-15.el8.x86_64.rpm
$ wget http://mirror.centos.org/centos/8/BaseOS/x86_64/os/Packages/iptables-services-1.8.4-15.el8.x86_64.rpm
$ sudo yum downgrade ./iptables-*
# Destroy the container, just in case any loaded kernel modules stick around
$ docker rm -f ipv6nat
# Rebooting was the only thing that fixed it for me
$ sudo reboot
# Recreate the container
$ docker run -d --name ipv6nat --cap-drop ALL --cap-add NET_ADMIN --cap-add NET_RAW --network host --restart unless-stopped -v /var/run/docker.sock:/var/run/docker.sock:ro robbertkl/ipv6nat
# or whatever your command is
Thanks for all your help tracking down what was causing the problem!
Your system uses nftables and I think it's only iptables-nft in the container that talks directly to nftables. I don't think it's even using the iptables installed on the host.
You are right I think. I guess somehow the new package version has some bug that interacts with the kernel incorrectly, and then saves (and then later prints) the rules incorrectly?? Yeah, doesn't make sense to me either. I wouldn't even know how to go about reporting this as a bug to the package maintainer. I guess I would need to prove that the rule actually got saved wrong somehow.
Also wondering how iptables on the host could have an effect in the first place: I don't think it's even used?
Makes me wonder if I could have mounted the host xtables-nft-multi binary in the container to fix it. Probably only if it was statically linked, since the container runs on Alpine (musl-based IIRC).
Well that fixed it. 🤕
Wow, that was quite the journey. Great you figured it out! And thanks for the detailed fix.
Let's leave it at this. I'll keep an eye out for more reports of this issue.
Makes me wonder if I could have mounted the host xtables-nft-multi binary in the container to fix it. Probably only if it was statically linked, since the container runs on Alpine (musl-based IIRC).
Yeah, that's usually a no-go, for that exact reason.
I can confirm that this error occurs with a fresh install of CentOS. The downgrade solution worked for me.
Thanks @thedejavunl, best to keep the issue open then.
I spoke to Phil Sutter from RedHat, who did both the upstream patch as well as its backport into RHEL8.3.
The commit in question is here. To quote Phil:
Sadly it is not part of an official release yet, ETA is v1.8.7.
About the issue we're seeing in the Docker container:
Basically it's a problem with data representation inside the container. The iptables binary in there doesn't respect the reduced payload expression length and due to absence of the (not needed) bitwise expression assumes the full address is being matched.
So aside from the workaround (downgrading as detailed here) I guess the only solution would be to either wait for 1.8.7 (and its Alpine edge packages) or build a patched version and ship that in the container image.
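To make Phil's explanation concrete: a /8 prefix constrains only the leading 8 bits of the address, so the optimized kernel representation can load and compare a single byte of the destination address and drop the bitwise mask expression entirely. An older iptables-nft that doesn't know about reduced-length payloads then sees a plain address compare with no mask and reports it as a full-address (/32) match. A small Go illustration of how different the two printed rules actually are:

```go
package main

import (
	"fmt"
	"net"
)

// inPrefix reports whether ip falls inside the given CIDR prefix.
func inPrefix(cidr, ip string) bool {
	_, network, err := net.ParseCIDR(cidr)
	if err != nil {
		return false
	}
	return network.Contains(net.ParseIP(ip))
}

func main() {
	// The real rule: /8 constrains only the leading byte (0x7f)...
	fmt.Println(inPrefix("127.0.0.0/8", "127.255.255.255")) // true
	// ...while the /32 the container printed matches one address only.
	fmt.Println(inPrefix("127.0.0.0/32", "127.255.255.255")) // false
}
```

So the kernel was matching the whole 127.0.0.0/8 range all along; only the userspace rendering of the rule was wrong.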
Wow @robbertkl, fantastic detective work!
So, to be 100% clear: this "backport" in the RedHat packages happened between versions 1.8.4-15.el8 and 1.8.4-15.el8_3.3?
And this change is expected to land in iptables v1.8.7?
As running two different versions of iptables against the same kernel probably wasn't ever intended, I can understand why this could happen.
As for my own environment, I will just freeze my iptables version to 1.8.4-15.el8 until v1.8.7 is released and updated here.
So, to be 100% clear: this "backport" in the RedHat packages happened between versions 1.8.4-15.el8 and 1.8.4-15.el8_3.3?
Correct, see the changelog here: https://centos.pkgs.org/8/centos-baseos-x86_64/iptables-services-1.8.4-15.el8_3.3.x86_64.rpm.html
Thank you very much for the explanation @robbertkl. 🥇
Hey,
does anybody know if there is a Red Hat bug tracker record for this?
As running two different versions of iptables against the same kernel probably wasn't ever intended, I can understand why this could happen.
If this is the issue, couldn't it be fixed by upgrading the ipv6nat Docker container to use a newer version of iptables? Maybe as an opt-in (e.g. a new tag)?
The new iptables package is only included in Alpine Edge, which is not intended for production usage. The iptables upgrade doesn't contain any security fixes, only bug fixes, so if the firewall is working correctly there is no need to update the RPM packages.
If this is the issue, couldn't it be fixed by upgrading the ipv6nat Docker container to use a newer version of iptables? Maybe as an opt-in (e.g. a new tag)?
That seems to be the plan, once iptables is updated in Alpine. We all seem to agree that we should wait until it is "stable" before doing that.
Hi all,
With Docker 20.10.6 the ipv6nat function is fully integrated (experimental).
You can add the following flags to your daemon.json:
{
  "ipv6": true,
  "fixed-cidr-v6": "fd00::/80",
  "experimental": true,
  "ip6tables": true
}
Heya,
as we still run into this issue, I did some research and also spoke to Phil about it a bit to understand it.
The fix/backport implemented a more optimized way to store rules in the kernel. Now the issue is the following: if the host supports this way of storing rules but the container's iptables doesn't, the output is messed up. It looks like this:
host # iptables-save | grep "LOCAL -j DOCKER"
-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
container # xtables-nft-multi iptables-save | grep "LOCAL -j DOCKER"
-A OUTPUT ! -d 127.0.0.0/32 -m addrtype --dst-type LOCAL -j DOCKER
So the rule is displayed differently, but the rule itself is "correct" (as stored inside the kernel).
As this is not easy to solve, since the version outside and inside the container must be "the same", may I suggest the following: the check that actually fails is part of manager.go: https://github.com/robbertkl/docker-ipv6nat/blob/4cd961e56585b9506d69d37aeff1611c0e1e2b72/manager.go#L71 If the check were changed to accept both versions, /8 and /32, the problem should be "gone".
Anything I missed? Would it be worth a try? I'm not a Go coder, so I have no clue how to do it myself, but I expect it to be "easy" to fix, at least much easier than getting both versions in sync.
Cheers, Sven
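The tolerance Sven suggests could be sketched roughly like this (hypothetical helper, not the actual manager.go code): when checking whether a rule exists, also generate a variant of the rule spec with every IPv4 CIDR argument rewritten to /32, the form the affected iptables-nft misreports, and accept the rule if either form matches.

```go
package main

import (
	"fmt"
	"strings"
)

// cidrVariants returns the rule spec as-is plus, when it contains any
// IPv4 CIDR argument, a variant with each prefix length rewritten to
// /32 — the form the affected iptables-nft misreports.
// Hypothetical helper, not the actual docker-ipv6nat code.
func cidrVariants(spec []string) [][]string {
	variant := make([]string, len(spec))
	changed := false
	for i, arg := range spec {
		variant[i] = arg
		// Match "a.b.c.d/len" arguments only.
		if idx := strings.LastIndex(arg, "/"); idx > 0 && strings.Count(arg, ".") == 3 {
			if v := arg[:idx] + "/32"; v != arg {
				variant[i] = v
				changed = true
			}
		}
	}
	if changed {
		return [][]string{spec, variant}
	}
	return [][]string{spec}
}

func main() {
	spec := []string{"!", "-d", "127.0.0.0/8", "-m", "addrtype", "--dst-type", "LOCAL", "-j", "DOCKER"}
	// A presence check would then accept the rule if ANY variant exists.
	for _, v := range cidrVariants(spec) {
		fmt.Println(strings.Join(v, " "))
	}
}
```

A real fix would feed each variant to the existing rule-presence check (go-iptables runs `iptables -C` under the hood) and treat the rule as present if either form matches.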
Just pushed out a new release v0.4.4 which contains the fix for this issue! Docker images for all architectures are on Docker Hub as :0.4.4 and :latest. Thanks everyone!
@robbertkl I still get "unable to detect hairpin mode (is the docker daemon running?)" with 0.4.4 on Synology DSM 6.2.4-25556 with Synology-current Docker version 20.10.3-0554. I use Option B from the README.md. IPv6 works in general on the system.
Version: v0.4.3
Docker version: 20.10.1 and 20.10.2
OS: CentOS Linux release 8.3.2011 (Core)
After a system update, upon launching I get this error:
After which the container exits and restarts.
Thinking it might be a permissions issue, I removed all --cap-adds, leaving only the --cap-drop ALL to test, but that broke it more:
I then tried to give it --cap-add ALL, but that did not fix it.
Since part of the system update was docker-ce, I thought maybe it had changed the backend rules, but:
Clearly the right rule still exists. And checking manually:
The actual check commands return correctly as expected. I am using this code section as the reference: https://github.com/robbertkl/docker-ipv6nat/blob/v0.4.3/manager.go#L79-L86
At this point I downgraded dockerd back to 20.10.1, but I got the same error.
What is strange is that when I first did the system upgrade, dockerd restarted itself as usual, and all my containers came back online with IPv6 working. It was after an OS restart that this error started.
I tried to do a system rollback, but the old package versions couldn't be found, so I'm stuck.
Seems like coreos/go-iptables/issues/79 could be related.