Open robbertkl opened 7 years ago
I'm unable to reproduce this on a local VM in a non-IPv6 environment. Perhaps the presence of an IPv6 router causes the default route to get removed after being set? Even though `autoconf` is off for the interface, `accept_ra` is still on (though I'm not sure it does anything when `autoconf` is off).
After some more testing with my IPv6 cloud machine, I can confirm this does seem to be caused by router advertisements. During the boot process, a `fe80::something` default route gets added, which disappears soon after (not sure why). When the `cloud-config.yml` gets processed, it probably tries to add the default route, which fails because there already is one. It fails silently because of: https://github.com/rancher/os/blob/7615c26f44c3f88530a7e76b3ae5867a7cfb8bf8/netconf/netconf_linux.go#L327. When the router-advertised route then disappears, no default route is left.
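To check whether a router-advertised default route is present at a given moment, the output of `ip -6 route show default` can be classified. This is only a rough sketch: it assumes iproute2 marks RA-learned routes with `proto ra`, and the function name `classify_default_routes` is made up for illustration.

```bash
#!/bin/bash
# Classify IPv6 default routes read from stdin (the output of
# `ip -6 route show default`). Prints "ra" for each route that carries
# `proto ra` (assumed to mean router-advertised), "other" for any other
# route, and "none" when no routes were read at all.
classify_default_routes() {
  local line found=0
  while IFS= read -r line; do
    [ -z "$line" ] && continue
    found=1
    case " $line " in
      *" proto ra "*) echo "ra" ;;
      *)              echo "other" ;;
    esac
  done
  [ "$found" -eq 1 ] || echo "none"
}
```

On an affected machine it could be run as `ip -6 route show default | classify_default_routes`, for example once before and once after the cloud-config is applied.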
Is there a reason the 2nd parameter to `SetGateway` is set to `true`, so that it only adds the gateway if there is no default gateway yet? Also, the code here (https://github.com/rancher/os/blob/7615c26f44c3f88530a7e76b3ae5867a7cfb8bf8/netconf/netconf_linux.go#L412) removes even IPv6 addresses, even though it's only meant for the IPv4 DHCP setting.
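The effect of that flag, as described above, can be modelled in a few lines. This is a hypothetical model written for this thread, not the RancherOS implementation: the function name and its two string arguments are assumptions, and what `SetGateway` really does when the flag is `false` has not been checked against the source.

```bash
#!/bin/bash
# decide_gateway_action ONLY_IF_EMPTY HAS_DEFAULT
# Hypothetical model of the "only if empty" behaviour discussed above.
# Prints "skip" when the flag is true and a default route already exists
# (for example a short-lived router-advertised one), otherwise "set".
decide_gateway_action() {
  local only_if_empty=$1 has_default=$2
  if [ "$only_if_empty" = "true" ] && [ "$has_default" = "true" ]; then
    echo "skip"
  else
    echo "set"
  fi
}
```

Under this model, `decide_gateway_action true true` is the case from the report: the configured gateway is skipped, and nothing is left once the advertised route disappears.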
I'm willing to work on a PR to clean this up a bit, keeping IPv6 in mind. Can you let me know if this would make sense, @SvenDowideit ?
Unfortunately, I can't turn off `accept_ra` before the network starts, because RancherOS applies sysctls later in the process (see #1175 / #1539). Therefore, my only solution seems to be:
write_files:
  - path: /opt/rancher/bin/start.sh
    permissions: "0755"
    owner: root
    content: |
      #!/bin/bash
      sysctl -w net.ipv6.conf.eth0.accept_ra=0
      ip -6 route del default
      ip -6 route add default via xxxx:xxxx:xxxx:xxxx::xxxx dev eth0
@robbertkl yes, a PR would be very helpful - I don't have IPv6 here, and so haven't got experience with it :/
Actually, I think some of the changes I would like to make (like changing the `true` to `false`, as described in my previous message) would break backward compatibility or introduce different behaviour, which is why a PR probably won't go through.
As I understand it, the code actually tries to revert DHCP when it's turned off but was already initialised earlier, instead of never starting it in the first place. Not very elegant, but I guess it works when the network has to be initialised before the `cloud-config.yml` can be processed. This is controlled by the `dhcp: bool` setting, which can be turned off using a `rancher.*` kernel boot parameter to prevent DHCP from being started in the first place.
For IPv6, however, there usually is no such setting and no "dhcp daemon", since this is controlled by sysctls. Using the sysctl values (or introducing a new cloud-config setting that sets them) doesn't seem like the right way, but perhaps it is the only way, since the network is already initialized by the time the cloud-config gets processed. What do you think? Should it mimic the IPv4 behaviour by changing the sysctls, removing already autoconfigured addresses and/or routes, then adding the new ones?
To make matters more complicated, instead of having DHCP set both IP address and gateway, these are separate with IPv6 (the `autoconf` and `accept_ra` sysctls).
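Both sysctls can be inspected per interface. This is a minimal sketch, assuming the standard Linux `/proc/sys/net/ipv6/conf/<iface>/` layout. The function name `ra_settings` and its optional second argument (an alternative root directory, handy for testing) are made up for illustration.

```bash
#!/bin/bash
# ra_settings IFACE [ROOT]
# Print the current autoconf and accept_ra values for IFACE, one
# key=value pair per line. ROOT defaults to the usual procfs location
# and exists only so the function can be pointed at a test directory.
ra_settings() {
  local iface=$1 root=${2:-/proc/sys/net/ipv6/conf}
  local key
  for key in autoconf accept_ra; do
    printf '%s=%s\n' "$key" "$(cat "$root/$iface/$key")"
  done
}
```

Running `ra_settings eth0` on the affected machine would show whether router advertisements are still accepted while address autoconfiguration is off, which is the combination described earlier in the thread.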
yeah, we do need to bring up the network before we probe for / get the cloud-init file :/
but - do we need to bring up the IPv6 network then? Does anyone use IPv6 to get a network cloud-init.yml?
Well, yes, it's no use to improve IPv6 support in one area and cut it out of another. With forced IPv6-only around the corner, it's not wise to assume everyone uses IPv4 to get a network cloud-init.yml.
The workaround didn't work for me; I "solved" the problem by adding this to the cloud-config:
rancher:
  network:
    post_cmds:
      - "ip -6 route del default"
      - "ip -6 route add default via [GATEWAY] dev eth0"
Your solution did not work for me because when the startup scripts are executed, the network might not be ready yet, so in my case, the changes were not actually applied.
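For setups where a startup script is the only option, the timing problem described above could be softened by retrying a command until it succeeds. This is a generic sketch, not something tested on RancherOS: the helper name `retry` and the attempt count and delay in the usage comment are arbitrary choices.

```bash
#!/bin/bash
# retry ATTEMPTS DELAY CMD [ARGS...]
# Run CMD until it exits successfully, at most ATTEMPTS times, sleeping
# DELAY seconds between attempts. Returns 0 on the first success and 1
# if every attempt failed.
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Possible use in a startup script (GATEWAY is a placeholder):
#   retry 30 1 ip -6 route add default via "$GATEWAY" dev eth0
```

Retrying only helps when the command fails because the interface is not up yet; it does not address a default route being removed again later.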
RancherOS Version: (ros os version) v1.0.3
Where are you running RancherOS? (docker-machine, AWS, GCE, baremetal, etc.) Cloud VPS (qemu)
I have a cloud-config like this:
Unfortunately, the IPv6 default route is not always present after a reboot. Sometimes it is, sometimes it isn't.
Please note I just started running with the kernel boot parameter `ipv6.autoconf=0`, which effectively turns off my link-local address on eth0, since I only want to use my manually set IPv6 address for outgoing traffic. I have to set this as a kernel boot parameter instead of a `net.ipv6.conf...` sysctl, because RancherOS does not apply sysctls before initializing the network. However, this issue was already present before I started using `ipv6.autoconf=0`.