Closed bcr-ch closed 1 year ago
@bcr-ch Most probably we do not configure servicelb (ex klipper-lb) to work with IPv6. The main repo is here https://github.com/k3s-io/klipper-lb, and the docs are here https://docs.k3s.io/networking#service-load-balancer. Please don't hesitate to submit a PR, and please have a look at the debug section in the README; it provides useful details.
@mysticaltech
Thanks for the response / looking into it. I think trying to address it and submitting a PR is outside my skill level. Nobody wants me writing code ;)
I have done some investigation and it looks to me like during the initial K3S install / configuration we need to enable dual-stack support. Probably by passing the appropriate values to --cluster-cidr and --service-cidr.
I found this link useful: https://github.com/k3s-io/k3s/issues/1405
This is where I kinda stall out; I'm not sure which values are appropriate to pass, but I assume they could be specified in the kube.tf inside the k3s_exec_server_args variable.
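For anyone following along, my understanding from the k3s dual-stack docs is that each of those flags takes both an IPv4 and an IPv6 range. A sketch of what that might look like in `/etc/rancher/k3s/config.yaml` (the `fd00:` ULA prefixes below are placeholders I made up, not values from any real cluster):

```yaml
# Hypothetical dual-stack CIDRs; the fd00: ULA prefixes are placeholders only.
cluster-cidr: "10.42.0.0/16,fd00:42::/56"
service-cidr: "10.43.0.0/16,fd00:43::/112"
```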
Hope that helps for anyone else who might read this and can give me a few more pointers. As always, great project and really cool work.
@bcr-ch That's a good start. Now if you can really pinpoint why the connection is refused, that would be great. You can do so by looking at the k3s logs, and by making sure your DNS points to the AAAA IPv6 address(es) (you can have many of them, if you have many nodes)!
Also, after verifying your DNS is configured correctly, please post the output of `kubectl logs svclb-traefik-xxxx-yyyy -n kube-system` here (comment out the IPs for privacy). Same for `ip address show`.
I have a feeling it might be a DNS issue, because now I remember, IPv6 traffic is automatically routed to IPv4 at the node level.
@mysticaltech I have configured my DNS and confirmed it is working properly. I have AAAA records configured for each of my agent nodes.
If I SSH to my node using this AAAA address it works (so it is accepting inbound connections on IPv6).
I next try a telnet to the host for a simple whoami workload. Here you see it tries each of the IPv6 addresses before falling back to IPv4.
```
telnet example.ABC 80
Trying V6Address::1...
telnet: connect to address V6Address:1: Connection refused
Trying V6Address:1...
telnet: connect to address V6Address:1: Connection refused
Trying V6Address:1...
telnet: connect to address V6Address:1: Connection refused
Trying v4Address.2.3.4...
Connected to app.k3s.ABC
Escape character is '^]'.
GET / HTTP/1.1
Host: example.ABC

HTTP/1.1 301 Moved Permanently
Location: https://example.ABC/
Date: Wed, 19 Jul 2023 18:10:54 GMT
Content-Length: 17
Content-Type: text/plain; charset=utf-8

Moved Permanently
```
Output of `k logs svclb...`:
```
Defaulted container "lb-tcp-80" out of: lb-tcp-80, lb-tcp-443
+ trap exit TERM INT
+ BIN_DIR=/sbin
+ check_iptables_mode
+ set +e
+ grep nf_tables
+ lsmod
+ '[' 0 '=' 0 ]
+ mode=nft
+ set -e
+ info 'nft mode detected'
+ echo '[INFO] ' 'nft mode detected'
+ set_nft
+ ln -sf /sbin/xtables-nft-multi /sbin/iptables
nf_tables 270336 183 nft_log,nft_limit,nft_chain_nat,nft_compat
nfnetlink 20480 4 nfnetlink_log,ip_set,nft_compat,nf_tables
libcrc32c 12288 4 nf_tables,nf_nat,nf_conntrack,btrfs
[INFO] nft mode detected
+ ln -sf /sbin/xtables-nft-multi /sbin/iptables-save
+ ln -sf /sbin/xtables-nft-multi /sbin/iptables-restore
+ ln -sf /sbin/xtables-nft-multi /sbin/ip6tables
+ start_proxy
+ grep -Eq :
+ echo 0.0.0.0/0
+ iptables -t filter -I FORWARD -s 0.0.0.0/0 -p TCP --dport 80 -j ACCEPT
+ grep -Eq :
+ echo 10.43.230.13
+ cat /proc/sys/net/ipv4/ip_forward
+ '[' 1 '==' 1 ]
+ iptables -t filter -A FORWARD -d 10.43.230.13/32 -p TCP --dport 80 -j DROP
+ iptables -t nat -I PREROUTING -p TCP --dport 80 -j DNAT --to 10.43.230.13:80
+ iptables -t nat -I POSTROUTING -d 10.43.230.13/32 -p TCP -j MASQUERADE
+ '[' '!' -e /pause ]
+ mkfifo /pause
```
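As an aside on reading this trace: the paired `echo` / `grep -Eq :` lines are klipper-lb's address-family check, which treats anything containing a colon as IPv6. A minimal sketch of that logic (`ip_family` is a hypothetical helper name, not part of klipper-lb):

```shell
# Sketch of the family check visible in the trace above: an address with a
# colon is treated as IPv6, anything else as IPv4. Both 0.0.0.0/0 and
# 10.43.230.13 fail the colon test, so only IPv4 iptables rules get set up.
ip_family() {
    if echo "$1" | grep -Eq :; then
        echo ipv6
    else
        echo ipv4
    fi
}

ip_family 10.43.230.13   # -> ipv4
ip_family fd00:43::1     # -> ipv6
```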
Output of `ip a show`:
```
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 96:00:02:a:b:c brd ff:ff:ff:ff:ff:ff
    altname enp1s0
    inet 1.2.3.4/32 scope global dynamic noprefixroute eth0
       valid_lft 65779sec preferred_lft 65779sec
    inet6 V6Address::1/64 scope global noprefixroute
       valid_lft forever preferred_lft forever
    inet6 fe80::9400:2ff:fe5e:35d6/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
```
When I try to deploy some test services to verify IPv6 support I get an error message as per below saying IPv6 is not configured on my cluster.
```
k apply -f https://gist.githubusercontent.com/aojea/90768935ab71cb31950b6a13078a7e92/raw/99ceac308f2b2658c7313198a39fbe24b155ae68/dual-stack.yaml -n test
deployment.apps/httpd-deployment created
service/my-service-v4 created
service/my-service-prefer-dual created
Error from server (Invalid): error when creating "https://gist.githubusercontent.com/aojea/90768935ab71cb31950b6a13078a7e92/raw/99ceac308f2b2658c7313198a39fbe24b155ae68/dual-stack.yaml": Service "my-service-v6" is invalid: spec.ipFamilies[0]: Invalid value: "IPv6": not configured on this cluster
```
A tcpdump of my telnet to the domain shows a RST right after the SYN.
My hypothesis is that because we don't pass --cluster-cidr and --service-cidr to the k3s binary as args during install, we are not configuring dual stack for the klipper lb. Therefore nftables never sets up the required NAT entries for IPv6.
@bcr-ch Thanks for the info, interesting. Yes, the cluster CIDR is passed; we use /etc/rancher/k3s/config.yaml for that.
How do you know telnet is not trying to use IPv4? Maybe try with a dummy server workload so you can test with curl, and double-check that the HTTP port is open in the Hetzner firewall.
Also, how did you make it so that your nodes do not have IPv4s, do you use a custom implementation? If so please post a dummy PR so we can see the diff 🙏
Hello,
No, my nodes all have both IPv4 and IPv6 addresses.
For each node in my cluster I have an A record and a AAAA record, all pointing to the same domain (for example app.K3S.my.domain).
If you run an nslookup for my domain it returns all the records (the IPv4 and the IPv6 ones).
curl behaves exactly the same as telnet. My client systems (macOS or openSUSE) prefer IPv6, so they try those records first; when that fails, they fall back to IPv4.
@bcr-ch I ran this whole issue through our good friend GPT-4 and here's what it says:
This is quite a detailed problem and your user has been very thorough in the diagnosis. Based on the information provided, here's an analysis and some recommended troubleshooting steps:
- ServiceLB & IPv6: The core issue is `klipper-lb` (or ServiceLB) not working with IPv6. The user tried replacing the Hetzner Load Balancer with `klipper-lb` and `traefik`. While it works with IPv4, they are getting 'Connection refused' with IPv6.
- SSH & DNS: SSH using the IPv6 address works, meaning the server does allow inbound IPv6 connections. DNS resolution with AAAA records is also correctly configured. This means that the server recognizes the IPv6 address but refuses connections on it for the service.
- Fallback to IPv4: The user did a `telnet` test, and while the IPv6 connection attempts are refused, it eventually falls back and connects via IPv4.
- Logs: The logs provided show `klipper-lb` doing NAT configuration using `iptables` in `nft` mode for IPv4 addresses. The logs, however, do not show any configuration regarding IPv6.
- IPv6 with ServiceLB: The initial guess was that ServiceLB (`klipper-lb`) might not be correctly set up for IPv6. While K3s is moving towards dual-stack support, which means supporting both IPv4 and IPv6, it's important to check whether `klipper-lb` also fully supports it. The user found a potential approach to enable dual-stack, but it's not clear from the provided information whether they implemented it.

Recommended troubleshooting steps:

- Check Dual-stack Configuration: It's vital to ensure that the cluster CIDRs for both IPv4 and IPv6 are correctly set up, using the `--cluster-cidr` and `--service-cidr` flags. This is the way to get dual-stack (IPv4 and IPv6) support in Kubernetes.
- Update ServiceLB Configuration: It looks like `klipper-lb` is only setting up `iptables` rules for IPv4 (as seen in the logs). You may need to ensure that the corresponding `ip6tables` rules for IPv6 are also being set up.
- Network Policies & Firewall Rules: Ensure that there are no network policies or firewall rules that block inbound IPv6 traffic for the specific ports used by `klipper-lb`.
- Try another LoadBalancer: As a temporary measure, you might want to test with another LoadBalancer solution that explicitly supports IPv6. This will confirm if the issue is isolated to `klipper-lb` or if there's a more general configuration problem in your cluster.
- Get More Logs: The logs you provided are from the `klipper-lb` container handling IPv4. It would be useful to see logs from a container when an IPv6 request is made. Also, logs from `traefik` might provide more insight into why the connection is refused.
- Issue with K3s & IPv6: The links the user found might indicate existing issues with K3s and IPv6. It's essential to check the status of these issues, any potential workarounds provided, or if they are already resolved in newer versions of K3s.
- Continue the Community Discussion: Engage with the K3s community and the `kube-hetzner` community. Given the project's open-source nature, there might be others facing similar issues or contributors who can give specific insights.

To summarize, the core issue seems to be the lack of explicit IPv6 support/configuration in `klipper-lb`. Addressing this either by adjusting configurations, applying workarounds, or using an alternative LoadBalancer solution might be the way forward.
With servicelb, I want to be able to listen to incoming IPv6 traffic.
If you want `servicelb` (a simple in-cluster load balancer provided by K3s) to listen to incoming IPv6 traffic, you'll need to make sure of a few things:

- Node's External IPv6 Configuration: The node must have a routable IPv6 address and accept inbound IPv6 connections.
- Servicelb Configuration: `servicelb` in K3s uses the host's networking. Therefore, if the host is set up to listen on IPv6, then `servicelb` should naturally be able to route incoming IPv6 traffic to the services. Make sure your service is of type `LoadBalancer` in Kubernetes; when you create a service of this type in K3s, `servicelb` will automatically handle the provisioning of the load balancer.
- Service Configuration: Don't set `ipFamily` to `IPv4` if you want it to support IPv6. Set `ipFamilyPolicy` and choose a strategy (`SingleStack`, `PreferDualStack`, `RequireDualStack`). Depending on your exact requirements, you might want to use `PreferDualStack` or `RequireDualStack`.

For example:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: MyApp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
  type: LoadBalancer
  ipFamilyPolicy: PreferDualStack
```
- Monitoring and Logging: Use `kubectl describe service <service-name>` to ensure that it has received an IPv6 address, and check the `servicelb` logs to troubleshoot any issues related to IPv6 traffic routing.
- DNS: Make sure your AAAA records point clients at the right IPv6 addresses.
- Potential Limitations: If `servicelb` or the K3s version you are using has issues with IPv6 (bugs, incomplete features, etc.), you might need to look into alternative load balancers or solutions like MetalLB, which offers more comprehensive IPv6 support.

Finally, once everything is set up, test the load balancer externally using IPv6 to ensure it's routing traffic as expected to your services.
Do I need dual stack for that?
No, you do not necessarily need dual stack (support for both IPv4 and IPv6 simultaneously) to have `servicelb` listen to incoming IPv6 traffic.
If you only care about IPv6 traffic and your nodes already have IPv6 access, you can set up your Kubernetes service to only use IPv6. However, without dual-stack, the service will not be reachable via IPv4.
Here's what you'd do:

- Ensure Node's External IPv6 Configuration: The node needs a routable IPv6 address and must accept inbound IPv6 traffic.
- Service Configuration for IPv6: Set the service's IP family to `IPv6`.

For example:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: MyApp
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
  type: LoadBalancer
  # Current Kubernetes uses the ipFamilies list rather than the older
  # singular ipFamily field.
  ipFamilies:
    - IPv6
```
- Set Up `servicelb`: When using `servicelb` in K3s, it will use the host's networking. So, if your host node is set up to listen on IPv6, then `servicelb` should naturally route incoming IPv6 traffic to the services.
- DNS Configuration: Publish AAAA records so clients resolve your nodes' IPv6 addresses.
Remember, without dual-stack, the services set to IPv6 will not be reachable via IPv4. If that's acceptable for your use case, then you don't need to worry about setting up dual-stack. However, if you want the flexibility of reaching services via both IP versions, then dual-stack becomes necessary.
What is net.ipv6.conf.all.accept_ra=2, and do I need it?
The `net.ipv6.conf.all.accept_ra` sysctl setting in Linux controls how Router Advertisements (RAs) are processed. Router Advertisements are an essential part of IPv6 and are used for address autoconfiguration and the discovery of available routers and DNS configurations.

The values for `net.ipv6.conf.all.accept_ra` are:

- `0`: Do not process Router Advertisements.
- `1`: Process Router Advertisements (the default).
- `2`: Process Router Advertisements even if forwarding is enabled.

In a typical client or host setting, forwarding is disabled, and the machine will process RAs by default. However, on routers or any machine with IP forwarding enabled (i.e., `net.ipv4.ip_forward` and/or `net.ipv6.conf.all.forwarding` set to `1`), the processing of RAs is disabled to prevent address autoconfiguration on a machine that's forwarding traffic.
In certain situations, especially with some more complex Kubernetes setups where nodes might act as routers but still need to autoconfigure addresses using RAs, the value of `2` for `net.ipv6.conf.all.accept_ra` is used. This allows the machine to process RAs even when IP forwarding is enabled.
If you're setting up Kubernetes on nodes that have forwarding enabled but still need to receive and process IPv6 router advertisements for address autoconfiguration or other information, then setting `net.ipv6.conf.all.accept_ra=2` would be necessary. Otherwise, if your nodes are pre-configured with static IPv6 settings and don't need RAs, you might not need this setting.
@bcr-ch So what I recommend is trying with the Hetzner LB first. If it works there with IPv6, then the problem is not with Kubernetes but with Klipper-LB/ServiceLB or the way IPv6 behaves on the nodes, and we can then try tweaking the IPv6 config for the nodes. But first it's important to confirm that.
@mysticaltech
Thanks for the GPT advice ;) And of course your support as well.
So I modified my kube.tf and disabled klipper, deploying a Hetzner LB instead.
I now have an LB (with both an IPv6 and an IPv4 address) redirecting to my three agent nodes. I have updated my DNS so I have an A record and a AAAA record pointing to the LB.
Now when I access my website "example.my.domain", it first tries the IPv6 address and succeeds. So I'm getting the feeling that the problem sits with how IPv6 on the nodes is configured with ServiceLB.
@bcr-ch Exactly, thanks for confirming this! I will have a look.
@bcr-ch Please try adding this to the cloud-init template:
```yaml
#cloud-config
write_files:
  - content: |
      net.ipv6.conf.all.accept_ra=2
    path: /etc/sysctl.d/60-ipv6-accept-ra.conf
    permissions: '0644'
runcmd:
  - [ sysctl, -p, /etc/sysctl.d/60-ipv6-accept-ra.conf ]
```
I cannot easily test on my end because I do not have IPv6 locally. So try setting the above; if it works, we will add that step to the project.
@mysticaltech
I modified my locals.tf file to modify the cloud-init settings as suggested above and redeployed my cluster.
Unfortunately I get the same behaviour as before. I have confirmed, by connecting to my agent node, that the setting applied: `sysctl -a | grep net.ipv6.conf.all` produces the output `net.ipv6.conf.all.accept_ra = 2`.
I will play with the initial K3S server install as I think there may be something in there I can tweak. Will post back on my success or failure ;)
Ok, so more digging in my test environment and I've now come across this:
`/var/lib/rancher/k3s/agent/etc/flannel/net-conf.json`

Based on my reading of this old issue in flannel:

```json
{
  "Network": "10.42.0.0/16",
  "EnableIPv6": false,
  "EnableIPv4": true,
  "IPv6Network": "::/0",
  "Backend": {
    "Type": "wireguard",
    "PersistentKeepaliveInterval": 25,
    "Mode": "separate"
  }
}
```
So our IPv6 is not getting enabled in flannel, which is used by k3s by default.
@bcr-ch Thanks for trying the accept_ra value. About the above, I believe that is for dual stack and probably not what we want, as the IPv6 traffic is coming from outside.
@kube-hetzner/core Any ideas on this please? I'm dry on ideas.
Yes, you are correct... it doesn't work because we are using the private network for the internal traffic. This network doesn't support IPv6, so getting it to listen on IPv6 wouldn't work, and it's also not what we want. We don't want a dual-stack cluster; we just want the svclb front end to accept IPv6 traffic.
I found this old post for klipper-lb. I installed socat onto the cluster and ran it as follows:

```
socat TCP6-LISTEN:80,fork TCP4:127.0.0.1:80
```
Now when I telnet to my cluster, the IPv6 address works ;) It accepts the traffic and redirects it to the svclb listening on IPv4. This sounds like a feature missing in svclb.
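If someone wanted to persist that socat workaround across reboots, a templated systemd unit is one way to sketch it (the unit name and path are hypothetical; this is just the hack above made persistent, one instance per port):

```ini
# /etc/systemd/system/svclb-ipv6-proxy@.service  (hypothetical name)
# Enable per port, e.g.: systemctl enable --now svclb-ipv6-proxy@80
[Unit]
Description=Forward inbound IPv6 TCP port %i to the local IPv4 svclb listener
After=network-online.target

[Service]
# Same socat invocation as above, templated on the port number
ExecStart=/usr/bin/socat TCP6-LISTEN:%i,fork,reuseaddr TCP4:127.0.0.1:%i
Restart=always

[Install]
WantedBy=multi-user.target
```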
@mysticaltech, I suppose we can close the request if you agree. It sounds very much like a limitation in klipper-lb. I can work on a PR adding an option that would install socat and create/start services on each node to redirect traffic... but TBH it sounds like a hack and probably not something you want. Also, I don't do a lot of dev anymore, so it would take me a while :)
Thanks for all the support
Very good find @bcr-ch 🍾 ! This was good teamwork, well done. A PR would have been great, but since we do not know exactly which port needs connecting to IPv4, it's not practical. And anyway, thank goodness the Hetzner LB automatically takes care of that translation.
Now if there is a way for socat to forward all IPv6 ports to their equivalent IPv4, then why not. If you're up to it, PR is most welcome, and should be quite quick.
@bcr-ch Just saw the donation, thank you for that, really appreciate it! 🙏
Description
I am using klipper-lb and traefik instead of the Hetzner Load Balancer.
When I use the IPv4 address of my agent nodes it works no problem. When I use the IPv6 address I get 'Connection refused'.
It seems the host is rejecting inbound IPv6. I have the default FW rules and no other issues.
Platform
Linux