k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

vxlan network logs coming from flannel not being propagated #8912

Open manuelbuil opened 9 months ago

manuelbuil commented 9 months ago

Environmental Info:

K3s Version: Any

Node(s) CPU architecture, OS, and Version:

Cluster Configuration:

Describe the bug:

When creation of the vxlan infrastructure fails, the error is neither propagated to k3s nor written to the journal (journalctl), so the failure is invisible to us. E.g. https://github.com/k3s-io/k3s/issues/8794

Steps To Reproduce:

Expected behavior:

Actual behavior:

Additional context / logs:

thomasferrandiz commented 9 months ago

When running k3s, I can see logs coming from flannel, like this line:

I1120 15:12:37.814046    3322 kube.go:510] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.0.0/24]

So I guess it's more an issue internal to flannel for some specific errors?

manuelbuil commented 9 months ago

> When running k3s, I can see logs coming from flannel, like this line:
>
> I1120 15:12:37.814046    3322 kube.go:510] Creating the node lease for IPv4. This is the n.Spec.PodCIDRs: [10.42.0.0/24]
>
> So I guess it's more an issue internal to flannel for some specific errors?

Ah nice! Thanks Thomas! Yeah, then it is probably a flannel issue: flannel does not log a problem when creating the IP route fails (at least for IPv6).

thomasferrandiz commented 9 months ago

If there is an error when adding the IPv6 route, it should be logged here: https://github.com/flannel-io/flannel/blob/c498d00a87d394d9df16cd4fc303a8ea6d83063f/pkg/backend/vxlan/vxlan_network.go#L264

Or maybe netlink doesn't return an error but fails silently in some cases?

Otherwise, the issue could be that the flannel message is drowned out by all the other logs.
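
To rule out the "drowned in the logs" case, one option is to filter the k3s journal for klog lines emitted from flannel's vxlan source files. This is only a minimal sketch: it assumes the standard klog header (severity letter, date, time, thread id, file:line) seen in the flannel line quoted earlier in this thread, and it assumes the relevant source files start with "vxlan", as the linked vxlan_network.go does. `filter_klog` is a hypothetical helper name, not part of k3s or flannel.

```python
import re

# klog header: severity letter, mmdd, hh:mm:ss.uuuuuu, thread id, file:line, "] message".
KLOG_RE = re.compile(
    r"(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)"
)


def filter_klog(lines, file_prefix="vxlan", severities="WEF"):
    """Yield (severity, file, line, message) for matching klog lines.

    file_prefix and severities are assumptions to adjust for your cluster.
    """
    for raw in lines:
        # search() rather than match(): journalctl prepends its own prefix.
        m = KLOG_RE.search(raw)
        if m and m["file"].startswith(file_prefix) and m["sev"] in severities:
            yield m["sev"], m["file"], int(m["line"]), m["msg"]
```

It could be fed from the journal, for example by piping `journalctl -u k3s --no-pager` into a script that calls `filter_klog(sys.stdin)` and prints each hit. If nothing comes back around the time the route should have been created, the message was probably never logged rather than buried.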

manuelbuil commented 9 months ago

> If there is an error when adding the IPv6 route, it should be logged here: https://github.com/flannel-io/flannel/blob/c498d00a87d394d9df16cd4fc303a8ea6d83063f/pkg/backend/vxlan/vxlan_network.go#L264
>
> Or maybe netlink doesn't return an error but fails silently in some cases?
>
> Otherwise, the issue could be that the flannel message is drowned out by all the other logs.

Probably there is a silent failure...
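
If the failure really is silent, one way to confirm it from outside flannel is to compare the pod CIDRs that should be routed with what the kernel actually has. This is a rough sketch, not how flannel does it: it assumes the routes are read with `ip -6 route show` and that the flannel IPv6 VXLAN device is called `flannel-v6.1` (an assumption, so check `ip link` on the node). `missing_routes` is a hypothetical helper name.

```python
import ipaddress


def missing_routes(route_output, expected_cidrs, device="flannel-v6.1"):
    """Return the expected CIDRs that have no route via `device`.

    route_output is the text printed by `ip -6 route show` (or `ip route show`).
    """
    present = set()
    for line in route_output.splitlines():
        fields = line.split()
        if not fields or "dev" not in fields:
            continue
        dev_idx = fields.index("dev") + 1
        if dev_idx >= len(fields) or fields[dev_idx] != device:
            continue
        try:
            present.add(ipaddress.ip_network(fields[0], strict=False))
        except ValueError:
            continue  # first field is a keyword such as "default", not a prefix
    return [
        cidr for cidr in expected_cidrs
        if ipaddress.ip_network(cidr, strict=False) not in present
    ]
```

On a node, the input could come from `subprocess.run(["ip", "-6", "route", "show"], capture_output=True, text=True).stdout`, with the expected CIDRs taken from the other nodes' `spec.podCIDRs`. A non-empty result while the logs show no error would point to a silent failure.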