moonek closed this issue 1 year ago
Do you know where that address (224.0.0.251) is coming from?
Do you have pods within the multicast IP range? Normally we only program routes to the VXLAN device for pod subnets, so I'm surprised to see that there at all.
@moonek any update on this one?
I couldn't solve it. The kernel is still generating 30-40 log lines per second. No IP in the 224 range is in use anywhere.
$ kubectl get po -A -owide | grep 224
$
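Note that `grep 224` matches any line containing those digits (for example 10.224.x.x) and would miss multicast addresses outside 224.x, since the IPv4 multicast range is 224.0.0.0/4. A stricter check could look like the sketch below. It assumes the pod list comes from `kubectl get po -A -o json`; the function name `multicast_pods` and the sample pod are made up for illustration.

```python
import ipaddress
import json


def multicast_pods(pod_list_json: str) -> list[tuple[str, str, str]]:
    """Return (namespace, name, ip) for every pod IP in a multicast range."""
    hits = []
    for pod in json.loads(pod_list_json).get("items", []):
        meta = pod.get("metadata", {})
        status = pod.get("status", {})
        # status.podIPs lists every address; fall back to the single status.podIP.
        ips = [entry["ip"] for entry in status.get("podIPs", [])]
        if not ips and status.get("podIP"):
            ips = [status["podIP"]]
        for ip in ips:
            if ipaddress.ip_address(ip).is_multicast:
                hits.append((meta.get("namespace", ""), meta.get("name", ""), ip))
    return hits


if __name__ == "__main__":
    # Hypothetical sample: a pod whose IP merely contains "224" but is unicast.
    sample = json.dumps({
        "items": [{
            "metadata": {"namespace": "default", "name": "example-pod"},
            "status": {"podIPs": [{"ip": "10.224.1.5"}]},
        }]
    })
    print(multicast_pods(sample))
```

An empty result from this check would confirm what the grep suggests: no pod actually holds a multicast address.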
Something must be programming that entry targeting the vxlan.calico interface, and Calico is correctly spotting that it doesn't belong there (because there are no pods with that IP on the network).
Calico only programs static entries with known hwaddr. I'm guessing this entry is being put there due to something in your cluster trying to talk multicast over the vxlan.calico interface.
Based on a quick google, I see those multicast IPs are registered: https://www.iana.org/assignments/multicast-addresses/multicast-addresses.xhtml#multicast-addresses-12
224.0.0.251 | mDNS
224.0.0.22 | IGMP
Although anything could be trying to access them. Are you running anything like that ^ in your cluster?
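For anyone triaging similar entries, here is a small sketch that labels an address using only the groups named in this thread. The `KNOWN_GROUPS` table and the `describe` helper are illustrative and not part of Calico; the IANA registry linked above remains the authoritative list.

```python
import ipaddress

# Link-local multicast groups mentioned in this thread (not an exhaustive list).
KNOWN_GROUPS = {
    "224.0.0.22": "IGMP",
    "224.0.0.251": "mDNS",
    "224.0.0.252": "LLMNR",
}


def describe(address: str) -> str:
    """Label an IPv4 address as a known multicast group, unknown multicast, or not multicast."""
    ip = ipaddress.ip_address(address)
    if not ip.is_multicast:
        return "not multicast"
    return KNOWN_GROUPS.get(str(ip), "multicast, unknown group")


if __name__ == "__main__":
    for addr in ("224.0.0.251", "224.0.0.22", "10.244.0.5"):
        print(addr, "->", describe(addr))
```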
It might be worth running a tcpdump to see what's sending traffic to those addresses.
In parallel, Calico should be able to remove those addresses. We might need to fine-tune the way we craft our deletion request so it's valid, or find a way to skip entries which we know aren't programmed by us (currently we use the vxlan.calico interface to decide that, but if someone else is programming entries for that interface we can't tell the difference).
I had the same issue too.
@caseydavenport do we need to find where the multicast IP is coming from before we can solve it? I tried to delete the 224 neighbor entry, but it failed.
For some more information, we are also seeing this issue on our 16.04 desktop nodes. These nodes are from a legacy deployment, and unfortunately their host OS and kernel cannot be upgraded.
After digging into those addresses, it appears that the avahi daemon is programming the mDNS address (224.0.0.251), and systemd-resolve is programming the LLMNR address (224.0.0.252). I have added a whitelist to the primary network interface for avahi and that address no longer appears in the neighbor list, but I'm still working on figuring out how to disable the other two.
This post was enlightening and described the addresses and their likely sources: https://superuser.com/questions/1063676/is-my-arp-cache-poisoned
This issue helped me whitelist the primary network interface for the avahi daemon without disabling it entirely: https://github.com/freepn/fpnd/issues/67
I don't believe any of these workarounds are real solutions.
2022-01-18 09:57:23.012 [WARNING][51] felix/route_table.go 968: Failed to delete neighbor FDB entry {LinkIndex:4 Family:7 State:128 Type:0 Flags:2 IP:224.0.0.251 HardwareAddr: LLIPAddr: Vlan:0 VNI:0 MasterIndex:0} error=invalid argument ifaceName="vxlan.calico" ifaceRegex="^vxlan.calico$" ipVersion=0x4
It looks like the MAC address is empty, which means the ARP entry is incomplete. Maybe we should skip incomplete ARP entries (those with an empty MAC address).
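To illustrate the idea of skipping entries without a MAC address, here is a rough sketch that picks them out of `ip neigh` text output. This is not how Felix does it (Felix is written in Go and talks netlink directly); the function name `incomplete_neighbors` and the sample addresses are assumptions for illustration only.

```python
def incomplete_neighbors(ip_neigh_output: str) -> list[str]:
    """Return the IPs of neighbor entries that carry no link-layer (MAC) address."""
    incomplete = []
    for line in ip_neigh_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        # Resolved or static entries include "lladdr <mac>"; unresolved ones do not.
        if "lladdr" not in fields:
            incomplete.append(fields[0])
    return incomplete


if __name__ == "__main__":
    # Made-up sample in the usual `ip neigh` output format.
    sample = (
        "10.244.1.0 dev vxlan.calico lladdr 66:aa:bb:cc:dd:ee PERMANENT\n"
        "224.0.0.251 dev vxlan.calico  FAILED\n"
        "224.0.0.22 dev vxlan.calico  INCOMPLETE\n"
    )
    print(incomplete_neighbors(sample))
```

Entries returned by a filter like this are the ones a cleanup pass could leave alone instead of sending a delete request the kernel rejects.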
See the kernel function rtnl_fdb_del
https://elixir.bootlin.com/linux/v3.10.108/source/net/core/rtnetlink.c#L2223
cc @caseydavenport
Yeah interesting. Calico shouldn't be trying to remove any dynamically programmed ARP entries - Calico itself programs some static entries, but those should always have a HWAddr and not rely on dynamic resolution.
I've installed Calico (VXLAN mode) with the same manifest on dozens of Kubernetes clusters, but this is the first time this has happened. All Calico pods are running, and overlay network communication is working fine, but a warning log occurs constantly in all calico-node pods.
Also, the below log occurs continuously in the kernel.
The peculiar thing I found is that the IP seen as multicast is FAILED.
.calico config and daemonset.
Your Environment