projectcalico / calico

Cloud native networking and network security
https://docs.tigera.io/calico/latest/about/
Apache License 2.0

invalid route configured due to blockaffinity without ipamblock #4471

Open juliantaylor opened 3 years ago

juliantaylor commented 3 years ago

After a large deployment we had a short Kubernetes apiserver overload (causing connection failures), which was followed by an invalid route configured by Calico. The invalid route had no corresponding ipamblock object, but it did have a blockaffinity object, which prevented the route from being removed. No pods were in the IP range of the route/blockaffinity. After manually deleting the blockaffinity, the route was removed automatically.

$ kubectl get ipamblocks.crd.projectcalico.org 100-70-221-192-26 
Error from server (NotFound): ipamblocks.crd.projectcalico.org "100-70-221-192-26" not found
# note the blockaffinity was newly created around the same time as the outage
$ kubectl get blockaffinities.crd.projectcalico.org kworker-be-prod-iz2-270-100-70-221-192-26
kworker-be-prod-iz2-270-100-70-221-192-26   61m
# route on the servers
100.70.221.128/26 via xx.xx.199.27 dev tunl0 proto bird onlink 
100.70.221.192/26 via xx.xx.199.27 dev tunl0 proto bird onlink   <<<< invalid route

On the node with the dangling blockaffinity, the block was blackholed.

100-70-221-128-26 had an ipamblock and an affinity.
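
A rough way to cross-check for this condition cluster-wide is to list the CIDRs of all ipamblocks and flag any blockaffinity whose CIDR has no matching block. This is only a sketch against the crd.projectcalico.org objects shown above and assumes kubectl and jq are available:

# Sketch: flag blockaffinities whose CIDR has no matching ipamblock.
kubectl get ipamblocks.crd.projectcalico.org -o json \
  | jq -r '.items[].spec.cidr' | sort > /tmp/ipamblock-cidrs
kubectl get blockaffinities.crd.projectcalico.org -o json \
  | jq -r '.items[] | "\(.spec.node) \(.spec.cidr) \(.spec.state)"' \
  | while read -r node cidr state; do
      grep -qxF "$cidr" /tmp/ipamblock-cidrs \
        || echo "dangling affinity: node=$node cidr=$cidr state=$state"
    done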

The calico-node, calico-controllermanager and calico-typha logs showed nothing interesting besides some connection-cancelled messages for some watches during the short apiserver outage. E.g. the controller log:

2021-03-16 12:13:43.773 [INFO][1] resources.go 377: Terminating main client watcher loop
2021-03-16 12:13:43.780 [INFO][1] resources.go 349: Main client watcher loop
2021-03-16 12:29:03.351 [INFO][1] watchercache.go 96: Watch channel closed by remote - recreate watcher ListRoot="/calico/resources/v3/projectcalico.org/nodes"
<<< short outage
2021-03-16 12:30:45.731 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get "https://100.72.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2021-03-16 12:30:45.731 [ERROR][1] main.go 207: Failed to verify datastore error=Get "https://100.72.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
<<<< recovery
2021-03-16 12:31:00.126 [INFO][1] watchercache.go 96: Watch channel closed by remote - recreate watcher ListRoot="/calico/resources/v3/projectcalico.org/nodes"
2021-03-16 12:31:00.127 [INFO][1] resources.go 377: Terminating main client watcher loop
2021-03-16 12:31:11.737 [INFO][1] resources.go 349: Main client watcher loop
2021-03-16 13:08:19.700 [INFO][1] resources.go 377: Terminating main client watcher loop
2021-03-16 13:08:19.707 [INFO][1] resources.go 349: Main client watcher loop

Restarting all calico components did not change anything.

We are not sure if this might have been caused by the apiserver outage or simply by some error in the ipamblock/affinity creation and subsequent deletion.

The cluster has 260 nodes, about 9000 running pods, 6 calico typha instances and 3 calico route reflectors.


juliantaylor commented 3 years ago

The dangling blockaffinity had the following content:


spec:
  cidr: 100.70.221.192/26
  deleted: "false"
  node: kworker-be-prod-iz2-270
  state: pending

caseydavenport commented 3 years ago

@juliantaylor this looks to me like the apiserver connection failures happened in the middle of allocating a new block to that node. Allocation of a new block is a multi-step process, which looks something like this:

1. Claim the block by creating a blockaffinity for the node with state: pending.
2. Create the corresponding ipamblock.
3. Mark the blockaffinity as confirmed.

It looks like step 1 happened, but the API server issues likely happened immediately after, preventing the subsequent steps. This should be OK, in that the next time someone tries to allocate that block the state will be cleaned up.

IIRC, we shouldn't be advertising routes for pending affinities, based on this filter logic here: https://github.com/projectcalico/confd/blob/master/etc/calico/confd/templates/bird_aggr.cfg.template#L35-L45

However, it does look like we will program a local blackhole for the traffic.

When this happened, were you seeing the route advertised to other nodes? Or just the local blackhole? And, was it causing routing issues? I wouldn't expect it to actually impact traffic.

sfudeus commented 3 years ago

IIRC, and from @juliantaylor's description, it was advertised via BGP (see 100.70.221.192/26 via xx.xx.199.27 dev tunl0 proto bird onlink) in the routes of other nodes, and it was a local blackhole route on the affected node.
It did not cause routing issues; we only noticed it in monitoring, because after prior issues we monitor and compare the number of configured routes with the number of IPAM blocks.
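
For reference, a very rough sketch of that kind of check, run per node (assumptions: BIRD-programmed routes carry proto bird, and the counts are only compared for drift, not expected to match exactly):

# Rough monitoring sketch: compare BIRD-programmed routes on this node
# with the number of ipamblocks in the cluster. The counts will not match
# exactly (local blocks, multiple pools), so alert only on large drift.
routes=$(ip route show proto bird | wc -l)
blocks=$(kubectl get ipamblocks.crd.projectcalico.org --no-headers | wc -l)
echo "bird routes on this node: $routes, ipamblocks in cluster: $blocks"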

caseydavenport commented 3 years ago

Sounds like we need to revisit and see why routes are being advertised for blockaffinities with state: pending. I don't think that is working as intended.

Cojacfar commented 7 months ago

Was there a workaround for this? I have a BlockAffinity that is causing this behavior, advertising a route that gets blackholed. However, attempts to delete it tell me that it's Read-Only.

Although it also tells me it does not exist when I attempt to read this particular Affinity entry.

brandond commented 7 months ago

@Cojacfar are you sure your block affinities aren't stuck in deleted: true, state: pendingDeletion?

The docs at https://docs.tigera.io/calico/latest/reference/resources/blockaffinity state:

deleted: When set to true, clients should treat this block as if it does not exist.

That might explain why you can list the affinity but not get it by name. At that point the question would be, why is it still pending - are there perhaps still pods running with IPs in this CIDR block?
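
A quick way to inspect those fields across all affinities (the column paths follow the spec shown earlier in this thread; DELETED may simply be empty on objects that never entered deletion):

# List every block affinity with its node, CIDR, state and deleted flag.
kubectl get blockaffinities.crd.projectcalico.org \
  -o custom-columns=NAME:.metadata.name,NODE:.spec.node,CIDR:.spec.cidr,STATE:.spec.state,DELETED:.spec.deleted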

Cojacfar commented 7 months ago

There aren't any pods running with the IP range on the node sharing the route. Although that would make a lot of sense. I actually removed this node completely from the cluster, uninstalling RKE2, and re-added it thinking something similar about leftover addresses to share. However, the rule just got recreated by calico-node.

brandond commented 7 months ago

It is stuck pendingDeletion though, yes? And it persists even after both deleting the node object from the cluster and uninstalling the node completely?

the rule just got recreated by calico-node.

Did the blockAffinity disappear when the node was deleted, and come back after rejoining it? Or was it there the whole time? If the former, I wonder if there is perhaps a stale state file that is getting left behind on the node that persists across installations?
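
If it helps, a quick way to look for leftover state on the node itself is to list the usual Calico/CNI directories; the default paths below are an assumption, and RKE2 may relocate them (e.g. under /var/lib/rancher/rke2):

# Look for leftover Calico/CNI state on the node (default locations; RKE2
# may place the CNI config elsewhere).
ls -la /var/lib/calico/ /etc/cni/net.d/ /var/lib/cni/ 2>/dev/null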

Cojacfar commented 7 months ago

Hmm. I'm not sure if it disappeared when removed or not.

It is in pendingDeletion though, and it's actually noted in the BIRD file that it's awaiting deletion, so it's blackholed.

protocol static {
# IP blocks for this host.
route 10.218.114.128/25 blackhole;
route 10.218.180.192/26 blackhole;
}

# Aggregation of routes on this host; export the block, nothing beneath it.
function calico_aggr ()
{
# Block 10.218.114.128/25 is confirmed
if ( net = 10.218.114.128/25 ) then { accept; }
if ( net ~ 10.218.114.128/25 ) then { reject; }
# Block 10.218.180.192/26 is pendingDeletion
}

Cojacfar commented 7 months ago

@brandond Do you know of any way to force this deletion? I can't modify the resource directly, as it's reported both as non-existent and as Read-Only.

brandond commented 7 months ago

I don't. I see that @caseydavenport picked this issue up - perhaps there are some suggestions from the Calico team?

caseydavenport commented 7 months ago

There aren't any pods running with the IP range on the node sharing the route. Although that would make a lot of sense

One thing to note is that it might not be a pod, but the tunnel address of that node that is claiming the block.
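
One way to check that is to look at the node's tunnel address annotations and see whether either one falls inside the stuck block's CIDR (the annotation keys below are the usual Calico node annotations for IPIP/VXLAN tunnel addresses; treat the exact keys as an assumption for your install):

# Print the node's IPIP and VXLAN tunnel addresses (replace <node-name>).
kubectl get node <node-name> \
  -o jsonpath='{.metadata.annotations.projectcalico\.org/IPv4IPIPTunnelAddr}{"\n"}{.metadata.annotations.projectcalico\.org/IPv4VXLANTunnelAddr}{"\n"}'
# If either address is inside 10.218.180.192/26, the tunnel IP is what is
# still claiming that block.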