IPv6 issues resolved for platform pods

Summary of problem: https://github.com/m-lab/epoxy-images/issues/209

The fix for the above issue was released to production at 2021-10-04 21:38 UTC. From that moment, the number of alive IPv6 probes increased roughly linearly as the rollout progressed:

[Image: Screenshot from 2021-11-16 10-33-24]

The number of pods with functional IPv6 went from around 3425 to 4055, a difference of around 630.

This is noted here in data-annotations because it could have caused a noticeable shift in traffic from IPv4 to IPv6.

Because this issue was caused by a race condition, which sites and pods were affected at any given time was probably not deterministic, and the list probably shifted over time. That said, on the morning of October 4, 2021 (before the fix was deployed), the following sites had anywhere from one to all of their pods affected:

ams03
ath03
atl02
beg01
bog02
dfw08
gig01
gru02
gru04
hkg02
hkg03
iad02
iad04
lax02
lax06
lga08
lju01
maa02
mex01
mex02
mia02
mia03
mnl01
mty01
nbo01
nuq03
ord02
sea08
svg01
syd02
tgd01
tpe01
trn02
tun01
yul05
yul06
yyz05
yyz06

The list was generated with this query:

count by (site) (probe_success{module=~".*v6.*", ipv6="present"} == 0 and on(site) kube_node_status_condition)
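
For anyone who wants to regenerate a list like this, the query above could be run through the standard Prometheus HTTP API (`/api/v1/query`). The following is only a minimal sketch: the base URL is a hypothetical placeholder, the helper names are made up for illustration, and it assumes the Prometheus instance is reachable without authentication.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# The PromQL query from this issue, copied verbatim.
QUERY = (
    'count by (site) (probe_success{module=~".*v6.*", ipv6="present"} == 0 '
    'and on(site) kube_node_status_condition)'
)


def build_query_url(base_url, query, at_time=None):
    """Build an instant-query URL for the Prometheus HTTP API.

    base_url is an assumption (whatever Prometheus instance you use).
    at_time is optional: an RFC 3339 timestamp or a Unix timestamp.
    """
    params = {"query": query}
    if at_time is not None:
        params["time"] = at_time
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode(params)


def affected_sites(response):
    """Extract a sorted, de-duplicated list of site labels from a
    Prometheus instant-query response (already decoded from JSON)."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    sites = set()
    for sample in response["data"]["result"]:
        site = sample.get("metric", {}).get("site")
        if site is not None:
            sites.add(site)
    return sorted(sites)


def fetch_affected_sites(base_url, at_time=None):
    """Run the query over HTTP and return the affected sites.

    Not exercised here; it needs network access to a real Prometheus.
    """
    with urlopen(build_query_url(base_url, QUERY, at_time)) as resp:
        return affected_sites(json.load(resp))
```

Passing a timestamp as `at_time` evaluates the query at a past instant, which is how a snapshot from before the fix could be reproduced, provided the metrics are still retained.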
