Open dougbtv opened 6 months ago
I understand that this PR changes whereabouts so that the identifier of an allocation goes from the podRef (Pod namespace/name) to the container ID. So how about changing the reconciler and other code to match between Pod and allocation pool, i.e. garbageCollectPodIPs() in pkg/controlloop/pod.go?
Tomo, not sure I'm following... This changes the identifier during reconciliation. Do you mean changing the identifier to pod and allocation pool? I'm probably being dense.
I actually ran this 50 times and got 3 failures...
One I believe was a flake with a timeout starting the network controller, and then these two errors:
------------------------------
• Failure [0.001 seconds]
Whereabouts IP reconciler
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:37
reconciling an IP pool with multiple pods attached
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:101
each with IP from the same IPPool
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:125
reconciling the IPPool
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:142
the IPPool should have only the IP reservation of the live pod [It]
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:157
Mismatch at key: 1
Expected
<string>: default/pod1
to equal
<string>:
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/reconciler/ip_test.go:172
and
------------------------------
• Failure [1.003 seconds]
IPControlLoop
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:37
a running pod on a node
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:61
IPPool featuring an allocation for the pod
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:116
the network attachment is available
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:127
when the associated pod is deleted
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:160
stale IP addresses are garbage collected [It]
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:165
Timed out after 1.001s.
the ip control loop should have removed this stale address
Expected
<map[string]v1alpha1.IPAllocation | len:1>: {
"0": {
ContainerID: "",
PodRef: "default/tiny-winy-pod",
},
}
to be empty
/home/doug/codebase/src/github.com/dougbtv/whereabouts/pkg/controlloop/pod_controller_test.go:170
> This is to change the identifier during reconciliation. Do you mean to change the identifier to pod and allocation pool? I'm probably being dense

Should it be changed?
So, in a discussion with Tomo, he demonstrated that under CRI-O the container ID doesn't match the pod sandbox ID, so this won't work under all runtimes. We'll need a different approach to get a unique ID to delete on across all container runtimes.
I discussed this with Tomo; it seems the containerID and sandboxID are the same on OpenShift with CRI-O (I tested it on 4.14 with this PR applied).
Let me correct that. The containerID and sandboxID are different in CRI-O/OpenShift, but @zshi-redhat tested/verified that the PR fixes the issue.
Hi all, I had a look at this and I think this fix is not complete. First of all, the issue happens when, within a pool, there are 2 entries (different IPs) for the same pod reference. When the ip-control-loop runs, the method `findOrphanedIPsPerPool` can return different values based on the order of the entries (`ipReservations`) of an allocation and the list of pods (listing pods is not ordered). So, the function `isPodAlive` within `findOrphanedIPsPerPool` will fail for both entries if the pod with unknown status is listed first. I fixed this with some small changes in `isPodAlive`:
func (rl ReconcileLooper) isPodAlive(podRef string, ip string) bool {
	for livePodRef, livePod := range rl.liveWhereaboutsPods {
		if podRef == livePodRef {
			isFound := isIpOnPod(&livePod, podRef, ip)
			if isFound {
				return true
			} else if livePod.phase == v1.PodPending {
				/* Sometimes pods are still coming up, and may not yet have the
				 * Multus annotation added. We don't want to check the IPs yet,
				 * so re-fetch the Pod up to storage.PodRefreshRetries times.
				 */
				podToMatch := &livePod
				retries := 0
				logging.Debugf("Re-fetching Pending Pod: %s IP-to-match: %s", livePodRef, ip)
				for retries < storage.PodRefreshRetries {
					retries += 1
					podToMatch = rl.refreshPod(livePodRef)
					if podToMatch == nil {
						logging.Debugf("Cleaning up...")
						return false
					} else if podToMatch.phase != v1.PodPending {
						logging.Debugf("Pending Pod is now in phase: %s", podToMatch.phase)
						return isIpOnPod(podToMatch, podRef, ip)
					} else {
						isFound = isIpOnPod(podToMatch, podRef, ip)
						// Short-circuit - Pending Pod may have the IP now
						if isFound {
							logging.Debugf("Pod now has IP annotation while in Pending")
							return true
						}
						time.Sleep(500 * time.Millisecond)
					}
				}
			}
		}
	}
	return false
}
Regarding the use of the container ID, I think it's a bit weird that the person who wrote `ReconcileIPPools` uses `podRef` instead of `ContainerID`, given that this function calls `IterateForDeallocation`, which is also used by the pod controller at the moment of deallocating. In fact, the argument of that function is explicitly named `containerID`. Perhaps there is a good reason for this. In any case, given that some runtimes do not have a 1-to-1 correspondence between container ID and sandbox ID, we could use the IP (together with the `podRef`) to identify an allocation. The allocations are stored in a map whose index is the offset of the reserved IP with respect to the network range. This offset is used to calculate the `IP` of `IPReservation` (done in `toIPReservationList`). So, I modified `ReconcileIPPools` based on this. Note that I didn't modify `IterateForDeallocation` so as not to affect the pod controller, but perhaps we should; it makes sense there too.
func (rl ReconcileLooper) ReconcileIPPools(ctx context.Context) ([]net.IP, error) {
	matchByPodRefAndIp := func(reservations []types.IPReservation, podRef string, ip net.IP) int {
		foundidx := -1
		for idx, v := range reservations {
			if v.PodRef == podRef && v.IP.String() == ip.String() {
				return idx
			}
		}
		return foundidx
	}

	var totalCleanedUpIps []net.IP
	for _, orphanedIP := range rl.orphanedIPs {
		currentIPReservations := orphanedIP.Pool.Allocations()
		for _, allocation := range orphanedIP.Allocations {
			pos := matchByPodRefAndIp(currentIPReservations, allocation.PodRef, allocation.IP)
			if pos < 0 {
				// Should never happen
				return nil, fmt.Errorf("did not find reserved IP for container %v", allocation.PodRef)
			}

			// Delete the entry (swap with the last element, then truncate)
			currentIPReservations[pos] = currentIPReservations[len(currentIPReservations)-1]
			currentIPReservations = currentIPReservations[:len(currentIPReservations)-1]

			logging.Debugf("Going to update the reserve list to: %+v", currentIPReservations)
			if err := orphanedIP.Pool.Update(ctx, currentIPReservations); err != nil {
				return nil, logging.Errorf("failed to update the reservation list: %v", err)
			}
			totalCleanedUpIps = append(totalCleanedUpIps, allocation.IP)
		}
	}
	return totalCleanedUpIps, nil
}
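As background for the `(podRef, IP)` match above: the offset-based index mentioned earlier can be turned back into an IP by adding the offset to the range's network address. The following is a standalone sketch of that idea, not whereabouts' actual `toIPReservationList` code; the function name `offsetToIP` is my own.

```go
package main

import (
	"fmt"
	"math/big"
	"net"
)

// offsetToIP sketches how an allocation map index ("offset") maps back to
// the reserved IP: take the network address of the range and add the
// offset as an integer. IPv4 only, for illustration.
func offsetToIP(cidr string, offset int64) (net.IP, error) {
	_, ipNet, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	base := new(big.Int).SetBytes(ipNet.IP.To4())
	sum := new(big.Int).Add(base, big.NewInt(offset))
	b := sum.Bytes()
	// Left-pad to 4 bytes so small addresses round-trip correctly.
	ip := make(net.IP, 4)
	copy(ip[4-len(b):], b)
	return ip, nil
}

func main() {
	ip, _ := offsetToIP("10.10.10.0/24", 2)
	fmt.Println(ip) // 10.10.10.2
}
```

This is why the map key "2" in the unit test below corresponds to 10.10.10.2.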
I tested all this in my setup and wrote a unit test as well. Disclaimer: the fake k8s client does not support having 2 pods with the same name; only the last one is stored. Other scenarios were tested by manually changing the order.
Context("reconciling an IP pool with entries from the same pod reference", func() {
	var wbClient wbclient.Interface
	var podRunning *v1.Pod
	var podTerminated *v1.Pod

	It("verifies that the entry for the pod in unknown state is the only one deleted", func() {
		By("creating 2 pods with the same name and namespace. 1st pod without network-status annotation and in an unknown state")
		podTerminated = generatePod(namespace, podName, ipInNetwork{})
		podTerminated.Status.Phase = v1.PodUnknown
		podTerminated.ObjectMeta.ResourceVersion = "10"
		k8sClientSet = fakek8sclient.NewSimpleClientset(podTerminated)

		podRunning = generatePod(namespace, podName, ipInNetwork{ip: firstIPInRange, networkName: networkName})
		k8sClientSet = fakek8sclient.NewSimpleClientset(podRunning)

		By("creating an IP pool with 2 entries from the same pod. First entry corresponds to a previous pod stuck in terminating state")
		pool := generateIPPoolSpec(ipRange, namespace, podName)
		pool.Spec.Allocations = map[string]v1alpha1.IPAllocation{
			"1": {
				PodRef: fmt.Sprintf("%s/%s", namespace, podName),
			},
			"2": {
				PodRef: fmt.Sprintf("%s/%s", namespace, podName),
			},
		}
		wbClient = fakewbclient.NewSimpleClientset(pool)

		By("initializing the reconciler")
		var err error
		reconcileLooper, err = NewReconcileLooperWithClient(context.TODO(), kubernetes.NewKubernetesClient(wbClient, k8sClientSet, timeout), timeout)
		Expect(err).NotTo(HaveOccurred())

		By("reconciling and checking that the correct entry is deleted")
		deletedIPAddrs, err := reconcileLooper.ReconcileIPPools(context.TODO())
		Expect(err).NotTo(HaveOccurred())
		Expect(deletedIPAddrs).To(Equal([]net.IP{net.ParseIP("10.10.10.2")}))

		By("verifying the IP pool")
		poolAfterCleanup, err := wbClient.WhereaboutsV1alpha1().IPPools(namespace).Get(context.TODO(), pool.GetName(), metav1.GetOptions{})
		Expect(err).NotTo(HaveOccurred())
		remainingAllocation := map[string]v1alpha1.IPAllocation{
			"1": {
				PodRef: fmt.Sprintf("%s/%s", namespace, podName),
			},
		}
		Expect(poolAfterCleanup.Spec.Allocations).To(Equal(remainingAllocation))
	})
})
@dougbtv , @s1061123 , @zshi-redhat. Please let me know if this makes sense to you.
@mlguerrero12 I proposed a new PR to address this issue. PTAL.
This is a (somewhat manual) rebase of #180 --
It looks like it's working in my testing of it, but... when I was first testing it, I sometimes had incorrect keys deleted and the wrong IPs removed.
So I was getting failures like:
However, with whatever finagling I was doing, those have gone away... Or so I think.
So I've been testing it in a loop to repeat it a bunch of times, like:
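The actual loop command wasn't included in the comment; a sketch of such a loop (the `go test` target and the `TEST_CMD`/`RUNS` variables are my own assumptions) might look like:

```shell
# Hypothetical sketch: repeat the test suite to surface ordering flakes.
# The package path and TEST_CMD default are assumptions, not from the PR.
runs=${RUNS:-50}
cmd=${TEST_CMD:-"go test ./pkg/reconciler/... -count=1"}
failures=0
for i in $(seq 1 "$runs"); do
  if ! $cmd >/dev/null 2>&1; then
    failures=$((failures + 1))
    echo "run $i failed"
  fi
done
echo "failures: $failures/$runs"
```

Running with `-count=1` avoids Go's test result caching, so each iteration actually re-executes the suite.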