k8snetworkplumbingwg / whereabouts

A CNI IPAM plugin that assigns IP addresses cluster-wide
Apache License 2.0

[BUG] Image tag "latest" in all release tags #481

Open MaximShepelev opened 2 weeks ago

Describe the bug

The image tag in the manifest doc/crds/daemonset-install.yaml is set to latest in every release tag instead of a release-specific tag. This results in inconsistent deployments.

Expected behavior

When the manifest from a release (here v0.7.0) is applied, the image tag of the whereabouts DaemonSet is set to that release.

To Reproduce

Steps to reproduce the behavior:

  1. Apply the install manifest from the latest release (v0.7.0).
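A quick way to confirm the report is to scan the manifest for image references that use latest or carry no tag. The following is a minimal sketch in Python, not part of the project. The helper name `floating_images` and the sample image path `ghcr.io/k8snetworkplumbingwg/whereabouts` are assumptions for illustration, so substitute the reference that actually appears in doc/crds/daemonset-install.yaml.

```python
import re

# Matches lines such as "image: registry/org/name:tag" in a YAML manifest.
IMAGE_LINE = re.compile(r"^\s*(?:-\s*)?image:\s*[\"']?([^\s\"'#]+)", re.MULTILINE)


def floating_images(manifest_text):
    """Return image references that use `latest` or have no tag at all."""
    found = []
    for ref in IMAGE_LINE.findall(manifest_text):
        if "@" in ref:
            continue  # pinned by digest, not floating
        # Only the last path segment can hold a tag; earlier colons are registry ports.
        last = ref.rsplit("/", 1)[-1]
        tag = last.split(":", 1)[1] if ":" in last else ""
        if tag in ("", "latest"):
            found.append(ref)
    return found


# Hypothetical excerpt; the image path is an assumption, not copied from the repo.
SAMPLE = """\
containers:
  - name: whereabouts
    image: ghcr.io/k8snetworkplumbingwg/whereabouts:latest
"""

if __name__ == "__main__":
    print(floating_images(SAMPLE))
```

Running this against the manifest checked out at a release tag should return an empty list once the tag is pinned.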

Environment:


Additional info / context

This is not the main point of this issue; it is a bonus observation indicating that the master branch is broken. It is how I found out that a different image version, built from the master branch, had been deployed, which broke an environment that was previously working.

Problem

Creating this deployment results in Pods stuck in ContainerCreating:

❯ kgp -n default
NAME                        READY   STATUS              RESTARTS   AGE     IP       NOMINATED NODE   READINESS GATES
test-app-7d9d5bcb68-4vqdg   0/1     ContainerCreating   0          104m    <none>   <none>           <none>
test-app-7d9d5bcb68-k59dz   0/1     ContainerCreating   0          104m    <none>   <none>           <none>
test-app-7d9d5bcb68-sdhk7   0/1     ContainerCreating   0          8m10s   <none>   <none>           <none>

Describing a Pod reveals an error in the whereabouts logic:

  Warning  FailedCreatePodSandBox  4m41s (x303 over 101m)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "2b52ad83cf8ae073b4aa4a6186e635e7eb9d37fbffd94714bdaae2f2d6f00758": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&{ContainerID:2b52ad83cf8ae073b4aa4a6186e635e7eb9d37fbffd94714bdaae2f2d6f00758 Netns:/var/run/netns/cni-747f9778-a05f-5647-f4a8-f91a45c141c9 IfName:eth0 Args:K8S_POD_UID=db384269-27d9-494c-abe6-12d3618e6ec9;IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=test-app-7d9d5bcb68-4vqdg;K8S_POD_INFRA_CONTAINER_ID=2b52ad83cf8ae073b4aa4a6186e635e7eb9d37fbffd94714bdaae2f2d6f00758 Path: StdinData:[SOME_BINARY_DATA]} {ContainerID:2b52ad83cf8ae073b4aa4a6186e635e7eb9d37fbffd94714bdaae2f2d6f00758 Netns:/var/run/netns/cni-747f9778-a05f-5647-f4a8-f91a45c141c9 IfName:eth0 Args:K8S_POD_UID=db384269-27d9-494c-abe6-12d3618e6ec9;IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=test-app-7d9d5bcb68-4vqdg;K8S_POD_INFRA_CONTAINER_ID=2b52ad83cf8ae073b4aa4a6186e635e7eb9d37fbffd94714bdaae2f2d6f00758 Path: StdinData:[ANOTHER_BINARY_DATA} ERRORED: error configuring pod [default/test-app-7d9d5bcb68-4vqdg] networking: [default/test-app-7d9d5bcb68-4vqdg/db384269-27d9-494c-abe6-12d3618e6ec9:test-nad]: error adding container to network "test-nad": error at storage engine: OverlappingRangeIPReservation.whereabouts.cni.cncf.io "100.112.0.1" is invalid: spec.containerid: Required value

Whereabouts logs don't give any valuable information

❯ klg whereabouts-gdtq8
Done configuring CNI.  Sleep=false
2024-06-14T10:16:59Z [debug] Filtering pods with filter key 'spec.nodeName' and filter value 'worker-1.example.com'
2024-06-14T10:16:59Z [verbose] pod controller created
2024-06-14T10:16:59Z [verbose] Starting informer factories ...
2024-06-14T10:16:59Z [verbose] Informer factories started
2024-06-14T10:16:59Z [verbose] starting network controller
2024-06-14T10:16:59Z [verbose] using expression: 30 4 * * *
2024-06-14T11:56:22Z [verbose] deleted pod [default/test-app-7d9d5bcb68-2dlv4]
2024-06-14T11:56:22Z [verbose] result of garbage collecting pods: <nil>
2024-06-14T11:56:37Z [verbose] deleted pod [default/test-app-7d9d5bcb68-j8qg2]
2024-06-14T11:56:37Z [verbose] result of garbage collecting pods: <nil>

Multus logs

time="2024-06-14T12:01:16Z" level=warning msg="Ignoring user-configured log format" error="incorrect log format configured 'text-ts', expected 'text', 'json' or 'json-ts'"
I0614 12:01:16.586103   11848 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"test-app-7d9d5bcb68-sdhk7", UID:"4d6f58f3-f336-4b22-9eb8-05d4c225626c", APIVersion:"v1", ResourceVersion:"38553", FieldPath:""}): type: 'Normal' reason: 'AddedInterface' Add eth0 [100.96.3.2/32] from cilium
2024-06-14T12:01:17Z [debug] Used defaults from parsed flat file config @ /etc/cni/net.d/whereabouts.d/whereabouts.conf
2024-06-14T12:01:17Z [debug] DEL - IPAM configuration successfully read: {Name:test-nad Type:whereabouts Routes:[] Addresses:[] IPRanges:[{OmitRanges:[100.112.240.0/20] Range:100.112.0.0/16 RangeStart:100.112.0.0 RangeEnd:<nil>}] OmitRanges:[] DNS:{Nameservers:[] Domain: Search:[] Options:[]} Range: RangeStart:<nil> RangeEnd:<nil> GatewayStr: LeaderLeaseDuration:1500 LeaderRenewDeadline:1000 LeaderRetryPeriod:500 LogFile: LogLevel: ReconcilerCronExpression:30 4 * * * OverlappingRanges:true SleepForRace:0 Gateway:<nil> Kubernetes:{KubeConfigPath:/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig K8sAPIRoot:} ConfigurationPath: PodName:test-app-7d9d5bcb68-sdhk7 PodNamespace:default NetworkName:}
2024-06-14T12:01:17Z [debug] Beginning delete for ContainerID: "91b132baa5aa5a6d715d2b9ac3872bed9977117f4f50d067d8d6dd4b475868f8" - podRef: "default/test-app-7d9d5bcb68-sdhk7" - ifName: "eth1"
2024-06-14T12:01:17Z [debug] Started leader election
I0614 12:01:17.791478   65564 leaderelection.go:250] attempting to acquire leader lease kube-system/whereabouts...
E0614 12:01:18.000289   65564 leaderelection.go:369] Failed to update lock: Operation cannot be fulfilled on leases.coordination.k8s.io "whereabouts": the object has been modified; please apply your changes to the latest version and try again
I0614 12:01:19.103977   65564 leaderelection.go:260] successfully acquired lease kube-system/whereabouts
2024-06-14T12:01:19Z [debug] OnStartedLeading() called
2024-06-14T12:01:19Z [debug] Elected as leader, do processing
2024-06-14T12:01:19Z [debug] IPManagement -- mode: 1 / containerID: "91b132baa5aa5a6d715d2b9ac3872bed9977117f4f50d067d8d6dd4b475868f8" / podRef: "default/test-app-7d9d5bcb68-sdhk7" / ifName: "eth1"
2024-06-14T12:01:19Z [debug] Deallocating given previously used IP: 100.112.0.2
2024-06-14T12:01:19Z [error] Error performing UpdateOverlappingRangeAllocation: overlappingrangeipreservations.whereabouts.cni.cncf.io "100.112.0.2" not found
2024-06-14T12:01:19Z [debug] OnStoppedLeading() called
2024-06-14T12:01:19Z [debug] Finished leader election
2024-06-14T12:01:19Z [debug] IPManagement: [], overlappingrangeipreservations.whereabouts.cni.cncf.io "100.112.0.2" not found
time="2024-06-14T12:01:19Z" level=warning msg="Ignoring user-configured log format" error="incorrect log format configured 'text-ts', expected 'text', 'json' or 'json-ts'"
2024-06-14T12:01:20Z [error] [default/test-app-7d9d5bcb68-sdhk7/4d6f58f3-f336-4b22-9eb8-05d4c225626c:test-nad]: error adding container to network "test-nad": error at storage engine: OverlappingRangeIPReservation.whereabouts.cni.cncf.io "100.112.0.2" is invalid: spec.containerid: Required value
2024-06-14T12:01:20Z [debug] Used defaults from parsed flat file config @ /etc/cni/net.d/whereabouts.d/whereabouts.conf
2024-06-14T12:01:20Z [debug] DEL - IPAM configuration successfully read: {Name:test-nad Type:whereabouts Routes:[] Addresses:[] IPRanges:[{OmitRanges:[100.112.240.0/20] Range:100.112.0.0/16 RangeStart:100.112.0.0 RangeEnd:<nil>}] OmitRanges:[] DNS:{Nameservers:[] Domain: Search:[] Options:[]} Range: RangeStart:<nil> RangeEnd:<nil> GatewayStr: LeaderLeaseDuration:1500 LeaderRenewDeadline:1000 LeaderRetryPeriod:500 LogFile: LogLevel: ReconcilerCronExpression:30 4 * * * OverlappingRanges:true SleepForRace:0 Gateway:<nil> Kubernetes:{KubeConfigPath:/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig K8sAPIRoot:} ConfigurationPath: PodName:test-app-7d9d5bcb68-sdhk7 PodNamespace:default NetworkName:}
2024-06-14T12:01:20Z [debug] Beginning delete for ContainerID: "91b132baa5aa5a6d715d2b9ac3872bed9977117f4f50d067d8d6dd4b475868f8" - podRef: "default/test-app-7d9d5bcb68-sdhk7" - ifName: "eth1"
2024-06-14T12:01:20Z [debug] Started leader election
I0614 12:01:20.592477   65622 leaderelection.go:250] attempting to acquire leader lease kube-system/whereabouts...
I0614 12:01:21.897086   65622 leaderelection.go:260] successfully acquired lease kube-system/whereabouts
2024-06-14T12:01:21Z [debug] OnStartedLeading() called
2024-06-14T12:01:21Z [debug] Elected as leader, do processing
2024-06-14T12:01:21Z [debug] IPManagement -- mode: 1 / containerID: "91b132baa5aa5a6d715d2b9ac3872bed9977117f4f50d067d8d6dd4b475868f8" / podRef: "default/test-app-7d9d5bcb68-sdhk7" / ifName: "eth1"
2024-06-14T12:01:21Z [debug] Failed to find allocation for container ID: 91b132baa5aa5a6d715d2b9ac3872bed9977117f4f50d067d8d6dd4b475868f8
2024-06-14T12:01:21Z [debug] OnStoppedLeading() called
2024-06-14T12:01:21Z [debug] Finished leader election
2024-06-14T12:01:21Z [debug] IPManagement: [], <nil>
time="2024-06-14T12:01:22Z" level=warning msg="Ignoring user-configured log format" error="incorrect log format configured 'text-ts', expected 'text', 'json' or 'json-ts'"
level=warning msg="Errors encountered while deleting endpoint" error="[DELETE /endpoint][404] deleteEndpointNotFound " subsys=cilium-cni

Cilium logs don't indicate any problems, and primary IP allocations are issued without errors. This was done on a freshly installed cluster with freshly installed Multus and Whereabouts CRDs, without any existing overlappingrangeipreservations:

❯ kg  overlappingrangeipreservations.whereabouts.cni.cncf.io -A
No resources found

Solution

Setting the whereabouts image explicitly to the latest release tag (v0.7.0) solves this problem.
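As a workaround until the release manifests are fixed, the tag can be rewritten before the manifest is applied. This is a minimal sketch of that step in Python. The function `pin_image` is hypothetical, and the repository path `ghcr.io/k8snetworkplumbingwg/whereabouts` is an assumption that should be checked against the manifest.

```python
import re


def pin_image(manifest_text, repository, tag):
    """Rewrite every `image:` reference to `repository` so that it uses `tag`."""
    pattern = re.compile(
        r"(image:\s*[\"']?)"          # the key and an optional opening quote
        + re.escape(repository)        # the exact repository path
        + r"(?::[\w.\-]+)?"            # an existing tag, if any
        + r"(?=[\"'\s]|$)"             # stop at a quote, whitespace, or end of text
    )
    return pattern.sub(lambda m: m.group(1) + repository + ":" + tag, manifest_text)


# Assumed repository path, used for illustration only.
REPO = "ghcr.io/k8snetworkplumbingwg/whereabouts"

if __name__ == "__main__":
    before = "    image: " + REPO + ":latest\n"
    print(pin_image(before, REPO, "v0.7.0"), end="")
```

The rewritten text can then be written to a file and applied in place of the original manifest. Other images in the same file are left untouched because the pattern matches only the exact repository path.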

P.S. I'm using two network interfaces in this example, with Cilium and Multus.