ovn-kubernetes / ovn-kubernetes

A robust Kubernetes networking platform
https://ovn-kubernetes.io/
Apache License 2.0

Flakes in E2E egress IP tests #4060

Open andreaskaris opened 10 months ago

andreaskaris commented 10 months ago

Seen in CI here:

Summarizing 2 Failures:
  [FAIL] e2e egress IP validation [It] Should validate the egress IP SNAT functionality against host-networked pods
  /home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/egressip.go:756
  [FAIL] e2e egress IP validation [It] Should validate egress IP logic when one pod is managed by more than one egressIP object

https://pipelinesghubeus22.actions.githubusercontent.com/rx2q6cyVnb1KdY24EzqfkXItIKb3SzT55t8LHdEU6zlqA0chrw/_apis/pipelines/1/runs/115655/signedlogcontent/46?urlExpires=2023-12-19T17%3A21%3A18.3407444Z&urlSigningMethod=HMACV1&urlSignature=9VMnxbLTbPatTtuppK9sIIROo7uxBPl7iU2UccedOws%3D

And here:

Summarizing 5 Failures:
  [FAIL] e2e egress IP validation [It] Should validate the egress IP SNAT functionality against host-networked pods
  /home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/egressip.go:756

https://pipelinesghubeus22.actions.githubusercontent.com/rx2q6cyVnb1KdY24EzqfkXItIKb3SzT55t8LHdEU6zlqA0chrw/_apis/pipelines/1/runs/115654/signedlogcontent/43?urlExpires=2023-12-19T17%3A22%3A31.7614437Z&urlSigningMethod=HMACV1&urlSignature=un%2FiamuWMMvL1B%2FXHyD5F3ACc539wD7bmUROLS%2BlCkI%3D


And here:

Summarizing 1 Failure:
  [FAIL] e2e egress IP validation [It] Should validate the egress IP SNAT functionality against host-networked pods
  /home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/egressip.go:756

https://pipelinesghubeus22.actions.githubusercontent.com/rx2q6cyVnb1KdY24EzqfkXItIKb3SzT55t8LHdEU6zlqA0chrw/_apis/pipelines/1/runs/115664/signedlogcontent/48?urlExpires=2023-12-19T19%3A20%3A34.4192391Z&urlSigningMethod=HMACV1&urlSignature=yeabHj3Qvv1SuBA8kbX8kimtH%2BcVpWfjedhNVbdzq0U%3D

andreaskaris commented 10 months ago

When running the test manually with:

$ go test -v . --ginkgo.focus="Should validate the egress IP SNAT functionality against host-networked pods" --ginkgo.v

We can see that the EgressIP failover takes a variable amount of time, and the connectivity check sometimes times out before the failover completes.

From a failing run:

  Dec 19 21:13:49.426: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=default get eip -o json'
  Dec 19 21:13:49.501: INFO: stderr: ""
  Dec 19 21:13:49.501: INFO: stdout: "{\n    \"apiVersion\": \"v1\",\n    \"items\": [\n        {\n            \"apiVersion\": \"k8s.ovn.org/v1\",\n            \"kind\": \"EgressIP\",\n            \"metadata\": {\n                \"creationTimestamp\": \"2023-12-19T20:13:46Z\",\n                \"generation\": 3,\n                \"name\": \"egressip\",\n                \"resourceVersion\": \"23742\",\n                \"uid\": \"a3323812-3d8c-4dfd-81b0-8ec1a36f51d0\"\n            },\n            \"spec\": {\n                \"egressIPs\": [\n                    \"172.18.1.4\"\n                ],\n                \"namespaceSelector\": {\n                    \"matchLabels\": {\n                        \"name\": \"egressip-7216\"\n                    }\n                },\n                \"podSelector\": {\n                    \"matchLabels\": {\n                        \"wants\": \"egress\"\n                    }\n                }\n            },\n            \"status\": {\n                \"items\": [\n                    {\n                        \"egressIP\": \"172.18.1.4\",\n                        \"node\": \"ovn-worker2\"\n                    }\n                ]\n            }\n        }\n    ],\n    \"kind\": \"List\",\n    \"metadata\": {\n        \"resourceVersion\": \"\"\n    }\n}\n"
  STEP: 12. Check connectivity from pod to an external "node" and verify that the srcIP is the expected egressIP @ 12/19/23 21:13:49.501
  Dec 19 21:13:49.501: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-7216 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:13:51.610: INFO: rc: 28
  Dec 19 21:13:52.610: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-7216 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:13:54.726: INFO: rc: 28
  Dec 19 21:13:55.611: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-7216 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:13:57.736: INFO: rc: 28
  Dec 19 21:13:58.611: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-7216 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:14:00.716: INFO: rc: 28
  Dec 19 21:14:01.611: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-7216 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:14:03.731: INFO: rc: 28
  Dec 19 21:14:04.611: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-7216 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:14:06.731: INFO: rc: 28
  Dec 19 21:14:07.611: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-7216 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:14:09.727: INFO: rc: 28
  Dec 19 21:14:10.611: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-7216 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:14:12.728: INFO: rc: 28
  Dec 19 21:14:13.611: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-7216 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:14:15.732: INFO: rc: 28
  Dec 19 21:14:16.610: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-7216 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:14:18.720: INFO: rc: 28
  Dec 19 21:14:19.611: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-7216 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:14:21.736: INFO: rc: 28
  Dec 19 21:14:22.610: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-7216 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:14:24.745: INFO: rc: 28
  Dec 19 21:14:25.611: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-7216 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:14:27.737: INFO: rc: 28
  Dec 19 21:14:28.610: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-7216 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:14:30.716: INFO: rc: 28
  Dec 19 21:14:31.611: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-7216 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:14:33.724: INFO: rc: 28
  Dec 19 21:14:33.724: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-7216 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:14:35.834: INFO: rc: 28
  Dec 19 21:14:35.834: INFO: Unexpected error: Step 12. Check connectivity from pod to an external "node" and verify that the srcIP is the expected egressIP, failed, err: timed out waiting for the condition: 
      <wait.errInterrupted>: 
      timed out waiting for the condition
      {
          cause: <*errors.errorString | 0xc000306fc0>{
              s: "timed out waiting for the condition",
          },
      }
  [FAILED] in [It] - /home/akaris/development/ovn-kubernetes/test/e2e/egressip.go:781 @ 12/19/23 21:14:35.869
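
Note that in the log above the EgressIP status already lists 172.18.1.4 on ovn-worker2 before the first curl attempt, so the status check by itself does not say whether traffic is actually flowing yet. As a rough illustration of what such a status check amounts to, here is a minimal Go sketch that parses the `kubectl get eip -o json` output. The type `egressIPList` and the helper `assignedTo` are made up for this example and are not taken from test/e2e/egressip.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// egressIPList mirrors only the fields of the `kubectl get eip -o json`
// output that show up in the log above. It is a hypothetical type for this
// sketch, not the project's own API type.
type egressIPList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Status struct {
			Items []struct {
				EgressIP string `json:"egressIP"`
				Node     string `json:"node"`
			} `json:"items"`
		} `json:"status"`
	} `json:"items"`
}

// assignedTo reports whether every EgressIP object in raw has exactly one
// status item and that item names the given IP and node.
func assignedTo(raw []byte, ip, node string) (bool, error) {
	var list egressIPList
	if err := json.Unmarshal(raw, &list); err != nil {
		return false, err
	}
	if len(list.Items) == 0 {
		return false, nil
	}
	for _, eip := range list.Items {
		st := eip.Status.Items
		if len(st) != 1 || st[0].EgressIP != ip || st[0].Node != node {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	// Trimmed copy of the stdout logged in the failing run.
	raw := []byte(`{"items":[{"metadata":{"name":"egressip"},` +
		`"status":{"items":[{"egressIP":"172.18.1.4","node":"ovn-worker2"}]}}]}`)
	ok, err := assignedTo(raw, "172.18.1.4", "ovn-worker2")
	fmt.Println(ok, err) // prints "true <nil>"
}
```

With the status from the failing run this returns true, even though the curl attempts that follow all end with rc 28.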

From another run:

  STEP: 11. Check that the status is of length one and that it is assigned to egress2Node @ 12/19/23 21:11:19.439
  Dec 19 21:11:19.439: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=default get eip -o json'
  Dec 19 21:11:19.518: INFO: stderr: ""
  Dec 19 21:11:19.518: INFO: stdout: "{\n    \"apiVersion\": \"v1\",\n    \"items\": [\n        {\n            \"apiVersion\": \"k8s.ovn.org/v1\",\n            \"kind\": \"EgressIP\",\n            \"metadata\": {\n                \"creationTimestamp\": \"2023-12-19T20:11:16Z\",\n                \"generation\": 3,\n                \"name\": \"egressip\",\n                \"resourceVersion\": \"23423\",\n                \"uid\": \"a9b2b681-5a59-426c-b7df-7abf2c306d0e\"\n            },\n            \"spec\": {\n                \"egressIPs\": [\n                    \"172.18.1.4\"\n                ],\n                \"namespaceSelector\": {\n                    \"matchLabels\": {\n                        \"name\": \"egressip-10\"\n                    }\n                },\n                \"podSelector\": {\n                    \"matchLabels\": {\n                        \"wants\": \"egress\"\n                    }\n                }\n            },\n            \"status\": {\n                \"items\": [\n                    {\n                        \"egressIP\": \"172.18.1.4\",\n                        \"node\": \"ovn-worker2\"\n                    }\n                ]\n            }\n        }\n    ],\n    \"kind\": \"List\",\n    \"metadata\": {\n        \"resourceVersion\": \"\"\n    }\n}\n"
  STEP: 12. Check connectivity from pod to an external "node" and verify that the srcIP is the expected egressIP @ 12/19/23 21:11:19.518
  Dec 19 21:11:19.518: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-10 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:11:21.624: INFO: rc: 28
  Dec 19 21:11:22.624: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-10 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:11:24.739: INFO: rc: 28
  Dec 19 21:11:25.626: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-10 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:11:27.747: INFO: rc: 28
  Dec 19 21:11:28.625: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-10 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:11:30.733: INFO: rc: 28
  Dec 19 21:11:31.625: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-10 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:11:33.733: INFO: rc: 28
  Dec 19 21:11:34.625: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-10 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:11:36.731: INFO: rc: 28
  Dec 19 21:11:37.625: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-10 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:11:39.740: INFO: rc: 28
  Dec 19 21:11:40.625: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-10 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:11:42.736: INFO: rc: 28
  Dec 19 21:11:43.625: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:32923 --kubeconfig=/home/akaris/ovn.conf --namespace=egressip-10 exec e2e-egressip-pod-1 -- curl --connect-timeout 2 172.18.0.6:80'
  Dec 19 21:11:44.808: INFO: stderr: "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0\r100    45  100    45    0     0     42      0  0:00:01  0:00:01 --:--:--    42\n"
  Dec 19 21:11:44.808: INFO: stdout: "<html><body><h1>It works!</h1></body></html>\n"
  STEP: 13. Check connectivity from pod to another node primary IP and verify that the srcIP is the expected nodeIP @ 12/19/23 21:11:44.821
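
Judging from the timestamps in the two logs, step 12 retries the curl roughly every 3 seconds and gives up after about 45 seconds. In the second run the first successful curl arrives about 25 seconds in, which is inside that window. In the first run nothing succeeds before the deadline. The retry pattern can be sketched in Go as below. This is only an illustration using the standard library: `pollUntil`, `tick` and the durations are assumptions for the example, and the real test may use a different helper and different values.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// tick stands in for one second so that the sketch finishes quickly.
const tick = time.Millisecond

var errTimedOut = errors.New("timed out waiting for the condition")

// pollUntil calls check immediately and then once per interval until check
// returns true or the next attempt would start after the deadline.
// It returns the number of attempts that were made.
func pollUntil(interval, timeout time.Duration, check func() bool) (int, error) {
	deadline := time.Now().Add(timeout)
	attempts := 0
	for {
		attempts++
		if check() {
			return attempts, nil
		}
		if time.Now().Add(interval).After(deadline) {
			return attempts, errTimedOut
		}
		time.Sleep(interval)
	}
}

func main() {
	// Hypothetical failover that completes 25 ticks after the start,
	// loosely modelled on the second run above.
	failoverDone := time.Now().Add(25 * tick)
	check := func() bool { return time.Now().After(failoverDone) }

	// Interval and timeout are assumptions read off the log timestamps.
	attempts, err := pollUntil(3*tick, 45*tick, check)
	fmt.Println("attempts:", attempts, "err:", err)
}
```

If the failover in this sketch took longer than the timeout, `pollUntil` would return `errTimedOut`, which corresponds to the failure seen in the first run.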

andreaskaris commented 10 months ago

image