ricky-rav closed this 3 days ago
@qinqon oh, both kv-migration lanes failed... looks like we need to look into this
shard conformance: https://github.com/ovn-org/ovn-kubernetes/actions/runs/9597526159/job/26468141166?pr=4457 timed out :/
2024-06-20T13:47:05.8175952Z Multicast should be able to send multicast UDP traffic between nodes
2024-06-20T13:47:05.8177373Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/multicast.go:79
2024-06-20T13:47:05.8178804Z STEP: Creating a kubernetes client @ 06/20/24 13:47:05.817
2024-06-20T13:47:05.8179917Z Jun 20 13:47:05.817: INFO: >>> kubeConfig: /home/runner/ovn.conf
2024-06-20T13:47:05.8183639Z STEP: Building a namespace api object, basename multicast @ 06/20/24 13:47:05.818
2024-06-20T13:47:05.8221785Z Jun 20 13:47:05.821: INFO: Skipping waiting for service account
2024-06-20T13:47:05.8421587Z STEP: creating a pod as a multicast source in node ovn-worker @ 06/20/24 13:47:05.841
2024-06-20T13:47:05.8487617Z W0620 13:47:05.848044 74981 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "pod-client" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "pod-client" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "pod-client" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "pod-client" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
2024-06-20T13:47:07.8552911Z STEP: creating first multicast listener pod in node ovn-worker2 @ 06/20/24 13:47:07.854
2024-06-20T13:47:07.8605049Z W0620 13:47:07.859650 74981 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "pod-server1" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "pod-server1" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "pod-server1" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "pod-server1" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
2024-06-20T13:47:09.8687194Z STEP: creating second multicast listener pod in node ovn-worker2 @ 06/20/24 13:47:09.868
2024-06-20T13:47:09.8735850Z W0620 13:47:09.872877 74981 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "pod-server2" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "pod-server2" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "pod-server2" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "pod-server2" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
2024-06-20T13:47:11.8815733Z STEP: creating first multicast listener pod in node ovn-worker @ 06/20/24 13:47:11.881
2024-06-20T13:47:11.8867196Z W0620 13:47:11.886000 74981 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "pod-server3" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "pod-server3" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "pod-server3" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "pod-server3" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
2024-06-20T13:47:13.8938814Z STEP: checking if pod server1 received multicast traffic @ 06/20/24 13:47:13.893
2024-06-20T13:47:13.9057278Z STEP: checking if pod server2 does not received multicast traffic @ 06/20/24 13:47:13.905
2024-06-20T13:47:13.9089851Z STEP: checking if pod server3 received multicast traffic @ 06/20/24 13:47:13.908
2024-06-20T13:47:13.9194687Z STEP: Destroying namespace "multicast-8182" for this suite. @ 06/20/24 13:47:13.919
2024-06-20T13:47:13.9221258Z • [8.105 seconds]
test passes.
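For reference, the PodSecurity warnings above spell out exactly what the `restricted:latest` profile expects from each container; the e2e pods simply don't set those fields. A minimal sketch of a spec that would satisfy the four checks (pod name and image here are hypothetical, not taken from the test):

```python
# Sketch: a container securityContext satisfying the "restricted" PodSecurity
# profile checks named in the warnings above. Names/images are hypothetical.
restricted_security_context = {
    "allowPrivilegeEscalation": False,             # allowPrivilegeEscalation != false
    "capabilities": {"drop": ["ALL"]},             # unrestricted capabilities
    "runAsNonRoot": True,                          # runAsNonRoot != true
    "seccompProfile": {"type": "RuntimeDefault"},  # seccompProfile
}

pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "pod-client"},
    "spec": {
        "containers": [{
            "name": "pod-client",
            "image": "registry.k8s.io/e2e-test-images/agnhost:2.45",  # hypothetical
            "securityContext": restricted_security_context,
        }],
    },
}
```

These are warnings, not failures: the namespace only *audits* against `restricted:latest`, so the pods still run.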
external gateway lane: https://github.com/ovn-org/ovn-kubernetes/actions/runs/9597526159/job/26468146207?pr=4457 known flake: https://github.com/ovn-org/ovn-kubernetes/issues/4432
Given it's an OVN bump and both live-migration jobs failed, I cannot merge with a red CI:
2024-06-20T13:19:20.8836054Z Latency metrics for node ovn-worker3
2024-06-20T13:19:20.8837410Z STEP: Destroying namespace "kv-live-migration-1853" for this suite. @ 06/20/24 13:19:20.883
2024-06-20T13:19:20.8884104Z • [FAILED] [214.253 seconds]
2024-06-20T13:19:20.8885911Z Kubevirt Virtual Machines with default pod network when live migration [It] with pre-copy succeeds, should keep connectivity
2024-06-20T13:19:20.8887975Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/test/e2e/kubevirt.go:1093
2024-06-20T13:19:20.8888816Z
2024-06-20T13:19:20.8889269Z [FAILED] worker1: Expose tcpServer as a service
2024-06-20T13:19:20.8889948Z Unexpected error:
2024-06-20T13:19:20.8890539Z <*fmt.wrapError | 0xc000e8e160>:
2024-06-20T13:19:20.8891567Z failed DialTCP: dial tcp 172.18.0.2:32485: connect: connection refused
2024-06-20T13:19:20.8892420Z {
2024-06-20T13:19:20.8893456Z msg: "failed DialTCP: dial tcp 172.18.0.2:32485: connect: connection refused",
2024-06-20T13:19:20.8894626Z err: <*net.OpError | 0xc000d9bf90>{
2024-06-20T13:19:20.8895290Z Op: "dial",
2024-06-20T13:19:20.8896407Z Net: "tcp",
2024-06-20T13:19:20.8897012Z Source: nil,
2024-06-20T13:19:20.8897757Z Addr: <*net.TCPAddr | 0xc001210000>{
2024-06-20T13:19:20.8898718Z IP: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 172, 18, 0, 2],
2024-06-20T13:19:20.8899401Z Port: 32485,
2024-06-20T13:19:20.8900010Z Zone: "",
2024-06-20T13:19:20.8900430Z },
2024-06-20T13:19:20.8901008Z Err: <*os.SyscallError | 0xc000e8e140>{
2024-06-20T13:19:20.8901670Z Syscall: "connect",
2024-06-20T13:19:20.8902173Z Err: <syscall.Errno>0x6f,
2024-06-20T13:19:20.8902787Z },
2024-06-20T13:19:20.8903209Z },
2024-06-20T13:19:20.8903610Z }
2024-06-20T13:19:20.8904012Z occurred
2024-06-20T13:19:20.8905263Z In [It] at: /opt/hostedtoolcache/go/1.21.11/x64/src/runtime/asm_amd64.s:1650 @ 06/20/24 13:19:19.612
2024-06-20T13:19:20.8906000Z
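Two details in that Go error dump decode cleanly: the 16-byte `Addr.IP` is the IPv4-mapped IPv6 form of the node IP the test dialed, and `syscall.Errno` 0x6f is ECONNREFUSED on Linux. A quick stdlib sketch of the decoding (not part of the test code):

```python
import errno
import ipaddress
import os

# The 16-byte IP from the dump is an IPv4-mapped IPv6 address.
raw = bytes([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 172, 18, 0, 2])
addr = ipaddress.ip_address(raw)
print(addr.ipv4_mapped)       # 172.18.0.2 -- the NodePort endpoint the test dialed

# syscall.Errno 0x6f == 111 == ECONNREFUSED on Linux.
code = 0x6F
print(errno.errorcode[code])  # ECONNREFUSED
print(os.strerror(code))      # Connection refused
```

So the failure is a plain "connection refused" when dialing NodePort 32485 on node IP 172.18.0.2 while exposing the tcpServer as a service.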
I see; this might not be related, but I can't risk a regression. My gut tells me to see at least one lane green, so I triggered a re-run of the failed lanes.
Live migration has failed again; we need some investigation of the CI failure before I can merge this. @ricky-rav FYI
@tssurya, a few weeks ago we tested some OVN changes related to arp_proxy and they were working all right; maybe it is still problematic and we didn't test it well enough:
https://github.com/ovn-org/ovn/commit/cc4187b4b49e25bc60c94aff493ac22ffe0a418c
Bumps OVN to 24.03.2-19, which reverts multicast-related commits that introduced a regression. Extends the unit test to cover the scenario that was broken: an additional receiver is added on the same node as the sender. https://issues.redhat.com/browse/OCPBUGS-34778 https://issues.redhat.com/browse/FDP-656
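The broken scenario (a multicast receiver on the same node as the sender, alongside receivers elsewhere) can be illustrated outside OVN with plain sockets. A minimal Python sketch, with a hypothetical group/port and the loopback interface standing in for the shared node; it is not the e2e test itself:

```python
import socket
import struct

GROUP, PORT = "224.1.2.3", 5007  # arbitrary multicast group/port for this sketch

def make_listener():
    # Two listeners on the same host model the "receivers on the sender's node"
    # part of the scenario; both must get a copy of every group datagram.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("", PORT))
    # struct ip_mreq: group address + local interface address.
    mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("127.0.0.1"))
    s.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    s.settimeout(2)
    return s

r1, r2 = make_listener(), make_listener()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Send on loopback and loop the datagram back to local group members,
# i.e. the receivers living on the same "node" as the sender.
sender.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF, socket.inet_aton("127.0.0.1"))
sender.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, 1)
sender.sendto(b"ping", (GROUP, PORT))

got1, _ = r1.recvfrom(64)
got2, _ = r2.recvfrom(64)
print(got1, got2)
```

Unlike unicast with SO_REUSEPORT, multicast delivery is not load-balanced: every socket joined to the group receives its own copy, which is exactly the local-delivery behavior the reverted OVN commits broke.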