Yeah, it seems our counting of iptables rules has changed somehow?
2023-03-20T13:24:48.9764792Z Mar 20 13:24:48.976: FAIL: Couldn't fetch the correct number of iptable rules, err: timed out waiting for the condition
2023-03-20T13:24:48.9765167Z Unexpected error:
2023-03-20T13:24:48.9765477Z <*errors.errorString | 0xc000206240>: {
2023-03-20T13:24:48.9765857Z s: "timed out waiting for the condition",
2023-03-20T13:24:48.9766108Z }
2023-03-20T13:24:48.9766605Z timed out waiting for the condition
2023-03-20T13:24:48.9766824Z occurred
2023-03-20T13:24:48.9766936Z
We really haven't made any changes to the code around this; the only changes going in are the ready/serving/terminating status updates for endpoint slices. I'll do some runs locally to see what the issue is here, since we also count the rules in unit tests and those are passing.
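For reference, the only functional change in flight is the endpoint slice ready/serving/terminating handling. An eligibility check along these lines (an illustrative sketch, not the actual ovn-kubernetes helper) is exactly the kind of thing that changes how many per-endpoint rules we end up generating:

```go
package node

import (
	discovery "k8s.io/api/discovery/v1"
)

// isEligibleEndpoint sketches a ready/serving/terminating check: an endpoint with a
// nil Ready condition is treated as ready; otherwise it is only counted if it is
// still serving while terminating. Any change in a predicate like this changes the
// number of DNAT/SNAT rules generated per service.
func isEligibleEndpoint(ep discovery.Endpoint) bool {
	if ep.Conditions.Ready == nil || *ep.Conditions.Ready {
		return true
	}
	serving := ep.Conditions.Serving != nil && *ep.Conditions.Serving
	terminating := ep.Conditions.Terminating != nil && *ep.Conditions.Terminating
	return serving && terminating
}
```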
Investigated a failure: https://github.com/ovn-org/ovn-kubernetes/actions/runs/4536905491/jobs/7994555791?pr=3504 It always seems to happen when we scale down the deployment to 3 pods; perhaps we need to wait until the scale-down actually completes (see the sketch below).
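If the theory is that we start checking before the scale-down has settled, the test could first wait on the deployment status, roughly like this (a hypothetical helper, not one of the existing e2e utilities):

```go
package e2e

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForScaleDown is a hypothetical helper: block until the deployment actually
// reports the desired replica count before the test starts counting iptables rules.
func waitForScaleDown(c kubernetes.Interface, ns, name string, want int32) error {
	return wait.PollImmediate(time.Second, 60*time.Second, func() (bool, error) {
		d, err := c.AppsV1().Deployments(ns).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		// Wait for the ReplicaSet to settle, not just for the spec to be patched.
		return d.Status.Replicas == want && d.Status.AvailableReplicas == want, nil
	})
}
```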
Failure:
2023-03-27T23:14:37.3984839Z STEP: by sending a TCP packet to service lbservice-test with type=LoadBalancer in namespace default from backend pod lb-backend-pod
2023-03-27T23:14:38.0023886Z STEP: patching service lbservice-test to allocateLoadBalancerNodePorts=false and externalTrafficPolicy=local
2023-03-27T23:14:38.0620772Z Mar 27 23:14:38.061: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:42351 --kubeconfig=/home/runner/ovn.conf --namespace=default get svc lbservice-test -o=jsonpath='{.spec.allocateLoadBalancerNodePorts}''
2023-03-27T23:14:38.2689282Z Mar 27 23:14:38.268: INFO: stderr: ""
2023-03-27T23:14:38.2690311Z Mar 27 23:14:38.268: INFO: stdout: "'false'"
2023-03-27T23:14:38.3107608Z Mar 27 23:14:38.310: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:42351 --kubeconfig=/home/runner/ovn.conf --namespace=default get svc lbservice-test -o=jsonpath='{.spec.externalTrafficPolicy}''
2023-03-27T23:14:38.4885314Z Mar 27 23:14:38.488: INFO: stderr: ""
2023-03-27T23:14:38.4886650Z Mar 27 23:14:38.488: INFO: stdout: "'Local'"
2023-03-27T23:14:43.6556192Z STEP: by sending a TCP packet to service lbservice-test with type=LoadBalancer in namespace default from backend pod lb-backend-pod
2023-03-27T23:14:44.1910980Z STEP: Scale down endpoints of service: lbservice-test to ensure iptable rules are also getting recreated correctly
2023-03-27T23:14:44.1911942Z Mar 27 23:14:44.190: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:42351 --kubeconfig=/home/runner/ovn.conf --namespace=default scale deployment lb-backend-pod --replicas=3'
2023-03-27T23:14:44.2766891Z Mar 27 23:14:44.275: INFO: stderr: ""
2023-03-27T23:14:44.2768358Z Mar 27 23:14:44.275: INFO: stdout: "deployment.apps/lb-backend-pod scaled\n"
2023-03-27T23:15:29.5296422Z Mar 27 23:15:29.523: FAIL: Couldn't fetch the correct number of iptable rules, err: timed out waiting for the condition
2023-03-27T23:15:29.5299087Z Unexpected error:
2023-03-27T23:15:29.5299827Z <*errors.errorString | 0xc00020c260>: {
2023-03-27T23:15:29.5300282Z s: "timed out waiting for the condition",
2023-03-27T23:15:29.5533832Z }
2023-03-27T23:15:29.5535597Z timed out waiting for the condition
2023-03-27T23:15:29.5536959Z occurred
2023-03-27T23:15:29.5538259Z
2023-03-27T23:15:29.5539528Z Full Stack Trace
NOTE that the test already waits 5 seconds before trying to fetch the iptables rules. So we scale down at 23:14:44, we curl at 23:14:49, and then we poll for 40 seconds to reach the right number of rules and still time out?
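For reference, the check that times out is essentially a poll-and-count loop along these lines (a simplified sketch, not the verbatim test code; `countRulesForService` stands in for the real helper):

```go
package e2e

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// pollForRuleCount is a simplified sketch of the failing check: after the 5s grace
// period the test polls for up to ~40s until the number of OVN-KUBE-* rules for the
// service matches the expected count; the generic
// "timed out waiting for the condition" error is what surfaces on failure.
func pollForRuleCount(countRulesForService func() (int, error), want int) error {
	err := wait.PollImmediate(2*time.Second, 40*time.Second, func() (bool, error) {
		got, err := countRulesForService()
		if err != nil {
			return false, nil // keep retrying on transient errors
		}
		return got == want, nil
	})
	if err != nil {
		return fmt.Errorf("couldn't fetch the correct number of iptable rules, err: %v", err)
	}
	return nil
}
```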
The LOGS show the rules were already present by then; it takes only a couple of seconds to recreate them.
After the scale-down, the update event arrives here:
2023-03-27T23:14:44.942098994Z stderr F I0327 23:14:44.941735 29451 obj_retry.go:508] Update event received for resource *factory.endpointSliceForGateway, old object is equal to new: false
2023-03-27T23:14:44.942111194Z stderr F I0327 23:14:44.941778 29451 obj_retry.go:560] Update event received for *factory.endpointSliceForGateway default/lbservice-test-ndfvk
2023-03-27T23:14:44.942114894Z stderr F I0327 23:14:44.941807 29451 healthcheck.go:224] Reporting 3 endpoints for healthcheck "default/lbservice-test"
2023-03-27T23:14:44.942118194Z stderr F I0327 23:14:44.941828 29451 gateway_shared_intf.go:941] Updating endpointslice lbservice-test-ndfvk in namespace default
Then the iptables rule deletion happens at:
2023-03-27T23:14:44.942128194Z stderr F I0327 23:14:44.941920 29451 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-EXTERNALIP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.96.105.102:80" for protocol: 0
2023-03-27T23:14:44.942630599Z stderr F I0327 23:14:44.942475 29451 ovs.go:200] Exec(180): /usr/bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows breth0 -
2023-03-27T23:14:44.976985848Z stderr F I0327 23:14:44.976872 29451 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-ETP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.244.1.12:80 -m statistic --mode random --probability 1.0000000000" for protocol: 0
2023-03-27T23:14:44.977797256Z stderr F I0327 23:14:44.977721 29451 informer.go:316] Successfully synced 'default/lb-backend-pod-66998d8b47-6x9q7'
2023-03-27T23:14:44.990194082Z stderr F I0327 23:14:44.990118 29451 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-SNAT-MGMTPORT with args: "-p TCP -d 10.244.1.12 --dport 80 -j RETURN" for protocol: 0
2023-03-27T23:14:45.006222744Z stderr F I0327 23:14:45.006148 29451 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-ETP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.244.1.14:80 -m statistic --mode random --probability 0.5000000000" for protocol: 0
2023-03-27T23:14:45.016933653Z stderr F I0327 23:14:45.016862 29451 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-SNAT-MGMTPORT with args: "-p TCP -d 10.244.1.14 --dport 80 -j RETURN" for protocol: 0
2023-03-27T23:14:45.036947956Z stderr F I0327 23:14:45.036846 29451 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-ETP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.244.1.15:80 -m statistic --mode random --probability 0.3333333333" for protocol: 0
2023-03-27T23:14:45.041991907Z stderr F I0327 23:14:45.041920 29451 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-SNAT-MGMTPORT with args: "-p TCP -d 10.244.1.15 --dport 80 -j RETURN" for protocol: 0
2023-03-27T23:14:45.043897526Z stderr F I0327 23:14:45.042092 29451 ovs.go:203] Exec(180): stdout: ""
2023-03-27T23:14:45.043908727Z stderr F I0327 23:14:45.043849 29451 ovs.go:204] Exec(180): stderr: ""
2023-03-27T23:14:45.054871138Z stderr F I0327 23:14:45.054770 29451 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-ETP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.244.1.13:80 -m statistic --mode random --probability 0.2500000000" for protocol: 0
2023-03-27T23:14:45.058603276Z stderr F I0327 23:14:45.058525 29451 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-SNAT-MGMTPORT with args: "-p TCP -d 10.244.1.13 --dport 80 -j RETURN" for protocol: 0
2023-03-27T23:14:45.085705451Z stderr F I0327 23:14:45.085624 29451 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-EXTERNALIP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.96.105.102:80" for protocol: 0
The old rules are removed and the new ones are added at:
2023-03-27T23:14:45.097069766Z stderr F I0327 23:14:45.097019 29451 gateway_iptables.go:71] Adding rule in table: nat, chain: OVN-KUBE-ETP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.244.1.14:80 -m statistic --mode random --probability 1.0000000000" for protocol: 0
2023-03-27T23:14:45.097196567Z stderr F I0327 23:14:45.096896 29451 ovs.go:200] Exec(182): /usr/bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows breth0 -
2023-03-27T23:14:45.100216798Z stderr F I0327 23:14:45.100125 29451 gateway_iptables.go:79] Chain: "OVN-KUBE-ETP" in table: "nat" already exists, skipping creation: running [/usr/sbin/iptables -t nat -N OVN-KUBE-ETP --wait]: exit status 1: iptables: Chain already exists.
2023-03-27T23:14:45.105511051Z stderr F I0327 23:14:45.105431 29451 gateway_iptables.go:71] Adding rule in table: nat, chain: OVN-KUBE-SNAT-MGMTPORT with args: "-p TCP -d 10.244.1.14 --dport 80 -j RETURN" for protocol: 0
2023-03-27T23:14:45.108068877Z stderr F I0327 23:14:45.107984 29451 gateway_iptables.go:79] Chain: "OVN-KUBE-SNAT-MGMTPORT" in table: "nat" already exists, skipping creation: running [/usr/sbin/iptables -t nat -N OVN-KUBE-SNAT-MGMTPORT --wait]: exit status 1: iptables: Chain already exists.
2023-03-27T23:14:45.119371292Z stderr F I0327 23:14:45.119290 29451 gateway_iptables.go:71] Adding rule in table: nat, chain: OVN-KUBE-ETP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.244.1.15:80 -m statistic --mode random --probability 0.5000000000" for protocol: 0
2023-03-27T23:14:45.124025639Z stderr F I0327 23:14:45.123941 29451 gateway_iptables.go:79] Chain: "OVN-KUBE-ETP" in table: "nat" already exists, skipping creation: running [/usr/sbin/iptables -t nat -N OVN-KUBE-ETP --wait]: exit status 1: iptables: Chain already exists.
2023-03-27T23:14:45.125774957Z stderr F I0327 23:14:45.125699 29451 ovs.go:203] Exec(182): stdout: ""
2023-03-27T23:14:45.125887758Z stderr F I0327 23:14:45.125835 29451 ovs.go:204] Exec(182): stderr: ""
2023-03-27T23:14:45.128729087Z stderr F I0327 23:14:45.128665 29451 gateway_iptables.go:71] Adding rule in table: nat, chain: OVN-KUBE-SNAT-MGMTPORT with args: "-p TCP -d 10.244.1.15 --dport 80 -j RETURN" for protocol: 0
2023-03-27T23:14:45.131326313Z stderr F I0327 23:14:45.131227 29451 gateway_iptables.go:79] Chain: "OVN-KUBE-SNAT-MGMTPORT" in table: "nat" already exists, skipping creation: running [/usr/sbin/iptables -t nat -N OVN-KUBE-SNAT-MGMTPORT --wait]: exit status 1: iptables: Chain already exists.
2023-03-27T23:14:45.137787079Z stderr F I0327 23:14:45.137712 29451 gateway_iptables.go:71] Adding rule in table: nat, chain: OVN-KUBE-ETP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.244.1.12:80 -m statistic --mode random --probability 0.3333333333" for protocol: 0
2023-03-27T23:14:45.140301304Z stderr F I0327 23:14:45.140199 29451 gateway_iptables.go:79] Chain: "OVN-KUBE-ETP" in table: "nat" already exists, skipping creation: running [/usr/sbin/iptables -t nat -N OVN-KUBE-ETP --wait]: exit status 1: iptables: Chain already exists.
2023-03-27T23:14:45.147173574Z stderr F I0327 23:14:45.147092 29451 gateway_iptables.go:71] Adding rule in table: nat, chain: OVN-KUBE-SNAT-MGMTPORT with args: "-p TCP -d 10.244.1.12 --dport 80 -j RETURN" for protocol: 0
2023-03-27T23:14:45.149623599Z stderr F I0327 23:14:45.149541 29451 gateway_iptables.go:79] Chain: "OVN-KUBE-SNAT-MGMTPORT" in table: "nat" already exists, skipping creation: running [/usr/sbin/iptables -t nat -N OVN-KUBE-SNAT-MGMTPORT --wait]: exit status 1: iptables: Chain already exists.
2023-03-27T23:14:45.155957363Z stderr F I0327 23:14:45.155877 29451 gateway_iptables.go:71] Adding rule in table: nat, chain: OVN-KUBE-EXTERNALIP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.96.105.102:80" for protocol: 0
The rules are already present at this point!! I wonder if the test's iptables rule counting logic changed somehow :/
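If the rules are there, the suspect is the counting side. The kind of count the test is after looks roughly like this (an illustrative stand-in, not the actual test code; the real test shells into the node rather than running iptables directly). A subtle change here, in the chain name, match string, or protocol family, would make the count wrong even though the rules exist:

```go
package e2e

import (
	"os/exec"
	"strings"
)

// countChainRules is an illustrative stand-in for the test's counting logic: dump a
// chain with `iptables -S` and count the rules that mention the service's external
// IP (or another match string).
func countChainRules(table, chain, match string) (int, error) {
	out, err := exec.Command("iptables", "--wait", "-t", table, "-S", chain).CombinedOutput()
	if err != nil {
		return 0, err
	}
	count := 0
	for _, line := range strings.Split(string(out), "\n") {
		if strings.HasPrefix(line, "-A "+chain) && strings.Contains(line, match) {
			count++
		}
	}
	return count, nil
}
```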
Investigated another failure: https://github.com/ovn-org/ovn-kubernetes/actions/runs/4513426760?pr=3430
Failure:
2023-03-24T19:14:48.9030228Z Mar 24 19:14:48.902: INFO: stdout: "'Local'"
2023-03-24T19:14:54.0441544Z STEP: by sending a TCP packet to service lbservice-test with type=LoadBalancer in namespace default from backend pod lb-backend-pod
2023-03-24T19:14:54.5158042Z STEP: Scale down endpoints of service: lbservice-test to ensure iptable rules are also getting recreated correctly
2023-03-24T19:14:54.5159298Z Mar 24 19:14:54.515: INFO: Running '/usr/local/bin/kubectl --server=https://127.0.0.1:37945 --kubeconfig=/home/runner/ovn.conf --namespace=default scale deployment lb-backend-pod --replicas=3'
2023-03-24T19:14:54.6186601Z Mar 24 19:14:54.618: INFO: stderr: ""
2023-03-24T19:14:54.6187721Z Mar 24 19:14:54.618: INFO: stdout: "deployment.apps/lb-backend-pod scaled\n"
2023-03-24T19:15:39.8338143Z Mar 24 19:15:39.832: FAIL: Couldn't fetch the correct number of iptable rules, err: timed out waiting for the condition
2023-03-24T19:15:39.8338942Z Unexpected error:
2023-03-24T19:15:39.8340609Z <*errors.errorString | 0xc00018a280>: {
2023-03-24T19:15:39.8341198Z s: "timed out waiting for the condition",
2023-03-24T19:15:39.8341574Z }
2023-03-24T19:15:39.8419901Z timed out waiting for the condition
2023-03-24T19:15:39.8420156Z occurred
NOTE that the test already waits 5 seconds before trying to fetch the iptables rules. So we scale down at 19:14:54, we curl at 19:14:59, and then we poll for 40 seconds to reach the right number of rules and still time out?
The LOGS show the rules were already present by then; it takes only a couple of seconds to recreate them.
Update after the scale-down:
2023-03-24T19:14:55.363025103Z stderr F I0324 19:14:55.362982 34855 obj_retry.go:560] Update event received for *factory.endpointSliceForGateway default/lbservice-test-khdfk
2023-03-24T19:14:55.363104303Z stderr F I0324 19:14:55.363064 34855 healthcheck.go:224] Reporting 3 endpoints for healthcheck "default/lbservice-test"
2023-03-24T19:14:55.365332698Z stderr F I0324 19:14:55.363143 34855 gateway_shared_intf.go:941] Updating endpointslice lbservice-test-khdfk in namespace default
After this:
2023-03-24T19:14:55.366097997Z stderr F I0324 19:14:55.366066 34855 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-EXTERNALIP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.96.209.170:80" for protocol: 0
2023-03-24T19:14:55.366495496Z stderr F I0324 19:14:55.366443 34855 ovs.go:200] Exec(200): /usr/bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows breth0 -
2023-03-24T19:14:55.381001768Z stderr F I0324 19:14:55.380014 34855 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-ETP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.244.2.17:80 -m statistic --mode random --probability 1.0000000000" for protocol: 0
2023-03-24T19:14:55.405328322Z stderr F I0324 19:14:55.404592 34855 informer.go:316] Successfully synced 'default/lb-backend-pod-66998d8b47-2jkbh'
2023-03-24T19:14:55.408072617Z stderr F I0324 19:14:55.406680 34855 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-SNAT-MGMTPORT with args: "-p TCP -d 10.244.2.17 --dport 80 -j RETURN" for protocol: 0
2023-03-24T19:14:55.438442859Z stderr F I0324 19:14:55.438327 34855 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-ETP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.244.2.16:80 -m statistic --mode random --probability 0.5000000000" for protocol: 0
2023-03-24T19:14:55.450648636Z stderr F I0324 19:14:55.450560 34855 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-SNAT-MGMTPORT with args: "-p TCP -d 10.244.2.16 --dport 80 -j RETURN" for protocol: 0
2023-03-24T19:14:55.455197127Z stderr F I0324 19:14:55.455111 34855 ovs.go:203] Exec(200): stdout: ""
2023-03-24T19:14:55.455287527Z stderr F I0324 19:14:55.455253 34855 ovs.go:204] Exec(200): stderr: ""
2023-03-24T19:14:55.455346927Z stderr F I0324 19:14:55.455320 34855 ovs.go:200] Exec(201): /usr/bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows breth1 -
2023-03-24T19:14:55.460526217Z stderr F I0324 19:14:55.460449 34855 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-ETP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.244.2.19:80 -m statistic --mode random --probability 0.3333333333" for protocol: 0
2023-03-24T19:14:55.467810903Z stderr F I0324 19:14:55.467725 34855 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-SNAT-MGMTPORT with args: "-p TCP -d 10.244.2.19 --dport 80 -j RETURN" for protocol: 0
2023-03-24T19:14:55.477499784Z stderr F I0324 19:14:55.477402 34855 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-ETP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.244.2.18:80 -m statistic --mode random --probability 0.2500000000" for protocol: 0
2023-03-24T19:14:55.482828374Z stderr F I0324 19:14:55.482703 34855 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-SNAT-MGMTPORT with args: "-p TCP -d 10.244.2.18 --dport 80 -j RETURN" for protocol: 0
2023-03-24T19:14:55.483155974Z stderr F I0324 19:14:55.483087 34855 ovs.go:203] Exec(201): stdout: ""
2023-03-24T19:14:55.483233273Z stderr F I0324 19:14:55.483192 34855 ovs.go:204] Exec(201): stderr: ""
2023-03-24T19:14:55.48997956Z stderr F I0324 19:14:55.489895 34855 gateway_iptables.go:118] Deleting rule in table: nat, chain: OVN-KUBE-EXTERNALIP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.96.209.170:80" for protocol: 0
2023-03-24T19:14:55.493634854Z stderr F I0324 19:14:55.493566 34855 ovs.go:200] Exec(202): /usr/bin/ovs-ofctl show breth0
2023-03-24T19:14:55.50085844Z stderr F I0324 19:14:55.500732 34855 ovs.go:203] Exec(202): stdout: "OFPT_FEATURES_REPLY (xid=0x2): dpid:00000242ac120003\nn_tables:254, n_buffers:0\ncapabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP\nactions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst\n 1(eth0): addr:02:42:ac:12:00:03\n config: 0\n state: 0\n current: 10GB-FD COPPER\n speed: 10000 Mbps now, 0 Mbps max\n 2(patch-breth0_ov): addr:fa:c6:86:4e:75:17\n config: 0\n state: 0\n speed: 0 Mbps now, 0 Mbps max\n LOCAL(breth0): addr:02:42:ac:12:00:03\n config: 0\n state: 0\n speed: 0 Mbps now, 0 Mbps max\nOFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0\n"
2023-03-24T19:14:55.50094934Z stderr F I0324 19:14:55.500909 34855 ovs.go:204] Exec(202): stderr: ""
2023-03-24T19:14:55.501037839Z stderr F I0324 19:14:55.501005 34855 openflow_manager.go:47] Gateway OpenFlow sync requested
2023-03-24T19:14:55.501176339Z stderr F I0324 19:14:55.501138 34855 gateway_iptables.go:71] Adding rule in table: nat, chain: OVN-KUBE-ETP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.244.2.19:80 -m statistic --mode random --probability 1.0000000000" for protocol: 0
2023-03-24T19:14:55.501414439Z stderr F I0324 19:14:55.501086 34855 ovs.go:200] Exec(203): /usr/bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows breth0 -
2023-03-24T19:14:55.504522433Z stderr F I0324 19:14:55.504447 34855 gateway_iptables.go:79] Chain: "OVN-KUBE-ETP" in table: "nat" already exists, skipping creation: running [/usr/sbin/iptables -t nat -N OVN-KUBE-ETP --wait]: exit status 1: iptables: Chain already exists.
2023-03-24T19:14:55.512008618Z stderr F I0324 19:14:55.511937 34855 gateway_iptables.go:71] Adding rule in table: nat, chain: OVN-KUBE-SNAT-MGMTPORT with args: "-p TCP -d 10.244.2.19 --dport 80 -j RETURN" for protocol: 0
2023-03-24T19:14:55.515067813Z stderr F I0324 19:14:55.514997 34855 gateway_iptables.go:79] Chain: "OVN-KUBE-SNAT-MGMTPORT" in table: "nat" already exists, skipping creation: running [/usr/sbin/iptables -t nat -N OVN-KUBE-SNAT-MGMTPORT --wait]: exit status 1: iptables: Chain already exists.
2023-03-24T19:14:55.522284799Z stderr F I0324 19:14:55.522200 34855 gateway_iptables.go:71] Adding rule in table: nat, chain: OVN-KUBE-ETP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.244.2.17:80 -m statistic --mode random --probability 0.5000000000" for protocol: 0
2023-03-24T19:14:55.525406593Z stderr F I0324 19:14:55.525335 34855 gateway_iptables.go:79] Chain: "OVN-KUBE-ETP" in table: "nat" already exists, skipping creation: running [/usr/sbin/iptables -t nat -N OVN-KUBE-ETP --wait]: exit status 1: iptables: Chain already exists.
2023-03-24T19:14:55.539885565Z stderr F I0324 19:14:55.539768 34855 gateway_iptables.go:71] Adding rule in table: nat, chain: OVN-KUBE-SNAT-MGMTPORT with args: "-p TCP -d 10.244.2.17 --dport 80 -j RETURN" for protocol: 0
2023-03-24T19:14:55.543391859Z stderr F I0324 19:14:55.543293 34855 gateway_iptables.go:79] Chain: "OVN-KUBE-SNAT-MGMTPORT" in table: "nat" already exists, skipping creation: running [/usr/sbin/iptables -t nat -N OVN-KUBE-SNAT-MGMTPORT --wait]: exit status 1: iptables: Chain already exists.
2023-03-24T19:14:55.546847452Z stderr F I0324 19:14:55.546764 34855 ovs.go:203] Exec(203): stdout: ""
2023-03-24T19:14:55.546926852Z stderr F I0324 19:14:55.546891 34855 ovs.go:204] Exec(203): stderr: ""
2023-03-24T19:14:55.547167951Z stderr F I0324 19:14:55.547124 34855 ovs.go:200] Exec(204): /usr/bin/ovs-ofctl -O OpenFlow13 --bundle replace-flows breth1 -
2023-03-24T19:14:55.554118938Z stderr F I0324 19:14:55.554044 34855 gateway_iptables.go:71] Adding rule in table: nat, chain: OVN-KUBE-ETP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.244.2.16:80 -m statistic --mode random --probability 0.3333333333" for protocol: 0
2023-03-24T19:14:55.557199932Z stderr F I0324 19:14:55.557112 34855 gateway_iptables.go:79] Chain: "OVN-KUBE-ETP" in table: "nat" already exists, skipping creation: running [/usr/sbin/iptables -t nat -N OVN-KUBE-ETP --wait]: exit status 1: iptables: Chain already exists.
2023-03-24T19:14:55.565269317Z stderr F I0324 19:14:55.565183 34855 gateway_iptables.go:71] Adding rule in table: nat, chain: OVN-KUBE-SNAT-MGMTPORT with args: "-p TCP -d 10.244.2.16 --dport 80 -j RETURN" for protocol: 0
2023-03-24T19:14:55.56877151Z stderr F I0324 19:14:55.568681 34855 gateway_iptables.go:79] Chain: "OVN-KUBE-SNAT-MGMTPORT" in table: "nat" already exists, skipping creation: running [/usr/sbin/iptables -t nat -N OVN-KUBE-SNAT-MGMTPORT --wait]: exit status 1: iptables: Chain already exists.
2023-03-24T19:14:55.573159702Z stderr F I0324 19:14:55.573076 34855 ovs.go:203] Exec(204): stdout: ""
2023-03-24T19:14:55.573231202Z stderr F I0324 19:14:55.573201 34855 ovs.go:204] Exec(204): stderr: ""
2023-03-24T19:14:55.575711497Z stderr F I0324 19:14:55.575625 34855 gateway_iptables.go:71] Adding rule in table: nat, chain: OVN-KUBE-EXTERNALIP with args: "-p TCP -d 192.168.10.0 --dport 80 -j DNAT --to-destination 10.96.209.170:80" for protocol: 0
We are actually done in about 1 second, so why did the test fail? The test always seems to pass when run locally.
LOL, unit tests are also failing in this area: https://github.com/ovn-org/ovn-kubernetes/actions/runs/4563583609/jobs/8052256701?pr=3512
Something around how we process endpoints has definitely changed:
2023-03-30T11:09:22.0027083Z • Failure [0.130 seconds]
2023-03-30T11:09:22.0027339Z Node Operations
2023-03-30T11:09:22.0027866Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/node/gateway_localnet_linux_test.go:219
2023-03-30T11:09:22.0028191Z on add
2023-03-30T11:09:22.0028753Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/node/gateway_localnet_linux_test.go:377
2023-03-30T11:09:22.0030585Z inits iptables rules and openflows with LoadBalancer where AllocateLoadBalancerNodePorts=False, ETP=local, LGW mode [It]
2023-03-30T11:09:22.0032168Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/node/gateway_localnet_linux_test.go:837
2023-03-30T11:09:22.0032457Z
2023-03-30T11:09:22.0032705Z Expected success, but got an error:
2023-03-30T11:09:22.0033363Z <*errors.errorString | 0xc0004bddd0>: {
2023-03-30T11:09:22.0035620Z s: "expected rule \"-p TCP -d 10.244.0.3 --dport 8080 -j RETURN\" at pos 0 in chain nat/OVN-KUBE-SNAT-MGMTPORT, got \"-p TCP -d 10.244.0.4 --dport 8080 -j RETURN\"",
2023-03-30T11:09:22.0036059Z }
2023-03-30T11:09:22.0037241Z expected rule "-p TCP -d 10.244.0.3 --dport 8080 -j RETURN" at pos 0 in chain nat/OVN-KUBE-SNAT-MGMTPORT, got "-p TCP -d 10.244.0.4 --dport 8080 -j RETURN"
2023-03-30T11:09:22.0037546Z
2023-03-30T11:09:22.0038299Z /home/runner/work/ovn-kubernetes/ovn-kubernetes/go-controller/pkg/node/gateway_localnet_linux_test.go:957
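The mismatch is only in which backend IP ends up at position 0, which points at endpoint ordering rather than a missing rule: if the rules (or the expected rules) are built from an unordered set of endpoint IPs, a position-based assertion will flake. A hedged sketch of the kind of fix, assuming the generator simply loops over endpoint IPs:

```go
package node

import "sort"

// buildSNATMgmtPortRules is an illustrative sketch: sorting the endpoint IPs before
// emitting per-endpoint rules makes the rule order deterministic, so a unit test
// that asserts "rule X at position N" stops flaking on map/slice iteration order.
func buildSNATMgmtPortRules(endpointIPs []string, port string) []string {
	sorted := append([]string(nil), endpointIPs...)
	sort.Strings(sorted)
	rules := make([]string, 0, len(sorted))
	for _, ip := range sorted {
		rules = append(rules, "-p TCP -d "+ip+" --dport "+port+" -j RETURN")
	}
	return rules
}
```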