k8snetworkplumbingwg / sriov-cni

DPDK & SR-IOV CNI plugin
Apache License 2.0
298 stars 147 forks source link

Wait for carrier before announcing IPs via GARP/NA #301

Closed thom311 closed 1 month ago

thom311 commented 1 month ago

This is a follow up to c241dcb4367c.

what happens is that after sriov-cni sets up the interface, initially the interface does not have carrier. Sent GARP/IPv6 NA at that point are lost.

Instead, wait a bit until the interface has carrier.

See-also: https://issues.redhat.com/browse/OCPBUGS-30549

wizhaoredhat commented 1 month ago

LGTM @SchSeba @zeeke PTAL

thom311 commented 1 month ago

@wizhaoredhat previous version had a bug. Should be fixed now:

diff --git a/pkg/utils/packet.go b/pkg/utils/packet.go
index 809b4ded399f..88a52bc78cc1 100644
--- a/pkg/utils/packet.go
+++ b/pkg/utils/packet.go
@@ -221,7 +221,12 @@ func WaitForCarrier(ifName string, waitTime time.Duration) bool {
               return false
          }

-         if linkObj.Attrs().RawFlags&unix.IFF_NO_CARRIER != 0 {
+         /* Wait for carrier, i.e. IFF_UP|IFF_RUNNING. Note that there is also
+          * IFF_LOWER_UP, but we follow iproute2 ([1]).
+          *
+          * [1] https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/tree/ip/ipaddress.c?id=f9601b10c21145f76c3d46c163bac39515ed2061#n86
+          */
+         if linkObj.Attrs().RawFlags&(unix.IFF_UP|unix.IFF_RUNNING) == (unix.IFF_UP | unix.IFF_RUNNING) {
               return true
          }
     }
coveralls commented 1 month ago

Pull Request Test Coverage Report for Build 9346881405

Details


Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/utils/packet.go 18 22 81.82%
<!-- Total: 18 22 81.82% -->
Totals Coverage Status
Change from base Build 9250302256: 0.6%
Covered Lines: 680
Relevant Lines: 1319

💛 - Coveralls
thom311 commented 1 month ago

Please, consider adding a unit test for WaitForCarrier (see f168bc2)

I took your patch and integrated it in the branch. How about this?

(I don't think the test is yet correct. WIP).

thom311 commented 1 month ago

(I don't think the test is yet correct. WIP).

Unit test should work now... IMO it's ready.

SchSeba commented 1 month ago

Hi @thom311 the CI is still not happy can you please take a look?

thom311 commented 1 month ago

Hi @thom311 the CI is still not happy can you please take a look?

It says "race detected during execution of test".

Is this due to the assignment of fakeLink.RawFlags while another go routine reads it?

wizhaoredhat commented 1 month ago

@adrianchiris Could you also help PTAL?

SchSeba commented 1 month ago

Hi @thom311 yes that is the reason please take a look

WARNING: DATA RACE
Write at 0x00c000135bc8 by goroutine 49:
  github.com/k8snetworkplumbingwg/sriov-cni/pkg/utils.glob..func1.1.1()
      /home/runner/work/sriov-cni/sriov-cni/pkg/utils/packet_test.go:37 +0x4e4
  github.com/onsi/ginkgo/v2/internal.extractBodyFunction.func3()
      /root/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.9.2/internal/node.go:463 +0x2e
  github.com/onsi/ginkgo/v2/internal.(*Suite).runNode.func3()
      /root/go/pkg/mod/github.com/onsi/ginkgo/v2@v2.9.2/internal/suite.go:863 +0x14b

Previous read at 0x00c000135bc8 by goroutine 50:
  github.com/k8snetworkplumbingwg/sriov-cni/pkg/utils.WaitForCarrier()
      /home/runner/work/sriov-cni/sriov-cni/pkg/utils/packet.go:225 +0xf0
  github.com/k8snetworkplumbingwg/sriov-cni/pkg/utils.glob..func1.1.1.2()
      /home/runner/work/sriov-cni/sriov-cni/pkg/utils/packet_test.go:32 +0x[44](https://github.com/k8snetworkplumbingwg/sriov-cni/actions/runs/9315930803/job/25643595721?pr=301#step:5:45)

@wizhaoredhat please first fix the unit-test so we can continue with the reviews on the PR.

thom311 commented 1 month ago

Hi @thom311 yes that is the reason please take a look

Race in unit test should be fixed now. How is that.

adrianchiris commented 1 month ago

@thom311 could you sort the PR commits? i see it now got 167 commits (openshift fork?)

thom311 commented 1 month ago

@thom311 could you sort the PR commits? i see it now got 167 commits (openshift fork?)

sorry, that was a wrong rebase.

Fixed, and addressed other comments too.