Open ondrej-fabry opened 4 years ago
Test TestTapInterfaceConn keeps running into a bug where the name of the VPP TAP interface gets corrupted:
config/linux/interfaces/v2/interface/linux-tap (CREATE): LinkByName "ta\x04": Link not found
Here's failing Travis job: https://travis-ci.com/ligato/vpp-agent/jobs/286038603
Any idea why this might be happening?
cc @milanlenco @rewenset
Regarding TestTapInterfaceConn:
It sometimes fails to create the TAP_TO_VPP interface after it was removed. The error says it is unable to find a link named "ta\x04", so the problem is not in the "GetLinkByName" method itself. Something odd happens a little bit earlier, here:
https://github.com/ligato/vpp-agent/blob/de6c10f0f7d9b940b4d2c18c8d957e94e291f261/plugins/vpp/ifplugin/ifaceidx/ifaceidx.go#L108-L116
So, the "createTAPToVPP" method uses the interface index from the VPP plugin to get the interface metadata. When the test passes, it is:
vppTapMeta: &{SwIfIndex:1 Vrf:0 IPAddresses:[192.168.1.1/30] TAPHostIfName:tap-2870264825}
TAPHostIfName in bytes: [116 97 112 45 50 56 55 48 50 54 52 56 50 53]
but when it fails:
vppTapMeta: &{SwIfIndex:1 Vrf:0 IPAddresses:[192.168.1.1/30] TAPHostIfName:ta}
TAPHostIfName in bytes: [116 97 4]
That's all for now. I'm able to reproduce it, but not every time. It passes more often than it fails.
It seems that the TAPHostIfName gets corrupted somewhere. This will need more investigation.
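One way to narrow down where the corruption happens would be to check the host interface name right after it is read from the metadata and fail early with the raw bytes. The sketch below is only an illustration: `validHostIfName` is a hypothetical helper, not part of vpp-agent, and its rules (1 to 15 bytes, printable non-space ASCII, no `/`) are a deliberately strict sanity check assumed for this example rather than the kernel's exact validation.

```go
package main

import "fmt"

// validHostIfName is a hypothetical sanity check for a Linux host interface
// name. Assumed rules: non-empty, at most 15 bytes (IFNAMSIZ-1), and only
// printable, non-space ASCII bytes other than '/'.
func validHostIfName(name string) bool {
	if len(name) == 0 || len(name) > 15 {
		return false
	}
	for i := 0; i < len(name); i++ {
		b := name[i]
		if b <= ' ' || b >= 0x7f || b == '/' {
			return false
		}
	}
	return true
}

func main() {
	// The two names seen in the passing and failing runs above.
	for _, name := range []string{"tap-2870264825", "ta\x04"} {
		fmt.Printf("%q bytes=%v valid=%t\n", name, []byte(name), validHostIfName(name))
	}
}
```

Calling such a check at each place the name is copied (when the metadata is stored, and again when it is read in the test) would show which step first sees the bad value.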
Just a thought: perhaps it is related to some of the race conditions reported when testing with -race. Those might need to be fixed first.
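For reference, this is the general shape of a fix for the kind of report that -race produces when several goroutines touch shared metadata. It is a generic sketch, not the vpp-agent code: `ifaceMeta` and `metaIndex` are hypothetical stand-ins, and nothing here confirms that a data race is the actual cause of the corrupted name.

```go
package main

import (
	"fmt"
	"sync"
)

// ifaceMeta is a hypothetical stand-in for per-interface metadata.
type ifaceMeta struct {
	SwIfIndex     uint32
	TAPHostIfName string
}

// metaIndex is a hypothetical name-to-metadata index guarded by a RWMutex,
// so concurrent readers and writers never access the map unsynchronized.
type metaIndex struct {
	mu    sync.RWMutex
	items map[string]ifaceMeta
}

func newMetaIndex() *metaIndex {
	return &metaIndex{items: make(map[string]ifaceMeta)}
}

// Put stores a copy of the metadata under the given interface name.
func (x *metaIndex) Put(name string, m ifaceMeta) {
	x.mu.Lock()
	defer x.mu.Unlock()
	x.items[name] = m
}

// Get returns a copy of the metadata, so callers cannot mutate shared state.
func (x *metaIndex) Get(name string) (ifaceMeta, bool) {
	x.mu.RLock()
	defer x.mu.RUnlock()
	m, ok := x.items[name]
	return m, ok
}

func main() {
	idx := newMetaIndex()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			idx.Put("tap1", ifaceMeta{SwIfIndex: 1, TAPHostIfName: "tap-2870264825"})
			if m, ok := idx.Get("tap1"); ok && m.TAPHostIfName != "tap-2870264825" {
				panic("corrupted name: " + m.TAPHostIfName)
			}
		}()
	}
	wg.Wait()
	m, _ := idx.Get("tap1")
	fmt.Println(m.TAPHostIfName)
}
```

Running a program like this with the race detector enabled should produce no reports, whereas the same map accessed without the mutex would be flagged.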
Another recent failure in E2E test for VPP 19.08.
In this test the agent seems to receive an SB notification about a removed microservice, then VPP seems to throw epoll_ctl: Bad file descriptor (errno 9), after which the agent loses its binapi connection to VPP and the test fails while waiting for the CONFIGURED state of the AF_PACKET interface named vpp-afpacket.
Relevant part of agent log during failed test:
...
+======================================================================================================================+
| #6 - SB Notification |
+======================================================================================================================+
* transaction arguments:
- seqNum: 6
- type: SB Notification
- values:
- key: linux/microservice/e2e-test-microservice1
val: <NIL>
/usr/bin/vpp[6947]: linux_epoll_file_update:120: epoll_ctl: Bad file descriptor (errno 9)
time="2020-02-19 19:27:30.75809" level=info msg="Signal terminated received, stopping." loc="agent/agent.go(196)" logger=agent
time="2020-02-19 19:27:30.75818" level=info msg="Stopping agent" loc="agent/agent.go(269)" logger=agent
time="2020-02-19 19:27:31.75149" level=error msg="failed to remove interface vpp-afpacket, index 1: no reply received within the timeout period 1s" loc="descriptor/interface_crud.go(306)" logger=vpp-ifplugin.if-descriptor
time="2020-02-19 19:27:31.75181" level=error msg="LinkByName \"veth1\": Link not found" loc="descriptor/interface_veth.go(115)" logger=linux-ifplugin.if-descriptor
time="2020-02-19 19:27:31.75200" level=error msg="switch to namespace failed:Microservice 'e2e-test-microservice1' is not available" loc="descriptor/interface.go(416)" logger=linux-ifplugin.if-descriptor
time="2020-02-19 19:27:31.75289" level=error msg="failed to retrieve values: failed to dump memif socket details: not connected to VPP, ignoring the request" descriptor=vpp-interface loc="kvscheduler/refresh.go(121)" logger=kvscheduler
time="2020-02-19 19:27:31.75386" level=error msg="KeyErrors: [config/vpp/v2/interfaces/vpp-afpacket (DELETE): failed to remove interface vpp-afpacket, index 1: no reply received within the timeout period 1s, config/linux/interfaces/v2/interface/linux-veth1 (DELETE): LinkByName \"veth1\": Link not found, config/linux/interfaces/v2/interface/linux-veth2 (DELETE): Microservice 'e2e-test-microservice1' is not available]" loc="kvscheduler/txn_process.go(404)" logger=kvscheduler
...
--- FAIL: TestAfPacketWithLogicalReference (5.11s)
Link to Travis job: https://travis-ci.com/ligato/vpp-agent/jobs/289003830#L4644
On Travis, some e2e tests fail occasionally. Here are some recent failures:
FAIL: TestTapInterfaceConn (70.83s)
https://travis-ci.com/ligato/vpp-agent/builds/141046179
FAIL: TestTapInterfaceConn (70.74s)
https://travis-ci.com/ligato/vpp-agent/builds/141047198
FAIL: TestSourceNAT (15.87s)
https://travis-ci.com/ligato/vpp-agent/builds/141107514
Log output for failed TestSourceNAT
```
--- FAIL: TestSourceNAT (15.87s)
    e2e_test.go:612: VPP start OK (PID: 11018)
    e2e_test.go:156: Using docker client endpoint: unix:///var/run/docker.sock
    e2e_test.go:612: VPP-Agent start OK (PID: 11019)
    e2e_test.go:564: agent ready, took 101.294308ms
    e2e_test.go:299: exec: vppctl show nat44 addresses
    e2e_test.go:299: exec: vppctl show nat44 addresses
    microservice_test.go:177: Linux ping 80.80.80.10: sent=5, received=3, loss=40%
    e2e_test.go:299: exec: vppctl clear trace
    e2e_test.go:299: exec: vppctl trace add virtio-input 100
    e2e_test.go:299: exec: vppctl show trace
    e2e_test.go:299: exec: vppctl clear trace
    e2e_test.go:299: exec: vppctl trace add virtio-input 100
    e2e_test.go:299: exec: vppctl show trace
    testing_t_support.go:22:
        /home/travis/gopath/pkg/mod/github.com/onsi/gomega@v1.4.1/internal/assertion/assertion.go:69 +0x1b4
        github.com/onsi/gomega/internal/assertion.(*Assertion).Should(0xc000174ac0, 0xf3be20, 0x16000f0, 0x0, 0x0, 0x0, 0xb)
        /home/travis/gopath/pkg/mod/github.com/onsi/gomega@v1.4.1/internal/assertion/assertion.go:27 +0xac
        go.ligato.io/vpp-agent/v2/tests/e2e.TestSourceNAT(0xc000528000)
        /home/travis/gopath/src/go.ligato.io/vpp-agent/tests/e2e/050_nat_test.go:189 +0x19d4
        testing.tRunner(0xc000528000, 0xe37580)
        /home/travis/.gimme/versions/go1.13.5.linux.amd64/src/testing/testing.go:909 +0xc9
        created by testing.(*T).Run
        /home/travis/.gimme/versions/go1.13.5.linux.amd64/src/testing/testing.go:960 +0x350

        Expected success, but got an error:
            <*errors.errorString | 0xc00027c550>: {
                s: "communication with 80.80.80.10:8000 timed out",
            }
            communication with 80.80.80.10:8000 timed out
    e2e_test.go:208: -----------------
    e2e_test.go:633: VPP-Agent exit OK
    e2e_test.go:633: VPP exit OK
```
> full log here: https://pastebin.com/raw/wcHPBzwt

FAIL: TestTapInterfaceConn (69.99s)
https://travis-ci.com/ligato/vpp-agent/jobs/268018518

cc @milanlenco