Alt-Shivam closed this issue 1 year ago
I got the same error on Ubuntu 22.04 with a simpler configuration in a Vagrant machine.
The pod is running with securityContext privileged: True.
Hi @Alt-Shivam, @aeliusrs,
INFO[2023-04-20 18:21:06] [udsserver.go:174] [start] Unix domain socket initialised. Listening for new connection.
ERRO[2023-04-20 18:21:36] [uds.go:134] [Listen] Listener timed out: accept unixpacket /tmp/afxdp_dp/afxdp_access/13ab88a3-4751-40fb-a9b5-2b3118180c4e.sock: i/o timeout
You'll see that the UDS server starts listening on the socket and then times out exactly 30 seconds later. This is by design.
Each container that requests an AF_XDP network causes a small UDS server (a goroutine) to be spun up to perform the XSK handshake. This is due to the way device plugins work: at the Allocate() stage, the device plugin has no idea which pod it is allocating the device(s) to, so the UDS server initially has no idea which pod it is serving. It cannot watch for a particular pod.
In the case of a failed pod, we don't want these UDS servers to stay listening forever, creating more and more goroutines. So we rely on activity on the UDS to tell us whether the pod started successfully. If no message comes over the UDS before a certain timeout, we assume the pod failed and the UDS server shuts down. By default (e.g. for production environments) this timeout is 30 seconds, but it is configurable up to 5 minutes, e.g.:
"udsTimeout":300,
Also, we don't really advertise this in the documentation or recommend it for production, but for debugging or development you can disable this timeout altogether with a value of -1. Just be aware of the servers mentioned above: they will stay running forever until the handshake occurs and you finish it with a /fin.
KR, Gary
Thanks @garyloug @aeliusrs for the comments. Got the idea.
Hey, I'm trying to deploy a pod with 2 AF_XDP interfaces and am facing this issue:
DaemonSet file:
dpdk-devbind.py -s:
cc @garyloug