xdp-project / xdp-tutorial

XDP tutorial

Questions about running XDP sockets on top of bonding device or on the physical interfaces behind the bond #390

Status: Closed (freak82 closed this issue 5 months ago)

freak82 commented 5 months ago

Hi there,

I'd like to ask for advice on a weird issue that I'm facing while trying to run XDP on top of a bonding device (802.3ad), or alternatively on the physical interfaces behind the bond.

I have a DPDK application that runs on top of XDP sockets, using the DPDK AF_XDP driver. It used to be a pure DPDK application, but it was recently migrated to XDP sockets because we need to split the traffic entering the machine between the DPDK application and other "standard Linux" applications running on the same machine. The application works fine on top of a single interface, but it has problems on top of a bonding interface. It needs to run with multiple XDP sockets, where each socket (or group of sockets) is handled in a separate thread. However, the bonding device is reported with a single queue, so the application can't open more than one XDP socket on it. So I tried binding the XDP sockets to the queues of the physical interfaces instead. For example, with 3 interfaces, each set up with 8 queues, the sockets are split between two threads as follows (each pair is interface - queue):

Thread 1    Thread 2
(0 - 0)     (0 - 4)
(1 - 0)     (1 - 4)
(2 - 0)     (2 - 4)
(0 - 1)     (0 - 5)
(1 - 1)     (1 - 5)
(2 - 1)     (2 - 5)
...         ...
(0 - 3)     (0 - 7)
(1 - 3)     (1 - 7)
(2 - 3)     (2 - 7)
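For readers trying to reproduce the layout above, here is a minimal sketch of how the (interface - queue) pairs could be assigned to threads. It is not taken from the application in question: the names `if_queue`, `thread_for_queue`, and `queues_for_thread` are made up for illustration, threads are numbered from 0 (so "Thread 1" is index 0), and the queue count is assumed to divide evenly among the threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: one XDP socket binding, i.e. a physical interface
 * index (slave of the bond) plus a queue id on that interface. */
struct if_queue {
    unsigned iface;
    unsigned queue;
};

/* Thread index that owns a queue id. Queues are split into contiguous,
 * equally sized blocks (assumption: queues_per_if % n_threads == 0). */
static unsigned thread_for_queue(unsigned queue, unsigned queues_per_if,
                                 unsigned n_threads)
{
    assert(n_threads > 0 && queues_per_if % n_threads == 0);
    assert(queue < queues_per_if);
    return queue / (queues_per_if / n_threads);
}

/* Fill out[] with the pairs owned by `thread`, in the same order as the
 * layout above: for each owned queue id, one entry per interface.
 * Returns the number of entries written (at most cap). */
static size_t queues_for_thread(unsigned thread, unsigned n_ifaces,
                                unsigned queues_per_if, unsigned n_threads,
                                struct if_queue *out, size_t cap)
{
    size_t n = 0;

    for (unsigned q = 0; q < queues_per_if; q++) {
        if (thread_for_queue(q, queues_per_if, n_threads) != thread)
            continue;
        for (unsigned i = 0; i < n_ifaces && n < cap; i++) {
            out[n].iface = i;
            out[n].queue = q;
            n++;
        }
    }
    return n;
}
```

With 3 interfaces, 8 queues per interface, and 2 threads, `queues_for_thread(1, 3, 8, 2, pairs, 32)` yields the 12 pairs in the "Thread 2" column, starting with (0 - 4) and ending with (2 - 7).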



And here are my questions based on the above situation:
1. I assumed that it's not possible to run multiple XDP sockets on top of the bonding device itself and that I need to bind the XDP sockets to the physical interfaces behind the bonding device. Am I right about this, or am I missing something?
2. Is the bonding logic (LACP management traffic) affected by the access pattern of the XDP sockets?
3. Is this scheme supposed to work, or is the design simply wrong? I mean, maybe a group of queues/sockets shouldn't be handled by one thread, and each application thread should handle only a single queue. The problem is that the physical devices have more queues set up on them than the DPDK application has threads, so multiple queues have to be handled by a single application thread.

Any ideas are appreciated!

Regards,
Pavel.
freak82 commented 5 months ago

Just an FYI in case somebody reads this issue: forcing packets to be copied between the kernel and user space (XDP_COPY) fixes the above issue. It seems that zero-copy mode is not yet fully supported in this scenario (bonding).
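As a rough illustration of what forcing copy mode means at the socket level, the sketch below builds AF_XDP bind flags that request XDP_COPY and never XDP_ZEROCOPY. The helper name `force_copy_bind_flags` is made up for this example; the flag constants come from the kernel UAPI header `linux/if_xdp.h`. How the DPDK AF_XDP driver exposes this setting is not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/if_xdp.h>   /* XDP_COPY, XDP_ZEROCOPY */

/* Hypothetical helper: take whatever bind flags the caller already has
 * and make them request copy mode. The kernel rejects a bind that sets
 * both XDP_COPY and XDP_ZEROCOPY, so the zero-copy bit is cleared. */
static uint16_t force_copy_bind_flags(uint16_t flags)
{
    flags &= (uint16_t)~XDP_ZEROCOPY;  /* never ask for zero-copy */
    flags |= XDP_COPY;                 /* always ask for copy mode */
    assert(!(flags & XDP_ZEROCOPY));
    return flags;
}
```

When sockets are created directly through the libxdp/libbpf helper API, the result would go into the `bind_flags` field of the `struct xsk_socket_config` passed to `xsk_socket__create()`.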

ghpass commented 2 months ago

Were you able to resolve this problem? I tried the same approach. AF_XDP is attached to the slaves, but the slave interface can't filter the UDP packets; the messages are received by the bonding interface in the kernel.
I set the XDP_COPY flag in the xsk_socket__create() call, but nothing changed. Any suggestions are appreciated.