xdp-project / xdp-tutorial

NIC to XSK back to Network Stack #404

Closed 0x000001A4 closed 8 months ago

0x000001A4 commented 8 months ago

Hello @dmitris @tohojo @donaldh @netoptimizer @davem330 @magnus-karlsson and other community members.

Problem:

I am trying to achieve the following behavior:

  1. Receive a packet on a NIC that has an XDP program attached, which redirects the packet to an AF_XDP socket in user space.
  2. Inspect or change the content of the packet in user space.
  3. Get it back onto the same NIC so that I can perform an XDP_PASS and let it into the network stack.

I can achieve this if I swap the Ethernet, IP and transport-layer source and destination in the headers, which returns the packet to the original sender. But if I keep them as they are, so that the packet goes on to its intended receiver after being in user space, it never gets delivered to the NIC again.
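For readers who want to see what the header swap described above amounts to, here is a minimal sketch in Python. It works on a raw frame held in a byte buffer and assumes Ethernet + IPv4 + UDP without VLAN tags; the function name `swap_endpoints` is hypothetical and is not part of any AF_XDP library.

```python
import struct

ETH_HLEN = 14            # dst MAC (6) + src MAC (6) + EtherType (2)
ETHERTYPE_IPV4 = 0x0800  # assumption: untagged IPv4 frames only
IPPROTO_UDP = 17         # assumption: UDP as the transport protocol


def swap_endpoints(frame: bytes) -> bytes:
    """Return a copy of an Ethernet/IPv4/UDP frame with src and dst swapped."""
    buf = bytearray(frame)

    (ethertype,) = struct.unpack_from("!H", buf, 12)
    if ethertype != ETHERTYPE_IPV4:
        raise ValueError("this sketch only handles IPv4")
    ihl = (buf[ETH_HLEN] & 0x0F) * 4          # IPv4 header length in bytes
    if buf[ETH_HLEN + 9] != IPPROTO_UDP:
        raise ValueError("this sketch only handles UDP")

    # Ethernet: destination MAC is bytes 0..5, source MAC is bytes 6..11.
    buf[0:6], buf[6:12] = buf[6:12], buf[0:6]

    # IPv4: source address at offset 12, destination address at offset 16.
    s, d = ETH_HLEN + 12, ETH_HLEN + 16
    buf[s:s + 4], buf[d:d + 4] = buf[d:d + 4], buf[s:s + 4]

    # UDP: source port at offset 0, destination port at offset 2.
    l4 = ETH_HLEN + ihl
    buf[l4:l4 + 2], buf[l4 + 2:l4 + 4] = buf[l4 + 2:l4 + 4], buf[l4:l4 + 2]

    # Swapping whole 16-bit-aligned fields does not change the ones'-complement
    # sums, so the IPv4 and UDP checksums stay valid without recomputation.
    return bytes(buf)
```

Applying the function twice yields the original frame again, which is a quick way to sanity-check the offsets.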

Question

Why is the packet not delivered in this last case?

Possible solution

I was thinking of creating a dummy interface to which I would redirect the packet through XDP, and from there perform steps 1, 2 and 3 explained above. In step 2 I would again change the Ethernet, IP and transport-layer source and destination in the headers.

Is there a simpler way to get the packet delivered to the original receiver through the NIC that first received and from which the packet got redirected to the AF_XDP socket to be manipulated in user-space?

tohojo commented 8 months ago

There is no support for reinjection into the stack with AF_XDP. You could add a veth device pair and transmit the packets you want to reinject on one end. That would make the packets available to the kernel, but they would arrive on a different netdev, and there's a performance overhead.

The alternative is to do the processing that needs to go to XDP_PASS in the XDP program itself, instead of going out to userspace...

0x000001A4 commented 8 months ago

Thank you for your response @tohojo

I really want to have access to the packet in user space so that I don't have to deal with some problems of the verifier.

So, to achieve this, when receiving the packet on an interface I should redirect it to a dummy veth netdev (using XDP_REDIRECT, and rewriting the IP and MAC to those of the veth interface). Attached to the veth netdev I could have an XDP program that redirects the packets to an XSK map, making them accessible in user space. From there I could put the packet on the TX ring of the AF_XDP socket and send it out again (after swapping the IP, MAC and transport-layer source and destination, so that it is returned to the original receiver)?

Would these steps be enough to fulfill what I need? Are there any flaws in them?

tohojo commented 8 months ago

> Thank you for your response @tohojo
>
> I really want to have access to the packet in user space so that I don't have to deal with some problems of the verifier.

Right, well that's the tradeoff when going to userspace: some things get easier (not having to deal with the verifier), but you lose the ability to interact with the normal network stack without hassle and overhead :)

> So, to achieve this, when receiving the packet on an interface I should redirect it to a dummy veth netdev (using XDP_REDIRECT, and rewriting the IP and MAC to those of the veth interface). Attached to the veth netdev I could have an XDP program that redirects the packets to an XSK map, making them accessible in user space. From there I could put the packet on the TX ring of the AF_XDP socket and send it out again (after swapping the IP, MAC and transport-layer source and destination, so that it is returned to the original receiver)?
>
> Would these steps be enough to fulfill what I need? Are there any flaws in them?

Erm, if you just want to send the packets out of a physical interface from userspace you don't need the veth; you can do that just fine with AF_XDP.

The veth is for sending packets back to the kernel. So you open the AF_XDP socket on the physical device and use that to send and receive packets from the network. Then you open a second AF_XDP socket on a veth device; when you want to send a packet to the stack (i.e., XDP_PASS it), you send the packet on the veth, after which it will show up on the other end of the veth pair and be processed by the networking stack.

0x000001A4 commented 8 months ago

I may have explained myself poorly if you understood that I need to send the packets out of the physical interface from userspace. What I really need is to inject the packet into the kernel (network stack) after modifying it, but I need to modify it in userspace. The problem is that I can't inject it back into the kernel, and as you said there is no support for reinjection into the stack using AF_XDP.

To make it clearer, the context here is: I have two processes talking with each other through AF_INET sockets, each having its veth: (vethA, vethB). Process A sends a message to Process B. I want to intercept and modify the content of the payload exchanged before it gets delivered to the AF_INET socket assigned to process B (through the network stack).
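As an illustration of the payload modification described above, the sketch below replaces the payload of a UDP datagram inside a raw frame. It assumes Ethernet + IPv4 + UDP without VLAN tags, and it assumes that the receiving stack will verify the IPv4 header checksum and the UDP checksum, so both are recomputed. The names `internet_checksum` and `replace_udp_payload` are hypothetical helpers, not an existing API.

```python
import struct

ETH_HLEN = 14      # assumption: untagged Ethernet header
IPPROTO_UDP = 17


def internet_checksum(data: bytes) -> int:
    """Ones'-complement checksum over 16-bit words (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length input
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def replace_udp_payload(frame: bytes, new_payload: bytes) -> bytes:
    """Return a new frame carrying new_payload, with lengths and checksums fixed."""
    eth = bytes(frame[:ETH_HLEN])
    ihl = (frame[ETH_HLEN] & 0x0F) * 4
    ip = bytearray(frame[ETH_HLEN:ETH_HLEN + ihl])
    udp = bytearray(frame[ETH_HLEN + ihl:ETH_HLEN + ihl + 8])
    udp_len = 8 + len(new_payload)

    # IPv4: update total length, then recompute the header checksum.
    struct.pack_into("!H", ip, 2, ihl + udp_len)
    struct.pack_into("!H", ip, 10, 0)
    struct.pack_into("!H", ip, 10, internet_checksum(bytes(ip)))

    # UDP: update length, then checksum over pseudo-header + header + payload.
    struct.pack_into("!H", udp, 4, udp_len)
    struct.pack_into("!H", udp, 6, 0)
    pseudo = bytes(ip[12:20]) + struct.pack("!BBH", 0, IPPROTO_UDP, udp_len)
    csum = internet_checksum(pseudo + bytes(udp) + new_payload)
    # A transmitted value of 0 means "no checksum" for UDP over IPv4.
    struct.pack_into("!H", udp, 6, csum or 0xFFFF)

    return eth + bytes(ip) + bytes(udp) + new_payload
```

The new payload may have a different length than the old one, since both length fields are rewritten. TCP would need a different treatment (sequence numbers, options), which this sketch does not attempt.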

So you are telling me that I can receive the packet on one end of vethB, redirect it to the AF_XDP socket to modify it in user space, and make it show up on the other end of the veth pair to be processed by the network stack (that is, "XDP_PASS" it by putting it on the TX ring of the AF_XDP socket)? This was my attempt and I couldn't make it work.

As a work-around I was thinking of:

  1. Intercept the packet in the veth of process B (vethB), and through an XDP program redirect it to another veth (vethDummy).
  2. Receive the original packet from process B on vethDummy and redirect it to an AF_XDP socket to be modified in user-space.
  3. Send the packet back to process B so that it appears on its veth (vethB) and is XDP_PASS'd to the network stack.

tohojo commented 8 months ago

> I may have explained myself poorly if you understood that I need to send the packets out of the physical interface from userspace. What I really need is to inject the packet into the kernel (network stack) after modifying it, but I need to modify it in userspace. The problem is that I can't inject it back into the kernel, and as you said there is no support for reinjection into the stack using AF_XDP.
>
> To make it clearer, the context here is: I have two processes talking with each other through AF_INET sockets, each having its veth: (vethA, vethB). Process A sends a message to Process B. I want to intercept and modify the content of the payload exchanged before it gets delivered to the AF_INET socket assigned to process B (through the network stack).
>
> So you are telling me that I can receive the packet on one end of vethB, redirect it to the AF_XDP socket to modify it in user space, and make it show up on the other end of the veth pair to be processed by the network stack (that is, "XDP_PASS" it by putting it on the TX ring of the AF_XDP socket)? This was my attempt and I couldn't make it work.

No. You can do something like:

Now:

Process A <--> veth0 -- veth1 <--> Process B

to:

Process A <--> veth0 -- veth1 <-- AF_XDP --> veth2 -- veth3 <--> Process B

Or technically, if it's all veth devices, you can install XDP programs on both veth0 and veth1, capture the packets before they hit the stack, and then reinject them on the other end again.

I.e., Process A sends a packet on veth0, you intercept it on veth1 with XDP, process it in userspace, and then re-inject it with AF_XDP on veth0 again. And vice versa.

Note also that if this is all veth devices, there is a performance overhead of adding XDP processing (no matter what the XDP program does).
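The two veth pairs in the second diagram could be created along these lines. This is only a sketch: it generates standard `ip link` commands for the device names used in the diagram and, by default, prints them instead of running them. The helper names are hypothetical, and addressing, namespaces, and the XDP/AF_XDP parts are deliberately left out.

```python
import subprocess


def veth_setup_commands(pairs):
    """Build `ip link` commands that create each veth pair and bring both ends up."""
    cmds = []
    for a, b in pairs:
        cmds.append(["ip", "link", "add", a, "type", "veth", "peer", "name", b])
        cmds.append(["ip", "link", "set", a, "up"])
        cmds.append(["ip", "link", "set", b, "up"])
    return cmds


def apply_commands(cmds, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Creating network devices needs root or CAP_NET_ADMIN.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Device names taken from the diagram above.
    apply_commands(veth_setup_commands([("veth0", "veth1"), ("veth2", "veth3")]))
```

The AF_XDP application would then sit between veth1 and veth3's peer veth2, receiving on one and transmitting on the other.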

0x000001A4 commented 8 months ago

First of all, thank you for this discussion; it has been really helpful. I would still like to clear up a few doubts, if that is no problem.

So process A sends a packet on veth0, I intercept it on veth1 with XDP, and redirect it to userspace through the XSK map. To re-inject the packet on veth0 again, would it be enough to change the MAC address to the one of veth0 and put it on the TX ring? Otherwise, how could I perform this redirection with AF_XDP?

My understanding here was that veth0 is the virtual interface assigned to the container being run by process A; and veth1 the virtual interface assigned to the container being run by process B. And thus, injecting the packet in veth0 would lead it to be received by Process A, not back into process B's network stack.

tohojo commented 8 months ago

> First of all, thank you for this discussion; it has been really helpful. I would still like to clear up a few doubts, if that is no problem.
>
> So process A sends a packet on veth0, I intercept it on veth1 with XDP, and redirect it to userspace through the XSK map. To re-inject the packet on veth0 again, would it be enough to change the MAC address to the one of veth0 and put it on the TX ring? Otherwise, how could I perform this redirection with AF_XDP?

You can think of a veth pair as two physical NICs connected with a wire. So if you have veth0 <--> veth1, anything sent on veth0 shows up as received on veth1. This has nothing to do with XDP, that's just how veths work.

So with that in mind, if you TX a packet with AF_XDP, that will end up being sent out of whichever interface the AF_XDP socket is connected to, and will show up as received on the other end of the veth pair. If you're intercepting a packet that already came from veth0, you don't even need to change the MAC, but yeah, otherwise you'll need to adjust it so the kernel doesn't discard it on receive.
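The "two NICs connected with a wire" picture can be captured in a small toy model. This is a simulation in plain Python, not real networking code: the class `VethEnd` and the function `make_pair` are made up for illustration, and the receive-side MAC check is a simplification of what the kernel does (multicast and promiscuous mode are ignored).

```python
from collections import deque

BROADCAST = b"\xff" * 6


class VethEnd:
    """One end of a simulated veth pair."""

    def __init__(self, name: str, mac: bytes):
        self.name = name
        self.mac = mac
        self.peer = None
        self.rx = deque()   # frames accepted for delivery to the local stack
        self.dropped = 0    # frames discarded because of the destination MAC

    def transmit(self, frame: bytes) -> None:
        """TX on this end shows up as RX on the other end of the pair."""
        self.peer._receive(frame)

    def _receive(self, frame: bytes) -> None:
        dst = frame[:6]
        if dst == self.mac or dst == BROADCAST:
            self.rx.append(frame)
        else:
            self.dropped += 1   # simplified stand-in for the kernel discarding it


def make_pair(name_a: str, mac_a: bytes, name_b: str, mac_b: bytes):
    """Create two ends and wire them to each other."""
    a, b = VethEnd(name_a, mac_a), VethEnd(name_b, mac_b)
    a.peer, b.peer = b, a
    return a, b
```

In this model, a frame that was originally sent from veth0 already carries veth1's MAC as its destination, so transmitting it on veth0 again delivers it to veth1 unchanged. A frame taken from somewhere else would be counted in `dropped` until its destination MAC is rewritten.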

> My understanding here was that veth0 is the virtual interface assigned to the container being run by process A; and veth1 the virtual interface assigned to the container being run by process B. And thus, injecting the packet in veth0 would lead it to be received by Process A, not back into process B's network stack.

If this is a container setup, each container usually has one veth device, where the other end of the pair is in the host namespace. Unless you are doing something special to set it up, they won't have a directly connected veth pair.

So in a sense you're already in the situation in the second diagram, even without XDP involved:

Container A <--> veth0 -- veth1 <-- Host namespace --> veth2 -- veth3 <--> Container B

So if you want to intercept the traffic, you can just pick it up in the host namespace; there are lots of ways to do this, AF_XDP is one of them :)

0x000001A4 commented 8 months ago

@tohojo I fully understand everything we are discussing here. Thanks to you I am now more aware of how veths work, and I have confirmed that I already have this setup and am already intercepting the packets in the host namespace.

What I am not yet able to do is re-inject the packets on veth0 (for example, given that I am intercepting packets on veth1) after processing them in user space.

As you said, it should be enough to TX the packet with AF_XDP for it to end up on the other end of the veth pair; but the kernel is probably discarding it, as it never gets delivered.

Any solution to re-inject the packet back in veth0 from veth1?

tohojo commented 8 months ago

No, you will very much have to roll your own here. This will involve having whichever process is doing this switch into the container namespace and bind an AF_XDP socket to the container veth device.

0x000001A4 commented 8 months ago

Hello @tohojo Thanks to you I managed to implement what I needed.

I attached an AF_XDP socket to both container veth ends and used a small trick to make the packets bounce between them until the message is what I need, at which point it is XDP_PASS'd to the kernel.

We can close this issue. Hope it helps anyone else some time in the future.

Thank you a lot! ❤️

tohojo commented 8 months ago

Great - you're welcome!