xdp-project / bpf-examples

Making eBPF programming easier via build env and examples

[help] AF_XDP-forwarding doesn't forward packets (tcpdump captures nothing) #94

Open xjj210130 opened 1 year ago

xjj210130 commented 1 year ago

Hi all, my environment is CentOS 7 with a self-built 5.15.46 kernel, gcc 11.2.1, clang 14.0.6, llvm 14.0.6 and bpftool v5.15.46. I did the following: 1: git clone https://github.com/xdp-project/bpf-examples.git 2: cd bpf-examples/AF_XDP-forwarding 3: make 4: run ./xsk_fwd -i eth3 -q 0 -i eth4 -q 0 -c 3. I modified the swap_mac_addresses code as shown below: I want to receive packets on eth3, then forward them out of eth4 (9.168.1.4) to another PC. But when I run tcpdump -i eth4 -w 1.cap, no packets are forwarded to eth4. Is something wrong, or do I need to change more?

static void swap_dst_info(void *data)
{
    const char *dst_ip = "9.168.1.4";
    const char *new_dst_ip = "9.168.1.5";

    struct iphdr *iph = data + sizeof(struct ethhdr);
    __be32 sip = iph->saddr;    /* keep the old addresses for the incremental checksum update */
    __be32 dip = iph->daddr;

    iph->saddr = inet_addr(dst_ip);
    iph->daddr = inet_addr(new_dst_ip);
    iph->check = csum_diff4(sip, iph->saddr, iph->check);
    iph->check = csum_diff4(dip, iph->daddr, iph->check);
    printf("end swap_dst_info sip is %u, dip is %u (line %d)\n", iph->saddr, iph->daddr, __LINE__);

    // the following code modifies the TCP header
    struct tcphdr *tcph = data + sizeof(struct ethhdr) + (iph->ihl * 4);
    __be16 oldsrcport = tcph->source;
    tcph->source = htons(1111);     /* ports are stored in network byte order */
    tcph->dest = htons(22222);
    tcph->check = csum_diff4(oldsrcport, tcph->source, tcph->check);
    /* the TCP checksum also covers the IP addresses (pseudo-header), so the
     * saddr/daddr changes above must be applied to tcph->check as well */
       .......
}

I let the kernel handle ARP; I only process TCP. I printed the packet info in port_tx_burst, and the source and destination IPs are the ones I set. I also added logging to the AF_XDP program: the kernel XDP program xsk_def_prog is never triggered, which I take to mean the messages on the TX queue are not being sent. So I think something else needs to be done, but I don't know which API to use.

Of course, I can't use tun/tap, because it doesn't use the AF_XDP TX queue. Is there an example I can refer to? Thanks very much. @tohojo
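The csum_diff4() helper used in swap_dst_info() is not shown in the thread. As a hedged sketch of what such a helper is expected to compute, here is the RFC 1624 incremental update HC' = ~(~HC + ~m + m') for 16- and 32-bit fields, plus a full RFC 1071 checksum used only to cross-check it. The names csum_fold32, csum_full, csum_replace16 and csum_replace32 are hypothetical, and every value is in host byte order. Keep in mind that the TCP checksum also covers a pseudo-header containing the IP source and destination addresses, so rewriting saddr/daddr requires updating tcph->check as well, not only iph->check.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator down to 16 bits. */
static uint16_t csum_fold32(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full RFC 1071 checksum over len bytes (big-endian 16-bit words), result in
 * host byte order; used here only to cross-check the incremental update. */
static uint16_t csum_full(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)p[i] << 8 | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	return (uint16_t)~csum_fold32(sum);
}

/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m') for one 16-bit field. */
static uint16_t csum_replace16(uint16_t check, uint16_t old_val, uint16_t new_val)
{
	uint32_t sum = (uint32_t)(uint16_t)~check + (uint16_t)~old_val + new_val;

	return (uint16_t)~csum_fold32(sum);
}

/* The same update for a 32-bit field such as an IPv4 address: two 16-bit words. */
static uint16_t csum_replace32(uint16_t check, uint32_t old_val, uint32_t new_val)
{
	check = csum_replace16(check, old_val >> 16, new_val >> 16);
	return csum_replace16(check, old_val & 0xffff, new_val & 0xffff);
}
```

In swap_dst_info() terms, with old_saddr holding the address before the rewrite, the IP header update would look like iph->check = htons(csum_replace32(ntohs(iph->check), ntohl(old_saddr), ntohl(iph->saddr))); and the same call is applied to tcph->check to account for the pseudo-header.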

maryamtahhan commented 1 year ago

Hi @xjj210130

I presume eth4 is some sort of physical device and you are using AF_XDP in native/zero-copy mode? tcpdump will not see the packets sent through an AF_XDP socket in ZC mode, because the packet is pushed directly from AF_XDP to the driver. The XDP program only executes at the RX-side hook; there are no TX hooks for XDP today.
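To confirm which mode a socket actually ended up in, the XDP_OPTIONS socket option (available since Linux 5.3 and declared in linux/if_xdp.h) reports the XDP_OPTIONS_ZEROCOPY flag. A small sketch; the helper name xsk_is_zerocopy is hypothetical, and in the forwarding app it would be called on xsk_socket__fd() of the eth4 socket.

```c
#include <errno.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>

#ifndef SOL_XDP
#define SOL_XDP 283 /* value from the kernel's socket level definitions */
#endif

/* Returns 1 if the AF_XDP socket fd is bound in zero-copy mode, 0 if it is in
 * copy mode, or -errno on failure (not an XDP socket, bad fd, old kernel). */
static int xsk_is_zerocopy(int fd)
{
	struct xdp_options opts = { 0 };
	socklen_t optlen = sizeof(opts);

	if (getsockopt(fd, SOL_XDP, XDP_OPTIONS, &opts, &optlen))
		return -errno;
	return (opts.flags & XDP_OPTIONS_ZEROCOPY) ? 1 : 0;
}
```

If this returns 1 for the eth4 socket, an empty tcpdump capture on eth4 is expected and says nothing about whether frames left the NIC.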

If you just want to validate your application, you can try using a veth pair on the TX side of your app and either run tcpdump on the veth peer (i.e. not the veth directly attached to your application) or check the ethtool stats for the veth directly attached to your app: there will be a tx_queue_0_xdp_xmit stats field. Alternatively, you can physically connect another NIC directly to your eth4 and run tcpdump there.

You can always attach a kprobe to xsk_sendmsg and see how many times it gets invoked, with something like: bpftrace -e 'kprobe:xsk_sendmsg {@cnt[kstack()] = count(); }'

To give you a rough idea of the TX path, check out this link. You are only really interested in steps 1-21 (just as an example: this is for a veth driver).

xjj210130 commented 1 year ago

@maryamtahhan Yes, eth4 is a physical device and I am using AF_XDP in native/zero-copy mode. I ran tcpdump -i any -w 7.cap on the two PCs (one is 9.168.1.4 and the other is 9.168.1.5). Neither captured anything from 9.168.1.4 or 9.168.1.5.

I am very confused about how the modified message is sent out in this example (AF_XDP-forwarding). Is it done in the function port_tx_burst?

I skimmed the AF_XDP source; it uses the program from the file xsk_def_xdpprog.c. There is code that looks like this: if (!refcnt) { return XDP_PASS; }

Does sending the packet to the other PC correspond to refcnt == 0? If so, how do I trigger that condition?

Does "there are no TX hooks for XDP today" mean I need to send the packet myself?

Thank you very much.
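One more thing worth checking for the "both PCs capture nothing" result: swap_mac_addresses() in AF_XDP-forwarding, as its name suggests, swaps the source and destination MAC, which suits reflecting a frame back to its sender. When forwarding to a different host out of eth4, the frame normally needs eth4's MAC as source and the next hop's MAC (e.g. that of 9.168.1.5) as destination, otherwise the receiving NIC may drop it before any capture; note also that tcpdump -i any does not put interfaces into promiscuous mode. A minimal sketch, assuming both MAC addresses are already known (obtaining them from configuration or the ARP table is left out); set_mac_addresses is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>
#include <linux/if_ether.h>

/* Rewrite the Ethernet addresses of a frame about to leave the egress port:
 * source = egress NIC's MAC, destination = next hop's MAC. Both addresses
 * are supplied by the caller. */
static void set_mac_addresses(void *data, const uint8_t src[ETH_ALEN],
			      const uint8_t dst[ETH_ALEN])
{
	struct ethhdr *eth = data;

	memcpy(eth->h_source, src, ETH_ALEN);
	memcpy(eth->h_dest, dst, ETH_ALEN);
}
```

It would be called on the packet data in place of swap_mac_addresses(), before the descriptor is placed on the eth4 TX ring.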

maryamtahhan commented 1 year ago

I'm only making guesses here regarding your setup. If you can send a high-level diagram of your setup, showing where your AF_XDP programs are loaded and where you are trying to run tcpdump, that would be very helpful.

If you are using an XDP program on the other machine where you are trying to capture the packets, I would recommend using xdpdump.

I believe what the code you reference is saying is: if nothing is using this XDP program (the refcount would be 0), then pass all incoming traffic to the kernel. If you want to capture incoming traffic at the XDP hook where an XDP program is loaded, you can use a command like the following:

xdpdump -i eno1 -w - | tcpdump -r - -n

Regarding your question about port_tx_burst: CNDP has a great explanation of how buffers are added to and consumed from the UMEM here; please see the TX flow specifically for what you are trying to understand. The last thing to note is the sendto() call, which triggers the flow in the sequence diagram I linked in my earlier reply (steps 1-3):

sendto(xsk_socket__fd(p->xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);

After step 3, the ZC driver gets triggered from __xsk_sendmsg().

maryamtahhan commented 1 year ago

"There are no TX hooks for XDP today" means I need to send the packet myself, is that right?

No, my comment about TX hooks was referring to being able to capture outgoing traffic from an application using an AF_XDP socket. You can still send the packet through the XSK; you just won't be able to dump it before it leaves the interface right now.

xjj210130 commented 1 year ago

@maryamtahhan Thank you very much. I used the code in xdp-tutorial-master/reflector (https://github.com/xdp-project/xdp-tutorial) as a reference.

The main code is as follows:


static bool process_packet(struct xsk_socket_info *xsk_dst, struct xsk_socket_info *xsk_src,
                           uint64_t addr, uint32_t len)
{
    uint8_t *pkt = xsk_umem__get_data(xsk_src->umem->buffer, addr);
    struct ethhdr *eth = (struct ethhdr *)pkt;
    struct iphdr *ip = (struct iphdr *)(eth + 1);
    uint32_t tx_idx = 0;

    if (ntohs(eth->h_proto) == ETH_P_IP &&
        len > (sizeof(*eth) + sizeof(*ip))) {
        __u8 protocol = ip->protocol;
        __u32 saddr = ntohl(ip->saddr);
        __u32 daddr = ntohl(ip->daddr);

        if (protocol == IPPROTO_TCP) {
            struct tcphdr *tcp = (struct tcphdr *)(ip + 1);
            __u32 sourceport = ntohs(tcp->source);
            __u32 destport = ntohs(tcp->dest);

            /* The frame is not copied: its UMEM address goes onto xsk_dst's
             * TX ring and comes back on xsk_dst's completion ring. */
            if (xsk_ring_prod__reserve(&xsk_dst->tx, 1, &tx_idx) != 1)
                return false;   /* TX ring full: the caller frees the frame */

            struct xdp_desc *tx_desc = xsk_ring_prod__tx_desc(&xsk_dst->tx, tx_idx);
            tx_desc->addr = addr;
            tx_desc->len = len;
            xsk_ring_prod__submit(&xsk_dst->tx, 1);
            xsk_dst->outstanding_tx++;
            xsk_dst->stats.tx_bytes += len;
            xsk_dst->stats.tx_packets++;
            return true;
        }
    }
    return false;
}

static void complete_tx(struct xsk_socket_info *xsk)
{
    //...
    sendto(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
}

static void handle_receive_packets(struct xsk_socket_info *xsk_dst, struct xsk_socket_info *xsk_src)
{
    unsigned int rcvd, stock_frames, i;
    uint32_t idx_rx = 0, idx_fq = 0;
    int ret;

    rcvd = xsk_ring_cons__peek(&xsk_src->rx, RX_BATCH_SIZE, &idx_rx);

    /* Stuff the ring with as much frames as possible */
    stock_frames = xsk_prod_nb_free(&xsk_src->fq,
                    xsk_umem_free_frames(xsk_src->umem));

    if (stock_frames > 0) {

        ret = xsk_ring_prod__reserve(&xsk_src->fq, stock_frames,
                         &idx_fq);

        /* This should not happen, but just in case */
        while (ret != stock_frames)
            ret = xsk_ring_prod__reserve(&xsk_src->fq, stock_frames,
                             &idx_fq);

        for (i = 0; i < stock_frames; i++)
            *xsk_ring_prod__fill_addr(&xsk_src->fq, idx_fq++) =
                umem_alloc_umem_frame(xsk_src->umem);

        xsk_ring_prod__submit(&xsk_src->fq, stock_frames);
    }

    /* Process received packets */
    for (i = 0; i < rcvd; i++) {
        uint64_t addr = xsk_ring_cons__rx_desc(&xsk_src->rx, idx_rx)->addr;
        uint32_t len = xsk_ring_cons__rx_desc(&xsk_src->rx, idx_rx++)->len;

        if (!process_packet(xsk_dst, xsk_src, addr, len))
            umem_free_umem_frame(xsk_src->umem, addr);

        xsk_src->stats.rx_bytes += len;
    }

    xsk_ring_cons__release(&xsk_src->rx, rcvd);
    xsk_src->stats.rx_packets += rcvd;
    complete_tx(xsk_dst);   /* complete_tx() takes only the TX socket */
}

static void rx_and_process(struct config *cfg,
               struct xsk_socket_info *xsk_socket_0, struct xsk_socket_info *xsk_socket_1)
{
    struct pollfd fds[2];
    int ret=0, nfds = 2;

    memset(fds, 0, sizeof(fds));
    fds[0].fd = xsk_socket__fd(xsk_socket_0->xsk);
    fds[0].events = POLLIN;
    fds[1].fd = xsk_socket__fd(xsk_socket_1->xsk);
    fds[1].events = POLLIN;
    while (!global_exit) {
        ret = poll(fds, nfds, -1);
        if (ret <= 0 || ret > 2)
            continue;

        if (fds[0].revents & POLLIN)
            handle_receive_packets(xsk_socket_1, xsk_socket_0);
        // if (fds[1].revents & POLLIN) handle_receive_packets(xsk_socket_0, xsk_socket_1); not needed
    }
}

static void enter_xsks_into_map(struct xsk_socket_info *xsk_socket, int xsks_map)
{
    int i;
    if (xsks_map < 0) {
        fprintf(stderr, "ERROR: no xsks map found: %s\n",
            strerror(-xsks_map));
        exit(EXIT_FAILURE);
    }

    for (i = 0; i < 1; i++) {
        int fd = xsk_socket__fd(xsk_socket->xsk);
        int key, ret;

        key = i;
        ret = bpf_map_update_elem(xsks_map, &key, &fd, 0);
        if (ret) {
            perror("error is ");
            fprintf(stderr, "ERROR: bpf_map_update_elem %d, xsks_map is [%d], %d\n", i, xsks_map, __LINE__);
            exit(EXIT_FAILURE);
        }
    }
}

int main(int argc, char **argv)
{
    // init the para
    //load  prog myself
    if (cfg.filename[0] != 0) {
        struct bpf_map *map;
        bpf_obj = load_bpf_and_xdp_attach(&cfg);
        if (!bpf_obj) {
            /* Error handling done in load_bpf_and_xdp_attach() */
            exit(EXIT_FAILURE);
        }

        map = bpf_object__find_map_by_name(bpf_obj, "xsks_map");
        xsks_map_0_fd = bpf_map__fd(map);
        if (xsks_map_0_fd < 0) {
            fprintf(stderr, "ERROR: no xsks map 0 found: %s\n",
                strerror(-xsks_map_0_fd));
            exit(EXIT_FAILURE);
        }
    }
    struct xsk_ring_prod fq ;
    struct xsk_ring_cons cq ;
    memset(&fq,0,sizeof(fq));
    memset(&cq,0,sizeof(cq));
    umem = configure_xsk_umem(packet_buffer, packet_buffer_size, &fq, &cq);
    if (umem == NULL) {
        fprintf(stderr, "ERROR: Can't create umem \"%s\"\n",
            strerror(errno));
        exit(EXIT_FAILURE);
    }

    /* Open and configure the AF_XDP (xsk) socket */
    xsk_socket_0 = xsk_configure_socket(&cfg, umem, 0);
    if (xsk_socket_0 == NULL) {
        fprintf(stderr, "ERROR: Can't setup AF_XDP socket 0 \"%s\"\n",
            strerror(errno));
        exit(EXIT_FAILURE);
    }
    // bind the fd with prog
    for(int i=0; i < 1; i++)
    {
        enter_xsks_into_map(xsk_socket_0,xsks_map_0_fd);
    }
    xsk_socket_1 = xsk_configure_socket(&cfg, umem, 1);
    if (xsk_socket_1 == NULL) {
        fprintf(stderr, "ERROR: Can't setup AF_XDP socket 1 \"%s\"\n",
            strerror(errno));
        exit(EXIT_FAILURE);
    }

    printf("xsks_map_0_fd %d, xsk_socket_0 fd %d, xsk_socket_1 fd %d (line %d)\n",
           xsks_map_0_fd, xsk_socket__fd(xsk_socket_0->xsk), xsk_socket__fd(xsk_socket_1->xsk), __LINE__);
    rx_and_process(&cfg, xsk_socket_0, xsk_socket_1);
    xsk_socket__delete(xsk_socket_0->xsk);
    xsk_socket__delete(xsk_socket_1->xsk);
    xsk_umem__delete(umem->umem);
    xdp_link_detach(cfg.ifindex, cfg.xdp_flags, 0);
    xdp_link_detach(cfg.redirect_ifindex, cfg.xdp_flags, 0);
    return EXIT_OK;
}
static struct xsk_socket_info *xsk_configure_socket(struct config *cfg,
                            struct xsk_umem_info *umem, int slot)
{
    struct xsk_socket_config xsk_cfg = { 0 };   /* zero every field not set below */
    uint32_t idx, i;
    uint32_t prog_id = 0;
    int ret;
    struct xsk_socket_info *xsk_info;
    xsk_info = calloc(1, sizeof(*xsk_info));
    if (!xsk_info)
        return NULL;

    xsk_info->umem = umem;
    if (0 == slot)
    {
        xsk_cfg.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
        xsk_cfg.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
        xsk_cfg.libbpf_flags = 0;
        xsk_cfg.xdp_flags = cfg->xdp_flags;
        xsk_cfg.bind_flags = cfg->xsk_bind_flags;
        xsk_cfg.libxdp_flags = XSK_LIBXDP_FLAGS__INHIBIT_PROG_LOAD;
    }else{
        xsk_cfg.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
        xsk_cfg.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
        xsk_cfg.libbpf_flags = 0;
        xsk_cfg.xdp_flags = cfg->xdp_flags;
        xsk_cfg.bind_flags = cfg->xsk_bind_flags;
    }

    ret = xsk_socket__create_shared(&xsk_info->xsk,
                             (slot == 0) ? cfg->ifname : cfg->redirect_ifname,
                             cfg->xsk_if_queue,
                             umem->umem,
                             &xsk_info->rx,
                             &xsk_info->tx,
                             &xsk_info->fq,
                             &xsk_info->cq,
                             &xsk_cfg);

    printf("xsk_socket__create_shared returns %d\n", ret) ;
    if (ret)
        goto error_exit;
/*
    ret = bpf_get_link_xdp_id(slot == 0 ? cfg->ifindex : cfg->redirect_ifindex, &prog_id, cfg->xdp_flags);
    if (ret)
        goto error_exit;
        */

//  if (slot == 0)
    {
        /* Stuff the receive path with buffers, we assume we have enough */
        ret = xsk_ring_prod__reserve(&xsk_info->fq,
                         XSK_RING_PROD__DEFAULT_NUM_DESCS,
                         &idx);

        printf("xsk_ring_prod__reserve returns %d, XSK_RING_PROD__DEFAULT_NUM_DESCS is %d\n", ret, XSK_RING_PROD__DEFAULT_NUM_DESCS);
        if (ret != XSK_RING_PROD__DEFAULT_NUM_DESCS)
            goto error_exit;

        for (i = 0; i < XSK_RING_PROD__DEFAULT_NUM_DESCS; i ++)
            *xsk_ring_prod__fill_addr(&xsk_info->fq, idx++) =
                umem_alloc_umem_frame(xsk_info->umem);

        xsk_ring_prod__submit(&xsk_info->fq,
                      XSK_RING_PROD__DEFAULT_NUM_DESCS);
    }
    return xsk_info;

error_exit:
    errno = -ret;
    free(xsk_info);
    return NULL;
}
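xsk_configure_socket() fills each fill ring with umem_alloc_umem_frame(), and process_packet() hands received frames straight to the other socket's TX ring. Those frames only become reusable once their addresses come back on the TX socket's completion ring, which is what the elided body of complete_tx() has to drain (in libxdp, with xsk_ring_cons__peek(), xsk_ring_cons__comp_addr() and xsk_ring_cons__release()) before freeing each address. The allocator itself is not shown in the thread, so below is a hypothetical free-list model of it; struct frame_pool, the pool_* functions and the sizes are assumptions chosen for illustration.

```c
#include <stdint.h>

#define POOL_FRAMES 8           /* hypothetical; real code uses its NUM_FRAMES */
#define FRAME_SIZE 4096         /* assumption: the default UMEM frame size */
#define INVALID_FRAME UINT64_MAX

/* LIFO stack of free UMEM frame addresses, shared by both sockets because
 * they share one UMEM. */
struct frame_pool {
	uint64_t addr[POOL_FRAMES];
	uint32_t nfree;
};

static void pool_init(struct frame_pool *p)
{
	for (uint32_t i = 0; i < POOL_FRAMES; i++)
		p->addr[i] = (uint64_t)i * FRAME_SIZE;
	p->nfree = POOL_FRAMES;
}

/* Take a frame for the fill ring; INVALID_FRAME when none are left. */
static uint64_t pool_alloc(struct frame_pool *p)
{
	if (p->nfree == 0)
		return INVALID_FRAME;
	return p->addr[--p->nfree];
}

/* Return a frame, e.g. for every address read from the completion ring.
 * Returns 0, or -1 if more frames are freed than the pool holds. */
static int pool_free(struct frame_pool *p, uint64_t addr)
{
	if (p->nfree >= POOL_FRAMES)
		return -1;
	p->addr[p->nfree++] = addr;
	return 0;
}
```

In this model, if completions are never reaped, pool_alloc() eventually returns INVALID_FRAME and the RX side can no longer be refilled.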

xsk_socket_0 (eth3) receives packets from the network; xsk_socket_1 (eth4) sends packets to the network.

I do my processing in process_packet; however, no packets are sent from eth4. Is the code wrong?

The XDP code needs to trigger the TX ring and call send_msg, but the code above calls sendto, and I don't understand why it didn't trigger a send. I also tested https://github.com/xdp-project/bpf-examples/tree/master/AF_XDP-forwarding and saw the same behavior: no packets are sent. Thanks again.
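process_packet() in the reflector-based code locates the TCP header with (ip + 1), which assumes a 20-byte IPv4 header without options, and only checks len against the Ethernet and IP header sizes before reading TCP fields. A hedged, bounds-checked variant that honors the IHL field; parse_ipv4_tcp is a hypothetical helper, not part of libxdp or the tutorial.

```c
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/tcp.h>

/* Returns the TCP header inside an Ethernet/IPv4 frame of len bytes, or NULL
 * if the frame is not IPv4/TCP or is too short. Uses ip->ihl instead of
 * assuming a 20-byte IP header. On success *ip_out (if non-NULL) is set. */
static struct tcphdr *parse_ipv4_tcp(uint8_t *pkt, uint32_t len, struct iphdr **ip_out)
{
	struct ethhdr *eth = (struct ethhdr *)pkt;
	struct iphdr *ip;
	uint32_t ihl_bytes;

	if (len < sizeof(*eth) + sizeof(*ip) || ntohs(eth->h_proto) != ETH_P_IP)
		return NULL;
	ip = (struct iphdr *)(eth + 1);
	if (ip->version != 4 || ip->ihl < 5 || ip->protocol != IPPROTO_TCP)
		return NULL;
	ihl_bytes = ip->ihl * 4u;
	if (len < sizeof(*eth) + ihl_bytes + sizeof(struct tcphdr))
		return NULL;
	if (ip_out)
		*ip_out = ip;
	return (struct tcphdr *)((uint8_t *)ip + ihl_bytes);
}
```

A NULL result would map to the "return false" path in process_packet(), so the caller frees the frame instead of putting a truncated packet on the TX ring.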