luigirizzo / netmap

Automatically exported from code.google.com/p/netmap
BSD 2-Clause "Simplified" License

Segmentation fault when trying to allocate 2 million extra buffers #602

Open mkaniewski opened 5 years ago

mkaniewski commented 5 years ago

Hi,

I have a program that allocates a lot of extra netmap buffers. When many of them are requested (> 2 million), netmap returns them successfully, but I get a segmentation fault when I try to dereference a buffer pointer. I prepared a sample application that shows the problem:

#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NETMAP_WITH_LIBS
#pragma GCC diagnostic ignored "-Wcast-qual"
#include <net/netmap_user.h>

static void
usage(void)
{
        fprintf(stderr, "Usage: niocregif -i iface\n");
        exit(1);
}

int main(int argc, char *argv[])
{
        struct nmreq req;
        struct netmap_if *nifp;
        struct netmap_ring *ring;
        uint32_t currindex;
        int ch, error, fd;
        const char *ifacestr;
        void *p;

        ifacestr = NULL;
        while ((ch = getopt(argc, argv, "i:")) != -1) {
                switch (ch) {
                case 'i':
                        ifacestr = optarg;
                        break;
                default:
                        usage();
                }
        }

        if (ifacestr == NULL) {
                usage();
        }

        fd = open("/dev/netmap", O_RDWR);
        if (fd == -1) {
                perror("Unable to open /dev/netmap");
                return 1;
        }

        memset(&req, 0, sizeof(req));

        /* Bounded copy: never overruns nr_name and always NUL-terminates. */
        snprintf(req.nr_name, sizeof(req.nr_name), "%s", ifacestr);
        req.nr_version = NETMAP_API;
        req.nr_ringid = NETMAP_NO_TX_POLL | 0;
        req.nr_flags = NR_REG_ONE_NIC;
        req.nr_arg3 = 2000000;          /* number of extra buffers requested */

        printf("Opening netmap device %s.\n", ifacestr);
        error = ioctl(fd, NIOCREGIF, &req);
        assert(error == 0);
        /* On return, nr_arg3 holds the number of extra buffers actually granted. */

        p = mmap(0, req.nr_memsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        assert(p != MAP_FAILED);        /* check the returned mapping, not the mmap symbol */
        nifp = NETMAP_IF(p, req.nr_offset);
        assert(nifp != NULL);
        ring = NETMAP_RXRING(nifp, 0);

        /*
         * Extra buffers form a linked list: ni_bufs_head is the first index,
         * and the first 4 bytes (a uint32_t) of each buffer hold the index of
         * the next one. Index 0 terminates the list.
         */
        currindex = nifp->ni_bufs_head;
        for (uint32_t ii = 0; ii < req.nr_arg3 && currindex != 0; ii++) {
                char *buf;
                uint32_t *nextindex;

                buf = NETMAP_BUF(ring, currindex);
                nextindex = (uint32_t *)(void *)buf;
                fprintf(stderr, "Buf %u -> %u\n", currindex, *nextindex);
                currindex = *nextindex;
        }

        munmap(p, req.nr_memsize);
        close(fd);
        return 0;
}

On FreeBSD 11.2 it ends with a segmentation fault at the first fprintf in the loop. /var/log/messages contains:

Mar 21 19:20:10 freebsd11 kernel: 410.623866 [2215] netmap_ioctl              requested 2000000 extra buffers
Mar 21 19:20:10 freebsd11 kernel: 410.623885 [ 736] netmap_extra_alloc        allocate buffer 18435 -> 0
Mar 21 19:20:10 freebsd11 kernel: 410.623897 [ 736] netmap_extra_alloc        allocate buffer 18436 -> 18435
Mar 21 19:20:10 freebsd11 kernel: 410.623909 [ 736] netmap_extra_alloc        allocate buffer 18437 -> 18436
Mar 21 19:20:10 freebsd11 kernel: 410.623920 [ 736] netmap_extra_alloc        allocate buffer 18438 -> 18437
Mar 21 19:20:10 freebsd11 kernel: 410.623931 [ 736] netmap_extra_alloc        allocate buffer 18439 -> 18438
Mar 21 19:20:10 freebsd11 kernel: 410.853557 [2218] netmap_ioctl              got 2000000 extra buffers

which shows that the kernel successfully allocated the buffers (to make such a large allocation I had to increase the buffer limit hard-coded in netmap_mem2.c).

These are my sysctls:

dev.netmap.ixl_rx_miss_bufs: 0
dev.netmap.ixl_rx_miss: 0
dev.netmap.iflib_rx_miss_bufs: 0
dev.netmap.iflib_rx_miss: 0
dev.netmap.iflib_crcstrip: 1
dev.netmap.bridge_batch: 1024
dev.netmap.default_pipes: 0
dev.netmap.priv_buf_num: 4098
dev.netmap.priv_buf_size: 2048
dev.netmap.buf_curr_num: 4000000
dev.netmap.buf_num: 4000000
dev.netmap.buf_curr_size: 2048
dev.netmap.buf_size: 2048
dev.netmap.priv_ring_num: 4
dev.netmap.priv_ring_size: 20480
dev.netmap.ring_curr_num: 200
dev.netmap.ring_num: 200
dev.netmap.ring_curr_size: 36864
dev.netmap.ring_size: 36864
dev.netmap.priv_if_num: 1
dev.netmap.priv_if_size: 1024
dev.netmap.if_curr_num: 100
dev.netmap.if_num: 100
dev.netmap.if_curr_size: 1024
dev.netmap.if_size: 1024
dev.netmap.generic_rings: 1
dev.netmap.generic_ringsize: 1024
dev.netmap.generic_mit: 100000
dev.netmap.admode: 0
dev.netmap.fwd: 0
dev.netmap.flags: 0
dev.netmap.adaptive_io: 0
dev.netmap.txsync_retry: 2
dev.netmap.no_pendintr: 1
dev.netmap.mitigate: 1
dev.netmap.no_timestamp: 0
dev.netmap.verbose: 0
dev.netmap.ix_rx_miss_bufs: 0
dev.netmap.ix_rx_miss: 0
dev.netmap.ix_crcstrip: 0

I would be glad to hear any ideas about why such a large allocation fails. Thanks.

vmaffione commented 5 years ago

How did you patch the code exactly?

Have you tried the same experiment on stable/11? Your 11.2-RELEASE may contain outdated code. I tried your program on HEAD, and it does not segfault (although it does not actually allocate all those buffers, because it runs out of memory).

mkaniewski commented 5 years ago

I tested my program on stable/11 and it still crashes with a segfault. The patch that raises the limit so that 2 000 000 extra buffers can be requested (I set it to 4 000 000 to leave some headroom) is:

Index: sys/dev/netmap/netmap_mem2.c
===================================================================
--- sys/dev/netmap/netmap_mem2.c        (revision 345457)
+++ sys/dev/netmap/netmap_mem2.c        (working copy)
@@ -551,7 +551,7 @@
                        .objminsize = 64,
                        .objmaxsize = 65536,
                        .nummin     = 4,
-                       .nummax     = 1000000, /* one million! */
+                       .nummax     = 4000000, /* four million! */
                },
        },

After some digging I found the source of the problem: netmap uses 32-bit integers in several places, so requesting a large enough number of buffers can cause an integer overflow. For example, look at how netmap_obj_pool.memtotal is computed in netmap_finalize_obj_allocator():

p->memtotal = p->numclusters * p->_clustsize;

In my case p->numclusters is 2 000 000 and p->_clustsize is 4096, so the result should be 8 192 000 000. Instead it is 3 897 032 704, the value left after the product overflows a uint32 (8 192 000 000 - 2^32). This is also visible in the system log when running the program (sysctl dev.netmap.verbose must be enabled):

Mar 25 14:29:59 freebsd11 kernel: 598.988402 [1481] netmap_finalize_obj_allocator Pre-allocated 2000000 clusters (4/3805696KB) for 'netmap_buf'

To solve this issue I had to change some of the variable types to size_t. I would like to change more variables to size_t as well, but I am not sure how that would affect netmap internals. The attached file contains my current changes.

diff.txt (https://github.com/luigirizzo/netmap/files/3004435/diff.txt)

Let me know what you think about it and whether we could upstream it. Thanks.

vmaffione commented 5 years ago

Thanks for spotting the issue. Feel free to open a pull request with your changes. We have some unit tests and integration tests to catch regressions.

j-t-d commented 5 years ago

I took the provided diff and made PR #637, as we've just run into the segfault issue as well.