nanovms / nanos

A kernel designed to run one and only one application in a virtualized environment
https://nanos.org
Apache License 2.0

Is the `memfd_create` syscall supported? #1986

Closed ls-1801 closed 8 months ago

ls-1801 commented 8 months ago

I am running into issues while porting a ring buffer implementation that uses mmap under the hood to a unikernel.

Here is a minimal working example:

    #include <iostream>
    #include <cassert>
    #include <unistd.h>
    #include <sys/mman.h>

    int main()
    {
        size_t capacity = getpagesize() * 2;

        // Anonymous in-memory file that backs the ring buffer.
        int fd = memfd_create("queue_buffer", 0);
        ftruncate(fd, capacity);
        // Reserve 2 * capacity of contiguous address space, then map the file twice into it.
        char* data = static_cast<char*>(mmap(NULL, 2 * capacity, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
        mmap(data, capacity, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
        mmap(data + capacity, capacity, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);

        // A write through the first mapping should be visible through the second.
        data[0] = 'a';
        assert(data[capacity] == 'a');
    }

The trace suggests that memfd_create is not a supported syscall:

    ...
    2 memfd_create
    2 nosyscall memfd_create
    2 ftruncate
    2 ftruncate -1 8192
    2 direct return: -9, rsp 0xffce3ca8
    2 mmap
    2 mmap: addr 0x0000000000000000, length 0x4000, prot 0x0, flags 0x22, fd -1, offset 0x0
    2    returning 0x24807f000
    2 direct return: 9798414336, rsp 0xffce3c98
    2 mmap
    2 mmap: addr 0x000000024807f000, length 0x2000, prot 0x3, flags 0x11, fd -1, offset 0x0
    2 direct return: -9, rsp 0xffce3c98
    2 mmap
    2 mmap: addr 0x0000000248081000, length 0x2000, prot 0x3, flags 0x11, fd -1, offset 0x0
   # Program is stuck here

Yet the Nanos website lists memfd_create as a supported syscall. If memfd_create is not supported, are there workarounds, for example manipulating the page table manually?
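For reading the trace above, the raw numbers can be checked against the usual Linux x86-64 constants. The sketch below is only an illustration under that assumption (other ABIs may use different values), and `is_ebadf` is a hypothetical helper name, not something from Nanos. It shows that the `-9` returned by ftruncate and by the second mmap is `-EBADF`, which is consistent with memfd_create having failed and `fd` being `-1`.

```cpp
#include <cerrno>
#include <sys/mman.h>

// Assumption: Linux x86-64 constant values; other platforms may differ.
static_assert(EBADF == 9, "the trace's -9 is -EBADF (bad file descriptor)");
static_assert((PROT_READ | PROT_WRITE) == 0x3, "prot 0x3 in the trace");
static_assert((MAP_PRIVATE | MAP_ANONYMOUS) == 0x22, "flags 0x22: the address space reservation");
static_assert((MAP_SHARED | MAP_FIXED) == 0x11, "flags 0x11: the file-backed mappings");

// Hypothetical helper: true when a raw syscall return value, as printed
// in the trace ("direct return: -9"), encodes EBADF.
constexpr bool is_ebadf(long raw_ret)
{
    return raw_ret == -EBADF;
}
```

If the static assertions compile, the trace values match these constants on the build platform.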

eyberg commented 8 months ago

The page you are probably referring to, https://nanos.org/thebook, lists it under 'unsupported', although the page is a bit hard to read and should be cleaned up. No, it's not supported right now, but I don't see a reason why we couldn't add it.

ls-1801 commented 8 months ago

Oh, you are right, I just hit Ctrl+F for memfd_create and totally missed the 'not supported' header.

I assume it wouldn't be a particularly high priority for anyone except me, so I will just tinker around a bit. Thank you for your quick response!

francescolavra commented 6 months ago

https://github.com/nanovms/nanos/pull/2005 adds support for the memfd_create syscall. The syscall is implemented in a klib named "shmem", which depends on another klib named "tmpfs". To use this functionality, the unikernel image must therefore contain both the shmem and tmpfs klibs, as in the following snippet of the Ops configuration file: "Klibs": ["shmem", "tmpfs"].

Regarding the example code in the description of this issue, please note that the compiler may reorder instructions, which can make the assert() fail if the value of data[capacity] is read before writing the value of data[0]. In order to avoid this, a volatile variable can be used instead of data (see for example the code in test/runtime/shmem.c in the above PR).
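As an illustration of the volatile suggestion, here is a minimal sketch. It is not the code from test/runtime/shmem.c; the function names `map_mirrored` and `mirror_works` are made up for this example, and it assumes Linux with a glibc that declares memfd_create in `<sys/mman.h>`. It also checks every return value, so a missing syscall shows up as a null result instead of a hang.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Maps the same in-memory file twice, back to back, so that data[i] and
// data[i + capacity] refer to the same byte. Returns nullptr on failure.
// The returned pointer is volatile so the compiler keeps accesses in order.
volatile char* map_mirrored(std::size_t capacity)
{
    // mmap works in whole pages, so capacity must be page-aligned.
    assert(capacity % static_cast<std::size_t>(getpagesize()) == 0);

    int fd = memfd_create("queue_buffer", 0);
    if (fd < 0)
        return nullptr;  // e.g. the syscall is not available
    if (ftruncate(fd, static_cast<off_t>(capacity)) != 0) {
        close(fd);
        return nullptr;
    }

    // Reserve 2 * capacity of contiguous address space.
    void* base = mmap(nullptr, 2 * capacity, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return nullptr;
    }

    // Map the file over both halves of the reservation.
    char* data = static_cast<char*>(base);
    void* lo = mmap(data, capacity, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FIXED, fd, 0);
    void* hi = mmap(data + capacity, capacity, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);  // the mappings keep the file alive

    if (lo == MAP_FAILED || hi == MAP_FAILED) {
        munmap(base, 2 * capacity);
        return nullptr;
    }
    return data;
}

// Writes through the first mapping and reads back through the second.
// Both accesses are volatile, so the compiler may not reorder them.
bool mirror_works(volatile char* data, std::size_t capacity)
{
    data[0] = 'a';
    return data[capacity] == 'a';
}
```

A caller would pass a page-aligned capacity such as `getpagesize() * 2`, check the result of `map_mirrored` for nullptr, and then call `mirror_works` on it.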

ls-1801 commented 6 months ago

Hey, very cool, thank you! I will try that.