nanovms / nanos

A kernel designed to run one and only one application in a virtualized environment
https://nanos.org
Apache License 2.0
2.57k stars 133 forks

foundationdb tracking tkt - low priority #844

Closed: eyberg closed this 2 years ago

eyberg commented 5 years ago

Looks like it uses AIO calls (io_setup is not implemented yet):

1 run frame 0x0000000100201800, RIP=0x00000070744116d7
1 io_setup
1 nosyscall io_setup
1 futex
1 direct return: 0, rsp 0x7e9fe510
1 epoll_ctl
1 direct return: 0, rsp 0x7e9feaf8
1 close
1 close: fd 5
1 direct return: 0, rsp 0x7e9feb00
1 fstat
1 fd 2, stat 0x000000007e9fe570
1 st_ino 0, st_mode 0x1000, st_size 0
1 direct return: 0, rsp 0x7e9fe568
1 write
Error: Disk i/o operation failed

Also timers:

1 timerfd_create
1 nosyscall timerfd_create
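When a trace gets long, the unimplemented syscalls can be pulled out by scanning for the nosyscall lines. This is a minimal sketch, not a Nanos tool: it assumes the line format seen in the excerpts above (a thread number, then the word nosyscall, then the syscall name).

```python
import re

# Matches trace lines such as "1 nosyscall io_setup": thread number,
# the literal word "nosyscall", then the syscall name (format assumed
# from the trace excerpts in this thread).
_NOSYSCALL = re.compile(r"^\s*\d+\s+nosyscall\s+(\w+)\s*$")


def missing_syscalls(trace):
    """Return unimplemented syscall names in first-seen order, without duplicates."""
    seen = {}
    for line in trace.splitlines():
        match = _NOSYSCALL.match(line)
        if match:
            seen.setdefault(match.group(1), None)
    return list(seen)
```

Fed the two excerpts above, it reports io_setup and timerfd_create.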

Lastly, I think the AIO calls need signals anyway, and since we already have a ton of tickets for signals, those would probably be a prerequisite for this.

eyberg commented 5 years ago
eyberg@dungeon:~/fd/fdb_binaries$ cat config.json
{
  "Args":["fdbserver", "-p", "0.0.0.0:4500", "-C", "/fdb.cluster"],
  "Files": ["fdb.cluster"]
}
eyberg@dungeon:~/fd/fdb_binaries$ cat fdb.cluster
balls:balls@0.0.0.0:4500

eyberg commented 4 years ago

Timers now work here with the posix-timer branch, but it still wants AIO calls.

eyberg commented 4 years ago

We have some AIO support now; now it's throwing a platform error.

It fails a little past this failed getrlimit:

1 prlimit64
1 getrlimit: pid 0, resource 9, new_limit 0x0000000000000000, old_limit 0x0000000076f8fc60
1 getrlimit: resource 9, rlim 0x0000000076f8fc60
1 direct return: -22, rsp 0x76f8fc28
    2 run thread, cpu 0, frame 0x0000000100e06400, rip 0x153afe9, rsp 0x8e8dff9a0, rdi 0x8f3ad3000, rax 0x8f3ac0000, rflags 0x246, cs 0x2b, iret
1 run thread, cpu 0, frame 0x0000000100e05c00, rip 0x880915fa0, rsp 0x76f8fc28, rdi 0x0, rax 0xffffffffffffffea, rflags 0x246, cs 0x2b, sysret
    2 run thread, cpu 0, frame 0x0000000100e06400, rip 0x153afe9, rsp 0x8e8dff9a0, rdi 0x8f3ad4000, rax 0x8f3ac0000, rflags 0x246, cs 0x2b, iret
1 run thread, cpu 0, frame 0x0000000100e05c00, rip 0x8808950a9, rsp 0x76f8fa40, rdi 0x0, rax 0x60, rflags 0x283, cs 0x2b, iret
1 futex
1 futex_wake [1 0x000000000226eae0 2] 2147483647
1 direct return: 0, rsp 0x76f8f650
    2 run thread, cpu 0, frame 0x0000000100e06400, rip 0x153afe9, rsp 0x8e8dff9a0, rdi 0x8f3ad5000, rax 0x8f3ac0000, rflags 0x246, cs 0x2b, iret
1 run thread, cpu 0, frame 0x0000000100e05c00, rip 0x8f700f84e, rsp 0x76f8f650, rdi 0x226eae0, rax 0x0, rflags 0x246, cs 0x2b, sysret
1 fstat
1 fd 2, stat 0x0000000076f8f560
1 st_ino 0, st_mode 0x1000, st_size 0
1 direct return: 0, rsp 0x76f8f558
    2 run thread, cpu 0, frame 0x0000000100e06400, rip 0x153afe9, rsp 0x8e8dff9a0, rdi 0x8f3ad6000, rax 0x8f3ac0000, rflags 0x246, cs 0x2b, iret
1 run thread, cpu 0, frame 0x0000000100e05c00, rip 0x88090f7c3, rsp 0x76f8f558, rdi 0x2, rax 0x0, rflags 0x246, cs 0x2b, sysret
1 write
Error: Platform error
$ ops run -c config.json fdbserver
[fdbserver -p 0.0.0.0:4500 -C /fdb.cluster]
booting /home/eyberg/.ops/images/fdbserver.img ...
qemu-system-x86_64: warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
assigned: 10.0.2.15
Error: Platform error
exit status 21
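The getrlimit line in the trace above prints raw Linux numbers: resource 9 and a return value of -22 (the rax 0xffffffffffffffea on the following sysret line is the same -22 seen as an unsigned 64-bit register). A small decoder makes them readable. It is a sketch that looks names up in Python's resource and errno modules on the host running it, so the mapping is only meaningful when that host uses the same numbering as the guest (x86_64 Linux here).

```python
import errno
import resource


def signed64(rax):
    """Reinterpret a raw 64-bit register value (e.g. rax in the trace) as signed."""
    return rax - (1 << 64) if rax >= (1 << 63) else rax


def rlimit_name(resource_num):
    """Name(s) of the RLIMIT_* constant(s) with this number on this host."""
    names = sorted(
        name for name in dir(resource)
        if name.startswith("RLIMIT_") and getattr(resource, name) == resource_num
    )
    return "/".join(names) if names else "unknown(%d)" % resource_num


def decode_prlimit(resource_num, ret):
    """Decode a prlimit64 trace entry; a negative return value is -errno."""
    if ret >= 0:
        return rlimit_name(resource_num), "success"
    return rlimit_name(resource_num), errno.errorcode.get(-ret, "errno %d" % -ret)
```

On x86_64 Linux, decode_prlimit(9, signed64(0xffffffffffffffea)) names RLIMIT_AS and EINVAL, so the failing call was a query of the address-space limit that the kernel rejected as an invalid argument.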

francescolavra commented 4 years ago

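With #1158, it gets past the above platform error, but then fails with ERROR: Could not open shared memory - Function not implemented. In order to implement POSIX shared memory, we need to support:

  • a RAM-based filesystem (on Linux, POSIX shared memory is implemented via a RAMFS-type filesystem mounted at /dev/shm/; the shm_open() function issues a statfs syscall on this directory and expects the returned filesystem type to be RAMFS)
  • file-backed shared memory mappings (files under /dev/shm/ are read and written via a memory mapping)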

Also, FoundationDB needs the getrusage syscall, and needs support for both RUSAGE_SELF and RUSAGE_THREAD to retrieve the CPU time used by the process and the calling thread, respectively. We currently have process-wide timers that we can use for RUSAGE_SELF, but we don't have per-thread timers, needed for RUSAGE_THREAD.
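For reference, this is what the two getrusage() flavors look like from user space on Linux. Python's resource module wraps getrusage(2), and resource.RUSAGE_THREAD is Linux-only; the busy loop is just a stand-in for real work, and the helper names are made up for this sketch.

```python
import resource
import threading


def cpu_seconds(who):
    """User + system CPU time reported by getrusage(2) for `who`."""
    ru = resource.getrusage(who)
    return ru.ru_utime + ru.ru_stime


def _busy_worker(result, iterations):
    # Burn some CPU in this thread only.
    total = 0
    for i in range(iterations):
        total += i
    # RUSAGE_THREAD (Linux-specific): CPU time of the calling thread only.
    result["thread"] = cpu_seconds(resource.RUSAGE_THREAD)
    # RUSAGE_SELF: CPU time summed over every thread in the process.
    result["process"] = cpu_seconds(resource.RUSAGE_SELF)


def measure_worker_cpu(iterations=2_000_000):
    """Run a CPU-bound thread and return its own vs. process-wide CPU seconds."""
    result = {}
    worker = threading.Thread(target=_busy_worker, args=(result, iterations))
    worker.start()
    worker.join()
    return result
```

The thread figure covers only the worker, while the process figure also includes the main thread, which is why a kernel needs per-thread accounting to answer RUSAGE_THREAD.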

eyberg commented 4 years ago

https://github.com/nanovms/nanos/issues/1161

eyberg commented 4 years ago

https://github.com/nanovms/nanos/issues/1162

wjhun commented 4 years ago

> With #1158, it gets past the above platform error, but then fails with ERROR: Could not open shared memory - Function not implemented. In order to implement POSIX shared memory, we need to support:

Did you get any sense of what shared memory would be used for here? If it's setting it up to share with a forked off child process, then it's kind of a non-starter. If it's only using the shared memory within a thread group, we might be able to explore taking a shortcut since it's all one address space anyway. (Though I suppose it could be using multiple mappings to the same memory even within a thread group.)

> • a RAM-based filesystem (on Linux, POSIX shared memory is implemented via a RAMFS-type filesystem mounted at /dev/shm/; the shm_open() function issues a statfs syscall on this directory and expects the returned filesystem type to be RAMFS)

A full-blown ramfs implementation may not be necessary if it's just using file nodes under /dev/shm as handles to shared memory objects. That is, unless the program attempts to do file operations other than ftruncate/mmap/close on the fd.

> • file-backed shared memory mappings (files under /dev/shm/ are read and written via a memory mapping)

Similarly, full-blown file-backed mappings (i.e. faulting in pages from backing store) aren't really necessary if the backing is only a ramfs file. Mostly mmap() would just need to be able to detect that the fd is a shm handle and then set up a mapping to the object memory or return an existing one.
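The shortcut described above can be modeled in a few lines. This is a toy, hypothetical sketch (not Nanos code): ShmRegistry and its methods are invented names, a /dev/shm path is only a key for an in-memory object, and mmap() returns a view of that object's memory, so two mappings of the same name alias the same bytes. Only open, ftruncate, mmap and close are modeled, matching the case where the program does nothing else with the fd.

```python
class ShmRegistry:
    """Toy model: /dev/shm nodes are handles to in-memory objects (hypothetical)."""

    def __init__(self):
        self._objects = {}  # path -> bytearray holding the object's memory
        self._fds = {}      # fd -> path
        self._next_fd = 3   # arbitrary starting fd for the model

    def shm_open(self, name, create=False):
        path = "/dev/shm/" + name.lstrip("/")
        if path not in self._objects:
            if not create:
                raise FileNotFoundError(path)
            self._objects[path] = bytearray()
        fd = self._next_fd
        self._next_fd += 1
        self._fds[fd] = path
        return fd

    def ftruncate(self, fd, size):
        # Resize the object; new bytes are zero-filled.
        # (A bytearray with live mappings cannot be resized: BufferError.)
        mem = self._objects[self._fds[fd]]
        if size < len(mem):
            del mem[size:]
        else:
            mem.extend(bytes(size - len(mem)))

    def mmap(self, fd):
        # No paging from a backing store: every mapping of the same
        # object is a view of the same memory.
        return memoryview(self._objects[self._fds[fd]])

    def close(self, fd):
        del self._fds[fd]
```

Usage follows the usual shm_open, ftruncate, mmap order: writes through one mapping are visible through any other mapping of the same name.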

> Also, FoundationDB needs the getrusage syscall, and needs support for both RUSAGE_SELF and RUSAGE_THREAD to retrieve the CPU time used by the process and the calling thread, respectively. We currently have process-wide timers that we can use for RUSAGE_SELF, but we don't have per-thread timers, needed for RUSAGE_THREAD.

Thread time accounting shouldn't (?) be difficult to add ... maybe change procenter to threadenter and take care of both process and thread accounting in one step? (There may be some bitrot in that department, probably from the SMP work; I see nothing is calling proc_pause right now.)
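The "one step" idea can be sketched as follows. This is a hypothetical model, not Nanos code: thread_enter/thread_pause are illustrative names loosely following the procenter/proc_pause naming mentioned above, and time is passed in explicitly as nanoseconds. Whenever a thread stops running, its elapsed time is charged to both the thread (for RUSAGE_THREAD) and its process (for RUSAGE_SELF).

```python
class Process:
    """Hypothetical per-process accounting record (feeds RUSAGE_SELF)."""

    def __init__(self):
        self.cpu_ns = 0


class Thread:
    """Hypothetical per-thread accounting record (feeds RUSAGE_THREAD)."""

    def __init__(self, process):
        self.process = process
        self.cpu_ns = 0
        self._running_since = None

    def thread_enter(self, now_ns):
        # The thread starts running on a CPU: remember when.
        self._running_since = now_ns

    def thread_pause(self, now_ns):
        # The thread stops running: charge the elapsed time to the thread
        # and to its process in the same step, so the totals stay consistent.
        if self._running_since is None:
            return
        elapsed = now_ns - self._running_since
        self.cpu_ns += elapsed
        self.process.cpu_ns += elapsed
        self._running_since = None
```

With two threads of one process running for 100 ns and 150 ns, the threads report 100 and 150 while the process reports 250.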

francescolavra commented 4 years ago

> Did you get any sense of what shared memory would be used for here?

As far as I can see from the source code (https://github.com/apple/foundationdb/blob/939a62449f22c8ce4fe506595dc2ff85b18e448d/fdbserver/fdbserver.actor.cpp#L332), it's only used to set a "machine ID" that is shared with any other FoundationDB server processes running on the same machine. Note that any such processes are independent of each other: they are not forked from one another, they are just multiple nodes of the same DB cluster (just as there can be multiple nodes on different machines). In our case, we would have just one process per machine, so shared memory wouldn't really be used for anything, but I couldn't find a way to disable it: I tried adding a --machine_id command-line argument, but it still wants to use shared memory.
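The pattern being described, one process publishing a value in a named shared memory segment and unrelated processes on the same host attaching to it by name, looks roughly like this. The sketch uses Python's multiprocessing.shared_memory, which on Linux is built on shm_open() and mmap(), the calls that failed on Nanos. The segment name and the fixed 64-byte, NUL-padded layout are made up for illustration and are not FoundationDB's.

```python
from multiprocessing import shared_memory

SEGMENT_SIZE = 64  # hypothetical fixed slot size for the ID


def publish_machine_id(segment_name, machine_id):
    """Create a named segment (shm_open + ftruncate + mmap underneath) holding the ID."""
    data = machine_id.encode()
    if len(data) >= SEGMENT_SIZE:
        raise ValueError("machine ID too long")
    shm = shared_memory.SharedMemory(name=segment_name, create=True, size=SEGMENT_SIZE)
    # A freshly created segment is zero-filled, so the ID ends up NUL-padded.
    shm.buf[:len(data)] = data
    return shm


def read_machine_id(segment_name):
    """Attach to an existing segment by name, as another server process would."""
    shm = shared_memory.SharedMemory(name=segment_name)
    try:
        with shm.buf[:SEGMENT_SIZE] as view:
            raw = bytes(view)
    finally:
        shm.close()
    return raw.rstrip(b"\0").decode()
```

The creator keeps its handle open and eventually calls close() and unlink(). With a single server per machine, as in the Nanos case, nobody ever attaches, which is why passing --machine_id can make the segment unnecessary.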

francescolavra commented 2 years ago

With the latest FoundationDB release (6.3.15), if the server application is passed a --machine_id command line argument it works without needing shared memory support.

francesco@ubuntu:~$ cat config.json 
{
  "Files": ["fdb.cluster"],
  "Dirs": ["proc"],
  "BaseVolumeSz": "1G",
  "Args":["-p", "0.0.0.0:4500", "--machine_id", "nanos", "--datacenter_id", "nanos"],
  "RunConfig": {
    "Memory": "4G",
    "Ports": ["4500"]
  }
}
francesco@ubuntu:~$ cat fdb.cluster 
balls:balls@0.0.0.0:4500
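The cluster file holds a single connection string of the form description:ID@IP:PORT, with multiple coordinators separated by commas; here the description and ID are both balls and the only coordinator is 0.0.0.0:4500. A minimal parser sketch, assuming description and ID made of letters, digits and underscores and plain IPv4 host:port coordinators (no IPv6 brackets or TLS suffix):

```python
import re

# description:ID@coordinators (character classes are an assumption).
_CLUSTER_RE = re.compile(r"([A-Za-z0-9_]+):([A-Za-z0-9_]+)@(.+)")


def parse_cluster_file(text):
    """Split a cluster connection string into (description, ID, coordinators)."""
    line = text.strip()
    match = _CLUSTER_RE.fullmatch(line)
    if match is None:
        raise ValueError("not a cluster connection string: %r" % line)
    description, cluster_id, coords = match.groups()
    coordinators = []
    for item in coords.split(","):
        # Assumes plain IPv4 "host:port" entries.
        host, _, port = item.strip().rpartition(":")
        if not host or not port.isdigit():
            raise ValueError("bad coordinator address: %r" % item)
        coordinators.append((host, int(port)))
    return description, cluster_id, coordinators
```

For the file above it returns ("balls", "balls", [("0.0.0.0", 4500)]).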

FoundationDB uses the /proc/meminfo and /proc/self/statm files to retrieve information on memory availability; putting a static meminfo file (with a dummy "MemFree" entry) and an empty statm file is enough to make it work:

francesco@ubuntu:~$ tree
.
├── config.json
├── fdb.cluster
├── fdbserver -> /usr/sbin/fdbserver
└── proc
    ├── meminfo
    └── self
        └── statm

2 directories, 5 files
francesco@ubuntu:~$ cat proc/meminfo 
MemFree:        4000000 kB
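The stub tree above can be generated with a few lines. This sketch writes the same single MemFree line and an empty statm under a chosen root, and includes a tiny parser for "Key: value kB" lines to show what a reader of the file gets back. The function names are made up, the parser is illustrative rather than FoundationDB's own code, and the 4000000 kB default only mirrors the dummy value above.

```python
import os


def write_proc_stub(root, mem_free_kb=4000000):
    """Create proc/meminfo (one dummy MemFree line) and an empty proc/self/statm."""
    self_dir = os.path.join(root, "proc", "self")
    os.makedirs(self_dir, exist_ok=True)
    with open(os.path.join(root, "proc", "meminfo"), "w") as f:
        f.write("MemFree:        %d kB\n" % mem_free_kb)
    # statm is intentionally left empty.
    open(os.path.join(self_dir, "statm"), "w").close()


def parse_meminfo(text):
    """Parse "Key:   value kB" lines into {key: value_in_kB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info
```

Running write_proc_stub(".") next to config.json produces the proc directory that the "Dirs" entry bundles into the image.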

There is an issue in Nanos release 0.1.36 that prevents the server from working after stopping and restarting an instance; this issue is fixed in the nightly build. To start the server:

francesco@ubuntu:~$ ops run fdbserver -c config.json -n
booting /home/francesco/.ops/images/fdbserver.img ...
en1: assigned 10.0.2.15
ZoneId set to nanos, dcId to nanos
FDBD joined cluster.
en1: assigned FE80::E009:6FF:FE5E:813E

To communicate with the server:

francesco@ubuntu:~$ fdbcli -C fdb.cluster 
Using cluster file `fdb.cluster'.

The database is unavailable; type `status' for more information.

Welcome to the fdbcli. For help, type `help'.
fdb> configure new single memory
Database created
fdb> writemode on
fdb> set my_key my_value
Committed (37250421)
fdb> get my_key
`my_key' is `my_value'
fdb> 

eyberg commented 2 years ago

made a pkg of this, closing