nanovms / ops

ops - build and run nanos unikernels
https://ops.city
MIT License

Request for updating package fdb (to FoundationDB 7.x) #1357

Closed paulreimer closed 2 years ago

paulreimer commented 2 years ago

I would like to use the latest FoundationDB (7.x series) with Nanos, as this version has the new Redwood storage engine.

A quick try at using the existing fdb image (eyberg/fdb:6.3.18) didn't work for me (I tried supplying an fdb.cluster file, but didn't get very far); I'm OK to keep experimenting with it, but perhaps an updated package with the latest release would be a good starting point.

P.S. I love this project so much. I've deployed a few Python VMs to various clouds, and they are working great; the workflow with ops really nails it.

eyberg commented 2 years ago

hrm - something might've changed in nanos since that package was made - it looks like it might be choking on a proc file atm - i assume you are getting this?

➜  ~ ops pkg load eyberg/fdb:6.3.18 --trace -p 4500
booting /Users/eyberg/.ops/images/fdbserver ...

frame trace:
ffffc00000a2fe00:   ffffffff8009d723    (register_special_files + 0000000000000483/00000000000005bb)
ffffc00000a2fe50:   ffffffff800b9b08    (init_unix + 0000000000000228/000000000000051a)
ffffc00000a2fec0:   ffffffff8004a28a    (startup + 000000000000003a/0000000000000592)
ffffc00000a2ff50:   ffffffff80048f5b    (runloop_internal + 00000000000002eb/0000000000000a55)
ffffc00000a2ffc0:   ffffffff8003bb52    (context_switch_finish + 0000000000000082/00000000000001d0)
assertion create_special_file(sf->path, open, sf->alloc_size) failed at /Users/eyberg/go/src/github.com/nanovms/nanos/src/unix/special.c:366  in register_special_files(); halt
➜  ~ ops image tree fdbserver
/
|   fdb.cluster
|   lib
|   |   x86_64-linux-gnu
|   |   |   librt.so.1
|   |   |   libc.so.6
|   |   |   libdl.so.2
|   |   |   libm.so.6
|   |   |   libpthread.so.0
|   lib64
|   |   ld-linux-x86-64.so.2
|   proc
|   |   self
|   |   |   statm
|   |   sys
|   |   |   kernel
|   |   |   |   hostname
|   |   meminfo
|   fdb_6.3.18
|   |   README.md
|   |   fdbserver
|   |   package.manifest
|   etc
|   |   resolv.conf
|   |   passwd

we can take a quick look at it; also - once that's working you can upload your own newer version of fdb to repo.ops.city

eyberg commented 2 years ago

so yeh - it looks like we added proc/meminfo during this time (this package had a stub)

src/unix/special.c:    { "/proc/meminfo", .read = meminfo_read},
cp -R ~/.ops/packages/fdb_6.3.18 ~/.ops/local_packages/.
rm ~/.ops/local_packages/fdb_6.3.18/sysroot/proc/meminfo

➜  ~ ops pkg load --local fdb_6.3.18
booting /Users/eyberg/.ops/images/fdbserver ...
en1: assigned 10.0.2.15
ZoneId set to nanos, dcId to nanos
FDBD joined cluster.

after rming the stub it seems to work for me - from there you could take the same pkg layout and create a new pkg with the newer version
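
For anyone hitting the same assertion with another older package, the two manual steps above could be wrapped in a small helper. This is only a sketch: `strip_proc_stub` and the `OPS_HOME` variable are made-up names, and it assumes the `~/.ops/packages` and `~/.ops/local_packages` layout shown above. Only `proc/meminfo` is known from this thread to clash with a file the kernel now registers itself.

```shell
#!/bin/sh
# Sketch: copy a downloaded package into local_packages and drop a stubbed
# proc file from the copy. OPS_HOME and strip_proc_stub are illustrative
# names, not part of ops.
OPS_HOME="${OPS_HOME:-$HOME/.ops}"

strip_proc_stub() {
    pkg="$1"                     # package directory name, e.g. fdb_6.3.18
    stub="${2:-proc/meminfo}"    # path relative to the package sysroot
    src="$OPS_HOME/packages/$pkg"
    dst="$OPS_HOME/local_packages/$pkg"

    # Refuse to continue if the package was never downloaded.
    [ -d "$src" ] || { echo "no such package: $src" >&2; return 1; }

    mkdir -p "$OPS_HOME/local_packages"
    cp -R "$src" "$OPS_HOME/local_packages/." || return 1

    # Remove only the stub; everything else in the sysroot is kept.
    rm -f "$dst/sysroot/$stub"
}
```

After `strip_proc_stub fdb_6.3.18`, running `ops pkg load --local fdb_6.3.18` as above should pick up the trimmed copy.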

paulreimer commented 2 years ago

Thanks! That was indeed the error I was getting, and your suggested fix to remove the proc/meminfo stub worked great! (fdb_6.3.18 is working well now.)

I realized that fdb 7.1.19 is still a pre-release (6.3.24 is the latest stable release, and that version works fine when making a new package). When I try the 7.1.19 executable, though, I get a segfault:

$ ops pkg load --local fdb_7.1.19 --accel=false --trace
<lots of ordinary looking trace lines>
...
    2 no vmap found
    2 delivering SIGSEGV to thread 2; vaddr 0x3b5aedff0 si_code 1
    2 thread_attempt_interrupt: tid 2
    2    uninterruptible or already running
    2 signal 11 received, errno 0, code 1
    2    fault address 0x3b5aedff0
    2    default action
    2 signal 11 (no core generated: limit 0)

signal 11 (no core generated: limit 0)

Just FYI; tbh I'm happy using the latest stable release for the time being.

eyberg commented 2 years ago

not sure why you are getting that segfault - maybe cause it's linked to different libs

anyways - I cut a new build from the new version - try and see if that works for you:

 ops pkg load eyberg/fdb:7.1.19 -p 4500
 100% |████████████████████████████████████████|  [3s:0s] 
booting /home/eyberg/.ops/images/fdbserver ...
en1: assigned 10.0.2.15
ZoneId set to nanos, dcId to nanos
FDBD joined cluster.
en1: assigned FE80::E402:F0FF:FEC3:967B

paulreimer commented 2 years ago

Your new fdb:7.1.19 package works great! (on a Linux x86_64 host, running locally).

My primary dev machine is an M1 Mac; that's where I see that segfault -- and I get the same segfault with your package as well, on the M1 mac -- though the 6.3.18-based image does seem to work on both Mac + Linux.

Thanks so much for bumping to the latest FDB version!

eyberg commented 2 years ago

since you turned off accel explicitly (which you need to do on m1s) - what version of qemu are you running on the m1?

ops profile

or

 qemu-system-x86_64 --version

should tell you

paulreimer commented 2 years ago

I have:

Ops version: 0.1.32
Nanos version: 0.0
Qemu version: 7.0.0
Arch: darwin
Virtualized: false

paulreimer commented 2 years ago

I think I figured out why it wasn't working -- the odd-numbered fdb patch releases (e.g. 7.1.19) are compiled to use AVX instructions, whereas the even-numbered ones (e.g. 7.1.18) are compiled without AVX instructions. So 7.1.18 works on my M1, and would also be more likely to work on most cloud instances.
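
The odd/even rule described above can be sketched as a quick check before choosing a release to package. It encodes only the observation from this comment, which has not been verified against every FoundationDB release line; `fdb_build_flavor` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: guess whether an fdb release is the AVX build from its patch number.
# Assumption (taken from the comment above, not verified here):
# odd patch number = built with AVX, even patch number = built without.
fdb_build_flavor() {
    patch="${1##*.}"    # text after the last dot, e.g. "19" in 7.1.19
    case "$patch" in
        ''|*[!0-9]*) echo "unknown"; return 1 ;;  # not a plain number
        *[13579])    echo "avx" ;;                # last digit is odd
        *)           echo "no-avx" ;;             # last digit is even
    esac
}
```

With this, `fdb_build_flavor 7.1.19` prints `avx` and `fdb_build_flavor 7.1.18` prints `no-avx`, which matches what was seen on the M1.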

eyberg commented 2 years ago

so by default we do -cpu host and -cpu max and I'm pretty confident at least avx and avx2 work with those settings since there are many other applications that use them there

https://ahelpme.com/software/qemu/qemu-full-virtualization-cpu-emulations-enable-disable-cpu-flags-instruction-sets-of-qemu-6-2-0/

what's not clear to me atm is how it is implemented in qemu - it might require hardware acceleration cause it might not be present in TCG:

https://gitlab.com/qemu-project/qemu/-/issues/164
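
If AVX under TCG turns out to depend on the QEMU release, a version check along these lines could help when accel is off. This is a sketch under assumptions: both function names are made up, and the minimum version is passed in by the caller. The QEMU 7.2 changelog appears to list TCG support for AVX and AVX2, so `7.2.0` is used in the usage comment below, but please confirm that threshold against the QEMU release notes before relying on it.

```shell
#!/bin/sh
# Sketch: pull MAJOR.MINOR.PATCH out of the first line printed by
# `qemu-system-x86_64 --version` and compare it with a minimum version.
# Both function names are illustrative, not part of ops or QEMU.
qemu_version_from_banner() {
    printf '%s\n' "$1" |
        sed -n '1s/.*version \([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeeds when version $1 >= version $2 (both written MAJOR.MINOR.PATCH).
version_at_least() {
    old_ifs=$IFS
    IFS=.
    set -- $1 $2    # split both versions on dots into $1..$6
    IFS=$old_ifs
    [ "$1" -gt "$4" ] && return 0
    [ "$1" -lt "$4" ] && return 1
    [ "$2" -gt "$5" ] && return 0
    [ "$2" -lt "$5" ] && return 1
    [ "$3" -ge "$6" ]
}

# Usage (the threshold is an assumption, see the note above):
#   v="$(qemu_version_from_banner "$(qemu-system-x86_64 --version)")"
#   version_at_least "$v" 7.2.0 || echo "TCG in QEMU $v may lack AVX"
```

With the Qemu version 7.0.0 reported earlier in the thread, `version_at_least 7.0.0 7.2.0` fails, which would be consistent with the AVX build segfaulting when run without hardware acceleration.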