Closed paulreimer closed 2 years ago
hrm - something might've changed in nanos since that package was made - it looks like it might be choking on a proc file atm - i assume you are getting this?
➜ ~ ops pkg load eyberg/fdb:6.3.18 --trace -p 4500
booting /Users/eyberg/.ops/images/fdbserver ...
frame trace:
ffffc00000a2fe00: ffffffff8009d723 (register_special_files + 0000000000000483/00000000000005bb)
ffffc00000a2fe50: ffffffff800b9b08 (init_unix + 0000000000000228/000000000000051a)
ffffc00000a2fec0: ffffffff8004a28a (startup + 000000000000003a/0000000000000592)
ffffc00000a2ff50: ffffffff80048f5b (runloop_internal + 00000000000002eb/0000000000000a55)
ffffc00000a2ffc0: ffffffff8003bb52 (context_switch_finish + 0000000000000082/00000000000001d0)
assertion create_special_file(sf->path, open, sf->alloc_size) failed at /Users/eyberg/go/src/github.com/nanovms/nanos/src/unix/special.c:366 in register_special_files(); halt
➜ ~ ops image tree fdbserver
/
| fdb.cluster
| lib
| | x86_64-linux-gnu
| | | librt.so.1
| | | libc.so.6
| | | libdl.so.2
| | | libm.so.6
| | | libpthread.so.0
| lib64
| | ld-linux-x86-64.so.2
| proc
| | self
| | | statm
| | sys
| | | kernel
| | | | hostname
| | meminfo
| fdb_6.3.18
| | README.md
| | fdbserver
| | package.manifest
| etc
| | resolv.conf
| | passwd
we can take a quick look at it; also - once that's working you can upload your own newer version of fdb to repo.ops.city
so yeh - it looks like we added proc/meminfo during this time (this package had a stub)
src/unix/special.c: { "/proc/meminfo", .read = meminfo_read},
cp -R ~/.ops/packages/fdb_6.3.18 ~/.ops/local_packages/.
rm ~/.ops/local_packages/fdb_6.3.18/sysroot/proc/meminfo
➜ ~ ops pkg load --local fdb_6.3.18
booting /Users/eyberg/.ops/images/fdbserver ...
en1: assigned 10.0.2.15
ZoneId set to nanos, dcId to nanos
FDBD joined cluster.
after rming the stub it seems to work for me - from there you could take the same pkg layout and create a new pkg with the newer version
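For anyone repeating the workaround above, the copy-and-remove steps can be wrapped in a small helper. This is only a sketch: `OPS_DIR` and `localize_without_stub` are names made up for illustration, and the directory layout (`packages/`, `local_packages/`, `sysroot/`) is taken from the commands in this thread, not from any ops documentation.

```shell
#!/bin/sh
# Hypothetical helper: copy a downloaded package into local_packages and
# remove one stub file from its sysroot (here, a stub that collides with a
# special file the kernel now registers itself).
# OPS_DIR is an assumed variable; it defaults to the ~/.ops path used above.
OPS_DIR="${OPS_DIR:-$HOME/.ops}"

localize_without_stub() {
    pkg="$1"     # package directory name, e.g. fdb_6.3.18
    stub="$2"    # path relative to sysroot, e.g. proc/meminfo
    src="$OPS_DIR/packages/$pkg"
    dst="$OPS_DIR/local_packages/$pkg"

    # Refuse to continue if the source is missing or a local copy exists,
    # so an earlier local edit is never silently overwritten.
    [ -d "$src" ] || { echo "no such package: $src" >&2; return 1; }
    [ ! -e "$dst" ] || { echo "already exists: $dst" >&2; return 1; }

    mkdir -p "$OPS_DIR/local_packages" || return 1
    cp -R "$src" "$dst" || return 1
    rm -f "$dst/sysroot/$stub"
}
```

With that defined, `localize_without_stub fdb_6.3.18 proc/meminfo` followed by `ops pkg load --local fdb_6.3.18` should mirror the manual steps; the downloaded copy under `packages/` is left untouched.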
Thanks! That was indeed the error I was getting, and your suggested fix to remove the proc/meminfo stub worked great! (fdb_6.3.18 is working well now)
I realized that fdb 7.1.19 is still a pre-release (6.3.24 is the latest; that version works fine when making a new package); though when I try the 7.1.19 executable I get a segfault:
$ ops pkg load --local fdb_7.1.19 --accel=false --trace
<lots of ordinary looking trace lines>
...
2 no vmap found
2 delivering SIGSEGV to thread 2; vaddr 0x3b5aedff0 si_code 1
2 thread_attempt_interrupt: tid 2
2 uninterruptible or already running
2 signal 11 received, errno 0, code 1
2 fault address 0x3b5aedff0
2 default action
2 signal 11 (no core generated: limit 0)
signal 11 (no core generated: limit 0)
Just FYI; tbh I'm happy using the latest stable release for the time being.
not sure why you are getting that segfault - maybe cause it's linked to different libs
anyways - I cut a new build from the new version - try and see if that works for you:
ops pkg load eyberg/fdb:7.1.19 -p 4500
100% |████████████████████████████████████████| [3s:0s]
booting /home/eyberg/.ops/images/fdbserver ...
en1: assigned 10.0.2.15
ZoneId set to nanos, dcId to nanos
FDBD joined cluster.
en1: assigned FE80::E402:F0FF:FEC3:967B
Your new fdb:7.1.19 package works great! (on a Linux x86_64 host, running locally).
My primary dev machine is an M1 Mac; that's where I see that segfault -- and I get the same segfault with your package as well, on the M1 mac -- though the 6.3.18-based image does seem to work on both Mac + Linux.
Thanks so much for bumping to the latest FDB version!
since you turned off accel explicitly (which you need to do on m1s) - what version of qemu are you running on the m1?
ops profile
or
qemu-system-x86_64 --version
should tell you
I have:
Ops version: 0.1.32
Nanos version: 0.0
Qemu version: 7.0.0
Arch: darwin
Virtualized: false
I think I figured out why it wasn't working -- the odd numbered fdb releases are compiled to use AVX instructions, whereas the even numbered ones are compiled without AVX instructions. So 7.1.18 works on my M1, and also would be more likely to work on most cloud instances.
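If the odd/even pattern described above holds, a version string can be screened before picking a release. This is a sketch only: it reads "odd numbered" as the last (patch) component, which is how 7.1.19 and 7.1.18 differ, and the rule itself comes from this comment, not from any FoundationDB documentation. `fdb_patch_is_odd` is a made-up name.

```shell
#!/bin/sh
# Hypothetical check: exit 0 when the last component of a dotted version
# (e.g. 7.1.19) is odd, 1 when it is even, 2 when it is not numeric.
fdb_patch_is_odd() {
    patch="${1##*.}"    # everything after the last dot
    case "$patch" in
        ''|*[!0-9]*)
            echo "not a numeric patch level: $1" >&2
            return 2 ;;
        *[13579])
            return 0 ;;  # last digit odd: assumed to be an AVX build
        *)
            return 1 ;;  # last digit even: assumed to be a non-AVX build
    esac
}
```

For example, `fdb_patch_is_odd 7.1.19 && echo "likely needs AVX"` would flag the release that segfaulted on the M1, while 7.1.18 and 6.3.18 pass through silently.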
so by default we do -cpu host and -cpu max and I'm pretty confident at least avx and avx2 work in those cases since there are many other applications that use them there
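On a Linux x86_64 host, the flags a `-cpu host` guest can inherit are listed in `/proc/cpuinfo`, so a quick look there shows whether avx/avx2 are available to begin with. A minimal sketch, assuming the x86 `flags` line format (ARM kernels use a different key, and this says nothing about what QEMU emulates without acceleration); `cpu_has_flag` is a made-up name.

```shell
#!/bin/sh
# Hypothetical check: succeed when a cpuinfo-style file lists the given
# CPU flag as a whole word on one of its "flags" lines.
cpu_has_flag() {
    flag="$1"
    cpuinfo="${2:-/proc/cpuinfo}"   # second argument allows testing with a sample file
    # -w keeps "avx" from matching "avx2" or "avx512f".
    grep -E '^flags[[:space:]]*:' "$cpuinfo" | grep -qw -- "$flag"
}
```

Running `cpu_has_flag avx && cpu_has_flag avx2` on the host then gives a yes/no answer through the exit status.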
what's not clear to me atm is how it is implemented in qemu - it might require hardware acceleration cause it might not be present in TCG:
I would like to use the latest FoundationDB (7.x series) with Nanos, as this version has the new Redwood storage engine.
A quick try at using the existing fdb image (eyberg/fdb:6.3.18) didn't work for me (I tried supplying an fdb.cluster file, but didn't get very far); I'm OK to keep experimenting with it, but perhaps an updated package with the latest release would be a good starting point.
P.S. I love this project so much. I've deployed a few Python VMs to various clouds, and they are working great; the workflow with ops really nails it.