nanovms / ops

ops - build and run nanos unikernels
https://ops.city
MIT License

Mysterious runtime errors: assertion res == copy_len failed and bitmap_dealloc error #1654

Closed: julianbraha closed this issue 2 months ago

julianbraha commented 3 months ago

Here's my system:

$ ops profile
Ops version: 0.1.42
Nanos version: 0.1.52
Qemu version: 4.2.1
OS: linux
Arch: amd64
Virtualized: true

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.6 LTS"

$ uname -a
Linux julians-vm 5.15.0-1066-gcp #74~20.04.1-Ubuntu SMP Fri Jul 26 09:28:41 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

This is my config.json:

{
    "RunConfig": {
        "Memory": "48G"
    },
    "BaseVolumeSz": "32g"
}   

Here's the error I get with --nanos-version=0.1.51:

$ ops run -c config.json ./target/x86_64-unknown-linux-gnu/release/worker --nanos-version=0.1.51
running local instance
booting /home/julian/.ops/images/worker ...
en1: assigned 10.0.2.15
<jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
<jemalloc>: (This is the expected behaviour if you are running under QEMU)
en1: assigned FE80::6CCF:59FF:FE41:8333

frame trace:
ffffc00002e07f50:   ffffffff8d5505e2    (runloop_internal + 0000000000000242/00000000000009a7)
ffffc00002e07fc0:   ffffffff8d53baf3    (context_switch_finish + 0000000000000073/0000000000000211)

kernel load offset ffffffff8d2f5000

loaded klibs:
assertion res == copy_len failed at /home/eyberg/go/src/github.com/nanovms/nanos/src/kernel/pagecache.c:658 (IP 0xffffffff8d549caa) in pagecache_write_sg_finish()

And here's the error I get with --nanos-version=0.1.52:

$ ops run -c config.json ./target/x86_64-unknown-linux-gnu/release/worker --nanos-version=0.1.52 --force
running local instance
booting /home/julian/.ops/images/worker ...
en1: assigned 10.0.2.15
<jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
<jemalloc>: (This is the expected behaviour if you are running under QEMU)
en1: assigned FE80::CC2E:58FF:FE9C:E281
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 48966, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xc346 0xc347) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 48967, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xc347 0xc348) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 48968, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xc348 0xc349) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 48969, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xc349 0xc34a) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 48970, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xc34a 0xc34b) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 48971, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xc34b 0xc34c) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 48972, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xc34c 0xc34d) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 48973, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xc34d 0xc34e) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 48974, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xc34e 0xc34f) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 48975, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xc34f 0xc350) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 48976, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xc350 0xc351) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 48977, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xc351 0xc352) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 24817, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x64f1 0x64f2) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 24818, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x64f2 0x64f3) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 24819, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x64f3 0x64f4) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 24820, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x64f4 0x64f5) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 24821, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x64f5 0x64f6) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 24711, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x6487 0x6488) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 24712, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x6488 0x6489) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 24713, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x6489 0x648a) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 24714, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x648a 0x648b) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 24715, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x648b 0x648c) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 24716, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x648c 0x648d) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 24717, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x648d 0x648e) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 24718, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x648e 0x648f) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 24719, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x648f 0x6490) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 24720, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x6490 0x6491) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 24721, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x6491 0x6492) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 100861, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x18dfd 0x18dfe) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 100862, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x18dfe 0x18dff) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 100863, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x18dff 0x18e00) failed; leaking
id_dealloc error: failed, ra 0xffffffff869f3c46
thread 'main' panicked at alloc/src/raw_vec.rs:24:5:
capacity overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

*** signal 11 received by tid 2, errno 0, code 1
    fault address 0x32d60

*** Thread context:
lastvector: 000000000000000e (Page fault)
     frame: ffffc00002e04000
      type: thread
active_cpu: 00000000ffffffff
 stack top: 0000000000000000
error code: 0000000000000004
   address: 0000000000032d60

   rax: 0000000137601540
   rbx: 00000000fff046c0
   rcx: 00000000fff04980
   rdx: 0000000100000000
   rsi: 000000c1b6195cb0
   rdi: 0000000000032d60
   rbp: 000000013760dee0
   rsp: 00000000fff04610
    r8: 0000050fc9618a80
    r9: 0000050fc9618a80
   r10: 0000050fc9719080
   r11: 0000000001be4868
   r12: 0000000000000063
   r13: 0000000000000318
   r14: 0000050fc9729f80
   r15: 0000000001be49d8
   rip: 00000000008f850e
rflags: 0000000000000206
    ss: 000000000000002b
    cs: 0000000000000023
    ds: 0000000000000000
    es: 0000000000000000
fsbase: 0000000001be55c0
gsbase: 0000000000000000

frame trace:
000000013760dee8:   0001050fc972a001
0000050fc9729f89:   000000000101bc40

kernel load offset ffffffff8677d000

loaded klibs:

stack trace:
00000000fff04610:   0000050fc963bb80
00000000fff04618:   0000000000000040
00000000fff04620:   0000050fc9647280
00000000fff04628:   0000000000000033
00000000fff04630:   00000000fff04a90
00000000fff04638:   00000000fff04660
00000000fff04640:   0000000001be4928
00000000fff04648:   0000000001be4828
00000000fff04650:   0000000000000337
00000000fff04658:   00000000008f8d21
00000000fff04660:   0000050fc9618a80
00000000fff04668:   0000050fc9618a80
00000000fff04670:   0000050fc9618a80
00000000fff04678:   0000050fc9618a80
00000000fff04680:   0000050fc9618a80
00000000fff04688:   0000050fc9618a80
00000000fff04690:   0000050fc9719080
00000000fff04698:   0000050fc9729f80
00000000fff046a0:   0000000000032d60
00000000fff046a8:   000000c1b61955d8
00000000fff046b0:   0000000137601540
00000000fff046b8:   0000000137601540
00000000fff046c0:   0000000137601540
00000000fff046c8:   0000000137601540
00000000fff046d0:   0000000137601540
00000000fff046d8:   0000000137601540
00000000fff046e0:   0000000137601540
00000000fff046e8:   0000000137601540
00000000fff046f0:   0000000137601540
00000000fff046f8:   0000000137601540
00000000fff04700:   0000000137601540
00000000fff04708:   0000000137601540

   core dump
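The bitmap_dealloc output above is long but repetitive. As a reading aid, here is a minimal Python sketch that extracts the reported bit indices and collapses consecutive ones into runs. It is a hypothetical helper, not part of Ops or Nanos, and it assumes the exact line format shown in this log. Every error in these logs is order 0 (a single bit). Each range start is also the bit index plus 0x400: bit 48966 is 0xbf46, and the matching range is [0xc346 0xc347). That offset is an observation from this output, not a documented property of the Nanos allocator.

```python
import re
import sys

# Matches e.g. "bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 48966, order 0: ..."
BIT_RE = re.compile(r"bitmap_dealloc error: bitmap (0x[0-9a-f]+), bit (\d+), order (\d+)")
# Matches the companion line "... bitmap dealloc for range [0xc346 0xc347) failed; leaking"
RANGE_RE = re.compile(r"bitmap dealloc for range \[(0x[0-9a-f]+) (0x[0-9a-f]+)\)")


def leaked_bits(log: str) -> list[int]:
    """Bit indices reported by bitmap_dealloc errors, in log order."""
    return [int(m.group(2)) for m in BIT_RE.finditer(log)]


def range_starts(log: str) -> list[int]:
    """Start of each half-open [start end) range reported by dealloc_from_range."""
    return [int(m.group(1), 16) for m in RANGE_RE.finditer(log)]


def contiguous_runs(bits: list[int]) -> list[tuple[int, int]]:
    """Collapse consecutive bit indices into (first_bit, count) tuples, keeping log order."""
    runs: list[tuple[int, int]] = []
    for bit in bits:
        if runs and bit == runs[-1][0] + runs[-1][1]:
            first, count = runs[-1]
            runs[-1] = (first, count + 1)
        else:
            runs.append((bit, 1))
    return runs


if __name__ == "__main__":
    # Usage (hypothetical script name): python summarize_leaks.py < console.log
    text = sys.stdin.read()
    for first, count in contiguous_runs(leaked_bits(text)):
        print(f"bits {first}..{first + count - 1}: {count} leaked")
```

When the 0.1.52 log above is fed to it on stdin, it reduces the 31 errors to four runs: 48966-48977, 24817-24821, 24711-24721 and 100861-100863.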
francescolavra commented 3 months ago

From the above error logs I'm unable to get to the root cause. @julianbraha could you send us the application you are trying to run that produces those errors? Having the source code would be ideal, but I think having the binary file only would work too.

julianbraha commented 3 months ago

> From the above error logs I'm unable to get to the root cause. @julianbraha could you send us the application you are trying to run that produces those errors? Having the source code would be ideal, but I think having the binary file only would work too.

Here's the GitHub repo: https://github.com/0xPolygonZero/zk_evm

I'm specifically trying to run the worker binary, which I compile like this:

CC=/usr/bin/gcc-10 RUSTFLAGS="-g -C target-feature=+crt-static  -C target-feature=-avx512f,-avx512dq,-avx512cd,-avx512bw,-avx512vl,-avx512ifma" cargo build --bin worker --target x86_64-unknown-linux-gnu --release
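The RUSTFLAGS include -C target-feature=+crt-static, which asks rustc to link the C runtime statically. One way to confirm that the resulting worker binary is statically linked is to check that it has no PT_INTERP program header, the entry that names the dynamic loader. `readelf -l` shows the same information. Below is a minimal Python sketch that performs this check. It assumes a 64-bit little-endian ELF, as produced for x86_64-unknown-linux-gnu, and the script name is hypothetical.

```python
import struct
import sys

PT_INTERP = 3  # program header type that names the dynamic loader (ELF spec)

# ELF64 file header: e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff,
# e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF64_HEADER = "<16sHHIQQQIHHHHHH"


def has_interp(data: bytes) -> bool:
    """True if a 64-bit little-endian ELF image contains a PT_INTERP program header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        raise ValueError("expected a 64-bit little-endian ELF file")
    fields = struct.unpack_from(ELF64_HEADER, data, 0)
    phoff, phentsize, phnum = fields[5], fields[9], fields[10]
    for i in range(phnum):
        # p_type is the first 32-bit field of each program header entry
        (p_type,) = struct.unpack_from("<I", data, phoff + i * phentsize)
        if p_type == PT_INTERP:
            return True
    return False


if __name__ == "__main__":
    # Usage: python check_static.py ./target/x86_64-unknown-linux-gnu/release/worker
    with open(sys.argv[1], "rb") as f:
        image = f.read()
    if has_interp(image):
        print("dynamically linked (PT_INTERP present)")
    else:
        print("no PT_INTERP: statically linked (static or static-pie)")
```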
rinor commented 3 months ago

@julianbraha, can you try the nightly Nanos build with transparent huge pages disabled? For example:

{
    "ManifestPassthrough": {
        "transparent_hugepage": "never"
    },
    "RunConfig": {
        "Memory": "48G"
    },
    "BaseVolumeSz": "32g"
} 
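If you prefer to generate this config rather than edit it by hand, here is a small Python sketch. The helper is hypothetical: it writes only the JSON shown above and makes no assumptions about other Ops properties.

```python
import copy
import json


def with_thp_disabled(config: dict) -> dict:
    """Return a copy of an Ops config with transparent_hugepage set to "never"
    under ManifestPassthrough, leaving the original dict untouched."""
    cfg = copy.deepcopy(config)
    cfg.setdefault("ManifestPassthrough", {})["transparent_hugepage"] = "never"
    return cfg


if __name__ == "__main__":
    base = {"RunConfig": {"Memory": "48G"}, "BaseVolumeSz": "32g"}
    with open("config.json", "w") as f:
        json.dump(with_thp_disabled(base), f, indent=4)
```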
julianbraha commented 2 months ago

I'm getting a very similar (the same?) error:

$ ../ops/ops run -c config.json ./target/x86_64-unknown-linux-gnu/release/worker --force
running local instance
booting /home/julian/.ops/images/worker ...
en1: assigned 10.0.2.15
<jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
<jemalloc>: (This is the expected behaviour if you are running under QEMU)
en1: assigned FE80::445C:D4FF:FED3:D9
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59081, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeb49 0xeb4a) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59082, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeb4a 0xeb4b) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59083, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeb4b 0xeb4c) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59084, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeb4c 0xeb4d) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59085, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeb4d 0xeb4e) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59086, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeb4e 0xeb4f) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59087, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeb4f 0xeb50) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59088, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeb50 0xeb51) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59089, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeb51 0xeb52) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59090, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeb52 0xeb53) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59091, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeb53 0xeb54) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59092, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeb54 0xeb55) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59093, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeb55 0xeb56) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59094, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeb56 0xeb57) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 23334, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x5fa6 0x5fa7) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 23335, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x5fa7 0x5fa8) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 23336, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x5fa8 0x5fa9) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 23337, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x5fa9 0x5faa) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 23338, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x5faa 0x5fab) failed; leaking
id_dealloc error: failed, ra 0xffffffffc683bc46
thread 'main' panicked at alloc/src/raw_vec.rs:24:5:
capacity overflow
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

*** signal 11 received by tid 2, errno 0, code 1
    fault address 0x34e10

*** Thread context:
lastvector: 000000000000000e (Page fault)
     frame: ffffc00002e04000
      type: thread
active_cpu: 00000000ffffffff
 stack top: 0000000000000000
error code: 0000000000000004
   address: 0000000000034e10

   rax: 0000000000034e10
   rbx: ee398098c0000000
   rcx: ee398098c69c2a25
   rdx: 0000000000000070
   rsi: 0000000001b5cad8
   rdi: 0000000001b5ca08
   rbp: 0000000001b5c828
   rsp: 00000000ffe66370
    r8: 0000000000000001
    r9: 0000000000000001
   r10: 00000000018dd390
   r11: 0000000000000000
   r12: 3000000000000000
   r13: 0000000000034e10
   r14: 0000000000000001
   r15: 0000000001b5c858
   rip: 00000000007706e2
rflags: 0000000000000202
    ss: 000000000000002b
    cs: 0000000000000023
    ds: 0000000000000000
    es: 0000000000000000
fsbase: 0000000001b5d5c0
gsbase: 0000000000000000

frame trace:
0000000001b5c830:   00000000289e42c0

kernel load offset ffffffffc65c5000

loaded klibs:

stack trace:
00000000ffe66370:   0000000001b5cb78
00000000ffe66378:   c7c445a70aee8b00
00000000ffe66380:   0000000001b5c800
00000000ffe66388:   0000000001b5cb70
00000000ffe66390:   0000000001b5c840
00000000ffe66398:   0000000001b5c848
00000000ffe663a0:   0000000001b5cb78
00000000ffe663a8:   0000000000702790
00000000ffe663b0:   0000000000000002
00000000ffe663b8:   0000000000a4b95b
00000000ffe663c0:   0000000000000000
00000000ffe663c8:   c7c445a70aee8b00
00000000ffe663d0:   00000000ffe66418
00000000ffe663d8:   00000000ffe67918
00000000ffe663e0:   0000000000000000
00000000ffe663e8:   0000000000702790
00000000ffe663f0:   0000000000000000
00000000ffe663f8:   00000000ffe66418
00000000ffe66400:   00000000ffe66430
00000000ffe66408:   0000000000ac07cb
00000000ffe66410:   0000000102a7be10
00000000ffe66418:   000000010aa4b000
00000000ffe66420:   0000000000000000
00000000ffe66428:   0000000000000003
00000000ffe66430:   0000000000000001
00000000ffe66438:   000000010aa4b000
00000000ffe66440:   0000000000000000
00000000ffe66448:   0000000000000004
00000000ffe66450:   0000000000000001
00000000ffe66458:   0000000000000000
00000000ffe66460:   0000000105e22200
00000000ffe66468:   0000000000000005

   core dump
julian@julians-curiosity:~/zk_evm_nanos$ cat config.json
{
"NanosVersion": "nightly",
"ManifestPassthrough": {
"transparent_hugepage": "never"
},
"RunConfig": {
"Memory": "56G"
},
"BaseVolumeSz": "32g"
}
julian@julians-curiosity:~/zk_evm_nanos$ nano config.json
julian@julians-curiosity:~/zk_evm_nanos$ ../ops/ops run -c config.json ./target/x86_64-unknown-linux-gnu/release/worker --force
running local instance
booting /home/julian/.ops/images/worker ...
en1: assigned 10.0.2.15
<jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
<jemalloc>: (This is the expected behaviour if you are running under QEMU)
en1: assigned FE80::988B:74FF:FE52:2BBB
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59119, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeaef 0xeaf0) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59120, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeaf0 0xeaf1) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59121, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeaf1 0xeaf2) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59122, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeaf2 0xeaf3) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59123, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeaf3 0xeaf4) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59124, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeaf4 0xeaf5) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59125, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeaf5 0xeaf6) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59126, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeaf6 0xeaf7) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59127, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xeaf7 0xeaf8) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59386, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xebfa 0xebfb) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59387, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xebfb 0xebfc) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59388, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xebfc 0xebfd) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59389, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xebfd 0xebfe) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59390, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xebfe 0xebff) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 59391, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0xebff 0xec00) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 25909, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x6935 0x6936) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 25910, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x6936 0x6937) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 25912, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x6938 0x6939) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 25913, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x6939 0x693a) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 25914, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x693a 0x693b) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 25915, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x693b 0x693c) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 25916, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x693c 0x693d) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46
bitmap_dealloc error: bitmap 0xffff8000000001c8, bit 25917, order 0: not allocated in map; leaking
dealloc_from_range error: heap 0xffff800000000000: bitmap dealloc for range [0x693d 0x693e) failed; leaking
id_dealloc error: failed, ra 0xfffffffffd2adc46

*** signal 11 received by tid 4, errno 0, code 1
    fault address 0x1fbc

*** Thread context:
lastvector: 000000000000000e (Page fault)
     frame: ffffc00002e05000
      type: thread
active_cpu: 00000000ffffffff
 stack top: 0000000000000000
error code: 0000000000000004
   address: 0000000000001fbc

   rax: 00000001004cc0e8
   rbx: 00000001004cc120
   rcx: 000000015bf37c10
   rdx: 0000000000000000
   rsi: 0000000000001fbc
   rdi: 0000000000000000
   rbp: 000000015bf37c00
   rsp: 00000001004cc080
    r8: 00000001004cdca8
    r9: 00000001004cd970
   r10: 0000000000000000
   r11: 00000001004cde00
   r12: 0000000000000028
   r13: 00000001004cc0f0
   r14: 0000000100ac9a00
   r15: 000000015bf37fc0
   rip: 0000000000a07b1f
rflags: 0000000000000297
    ss: 000000000000002b
    cs: 0000000000000023
    ds: 0000000000000000
    es: 0000000000000000
fsbase: 00000001004ce700
gsbase: 0000000000000000

frame trace:
000000015bf37c08:   0000000000001fbc

kernel load offset fffffffffd037000

loaded klibs:

stack trace:
00000001004cc080:   0000000000000140
00000001004cc088:   00000001004cc3d0
00000001004cc090:   000000010e0b7210
00000001004cc098:   0000000000000000
00000001004cc0a0:   0000000000000000
00000001004cc0a8:   00000000000003f7
00000001004cc0b0:   00000000ffefd900
00000001004cc0b8:   00000000008e0c98
00000001004cc0c0:   00000001004cc178
00000001004cc0c8:   000000010e0b7200
00000001004cc0d0:   00000001004cc1b8
00000001004cc0d8:   00000000000003f7
00000001004cc0e0:   00000000000003f7
00000001004cc0e8:   0000000000000000
00000001004cc0f0:   000000015bf37c00
00000001004cc0f8:   000000015bf37fc0
00000001004cc100:   00000001004cc0e8
00000001004cc108:   000000011bd47d20
00000001004cc110:   0000000000000001
00000001004cc118:   0000000000000001
00000001004cc120:   00000001004cc330
00000001004cc128:   00000001004cc338
00000001004cc130:   00000001004cc340
00000001004cc138:   00000001004cc3d0
00000001004cc140:   00000001004cc350
00000001004cc148:   00000001004cc4a0
00000001004cc150:   00000000000001fb
00000001004cc158:   00000000000003f7
00000001004cc160:   0000000000000002
00000001004cc168:   0000000000a50403
00000001004cc170:   0000000000001000
00000001004cc178:   000000010e0b7200

   core dump
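A common first step with these user-space crash dumps is to check which registers hold the faulting address. In all three dumps in this thread, the address appears in a general-purpose register: rdi = 0x32d60 in the first, rax and r13 = 0x34e10 in the second, and rsi = 0x1fbc in the third. Below is a minimal Python sketch (a hypothetical helper) that automates the comparison. It assumes the register-dump format printed above and one dump per input.

```python
import re
import sys

# Matches e.g. "    fault address 0x32d60"
FAULT_RE = re.compile(r"fault address (0x[0-9a-f]+)")
# Matches register lines such as "   rdi: 0000000000032d60"
REG_RE = re.compile(r"^\s*([a-z0-9]+):\s+([0-9a-f]{16})\s*$", re.MULTILINE)
# Only general-purpose registers and rip; other 16-hex-digit lines (frame, address,
# stack trace entries) also match REG_RE and are filtered out by this set.
GPRS = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15", "rip"}


def registers_matching_fault(dump: str) -> list[str]:
    """Names of general-purpose registers whose value equals the reported fault address."""
    match = FAULT_RE.search(dump)  # uses the first fault address in the input
    if match is None:
        return []
    fault = int(match.group(1), 16)
    return [name for name, value in REG_RE.findall(dump)
            if name in GPRS and int(value, 16) == fault]


if __name__ == "__main__":
    # Usage (hypothetical script name): python fault_regs.py < one_dump.txt
    print(registers_matching_fault(sys.stdin.read()))
```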

And here's the config.json:

$ cat config.json
{
"NanosVersion": "nightly",
"ManifestPassthrough": {
"transparent_hugepage": "never"
},
"RunConfig": {
"Memory": "48G"
},
"BaseVolumeSz": "32g"
}

@rinor, also: is "NanosVersion": "nightly" the correct way to select the nightly Nanos build?

julianbraha commented 2 months ago

Wait, I then added "NightlyBuild": true to config.json, so it now looks like this:

$ cat config.json
{
"NanosVersion": "nightly",
"NightlyBuild": true,
"ManifestPassthrough": {
"transparent_hugepage": "never"
},
"RunConfig": {
"Memory": "48G"
},
"BaseVolumeSz": "32g"
}

And now this is the error:

$ ../ops/ops run -c config.json ./target/x86_64-unknown-linux-gnu/release/worker --force
 100% |████████████████████████████████████████|
running local instance
booting /home/julian/.ops/images/worker ...
en1: assigned 10.0.2.15
<jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
<jemalloc>: (This is the expected behaviour if you are running under QEMU)
en1: assigned FE80::B0DA:10FF:FE79:E237

frame trace:
ffffc0000100ff50:   ffffffff90348743    (runloop_internal + 0000000000000243/0000000000000c84)
ffffc0000100ffc0:   ffffffff903320fa    (context_switch_finish + 000000000000007a/0000000000000224)

kernel load offset ffffffff900eb000

loaded klibs:
assertion res == copy_len failed at /home/circleci/project/src/kernel/pagecache.c:658 (IP 0xffffffff90342a63) in pagecache_write_sg_finish()
francescolavra commented 2 months ago

OK, those "bitmap_dealloc" errors are caused by a bug in the kernel implementation of transparent huge pages, which is fixed in https://github.com/nanovms/nanos/pull/2052. You can try running the worker with the fixed kernel by using the Ops command line option --nanos-version=a5ffd6a (remove the "NanosVersion" and "NightlyBuild" properties from the config file, and you can also remove the "transparent_hugepage" property, which is not needed with this fix).

As for the other error ("assertion res == copy_len failed"), I have been unable to reproduce it. I tried running the worker built from the develop branch, as well as from the latest release v0.6.0, and in both cases it seems to run fine: I let it run for a few minutes and didn't see any error. I did notice that it uses a lot of CPU, but this should be normal, since the same happens when running the program natively on the host machine.

Do you see that "assertion res == copy_len failed" error soon after booting, or after some time that the program is running?
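Applying that advice to the config used earlier in the thread can be sketched in Python as follows. The helper names are hypothetical. The property names, the ops run flags (-c, --force, --nanos-version) and the commit a5ffd6a are the ones used in this thread.

```python
import copy


def prepare_config(config: dict) -> dict:
    """Copy an Ops config, dropping the properties that are not needed once the
    kernel version is pinned on the command line."""
    cfg = copy.deepcopy(config)
    cfg.pop("NanosVersion", None)
    cfg.pop("NightlyBuild", None)
    passthrough = cfg.get("ManifestPassthrough")
    if passthrough is not None:
        passthrough.pop("transparent_hugepage", None)
        if not passthrough:  # drop the section entirely if nothing else is left in it
            del cfg["ManifestPassthrough"]
    return cfg


def ops_run_argv(binary: str, nanos_version: str,
                 config_path: str = "config.json", ops: str = "ops") -> list[str]:
    """argv for the ops run invocation used in this thread, with the kernel pinned."""
    return [ops, "run", "-c", config_path, binary, "--force",
            f"--nanos-version={nanos_version}"]
```

Passing the resulting list to subprocess.run(argv, check=True) reproduces the command julianbraha runs in a later comment. Note that ops defaults to the executable on PATH, whereas this thread uses a locally built ../ops/ops.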

julianbraha commented 2 months ago

> Do you see that "assertion res == copy_len failed" error soon after booting, or after some time that the program is running?

It happens after a while, maybe 10-20 minutes in.

julianbraha commented 2 months ago

> You can try running the worker with the fixed kernel by using the Ops command line option --nanos-version=a5ffd6a (remove the "NanosVersion" and "NightlyBuild" properties from the config file, and you can also remove the "transparent_hugepage" property, which is not needed with this fix).

$ ../ops/ops run -c config.json ./target/x86_64-unknown-linux-gnu/release/worker --force --nanos-version=a5ffd6a
 100% |████████████████████████████████████████|
running local instance
booting /home/julian/.ops/images/worker ...
en1: assigned 10.0.2.15
<jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
<jemalloc>: (This is the expected behaviour if you are running under QEMU)
en1: assigned FE80::E46F:55FF:FED8:7F02

frame trace:
ffffc00000e0ff50:   ffffffff914d6743    (runloop_internal + 0000000000000243/0000000000000c84)
ffffc00000e0ffc0:   ffffffff914c00fa    (context_switch_finish + 000000000000007a/0000000000000224)

kernel load offset ffffffff91279000

loaded klibs:
assertion res == copy_len failed at /home/circleci/project/src/kernel/pagecache.c:658 (IP 0xffffffff914d0a63) in pagecache_write_sg_finish()

I'm still getting "assertion res == copy_len failed", even with --nanos-version=a5ffd6a and this config.json:

$ cat config.json
{
"RunConfig": {
"Memory": "48G"
},
"BaseVolumeSz": "32g"
}

The host is a GCP Compute Engine n2-standard-16 VM. This time the worker ran for 35 minutes before hitting "assertion res == copy_len failed".
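The assertion IP differs between these runs partly because the kernel load offset changes from boot to boot (compare the "kernel load offset" lines). Subtracting the load offset gives a value that can be compared across runs. For both the nightly run and the a5ffd6a run it comes out to 0x257a63, which is consistent with both reporting pagecache.c:658. For the 0.1.51 run it is 0x254caa. Below is a minimal Python sketch of that arithmetic, using the numbers from the logs. Mapping the value to a source line (for example with addr2line) would also require the matching kernel image with symbols and its link address, which this sketch does not cover.

```python
def kernel_relative(address: int, load_offset: int) -> int:
    """Subtract the reported 'kernel load offset' from a kernel address."""
    if address < load_offset:
        raise ValueError("address lies below the kernel load offset")
    return address - load_offset


# (assertion IP, kernel load offset) pairs copied from the logs in this thread
CRASHES = {
    "0.1.51": (0xffffffff8d549caa, 0xffffffff8d2f5000),
    "nightly": (0xffffffff90342a63, 0xffffffff900eb000),
    "a5ffd6a": (0xffffffff914d0a63, 0xffffffff91279000),
}

if __name__ == "__main__":
    for label, (ip, base) in CRASHES.items():
        print(f"{label}: assertion at load offset + {kernel_relative(ip, base):#x}")
```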

francescolavra commented 2 months ago

https://github.com/nanovms/nanos/pull/2053 fixes the assertion res == copy_len error. You can run the fixed kernel with --nanos-version=328ec5c. Unrelated note: in my setup, the worker application terminates after a while with the following messages:

thread 'main' panicked at /home/francesco/.cargo/registry/src/index.crates.io-6f17d22bba15001f/paladin-core-0.4.2/src/runtime/dynamic_channel.rs:132:51:
amqp_uri is required
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I guess it's a matter of setting the program arguments (--amqp-uri) and/or the environment variables (AMQP_URI) correctly.
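If AMQP_URI is the right setting, one way to pass it to the unikernel is through the Ops config. The sketch below assumes that Ops reads guest environment variables from a top-level Env map in config.json, so check the Ops configuration documentation before relying on it. The helper name and the broker address are hypothetical placeholders.

```python
import copy
import json


def with_amqp_uri(config: dict, uri: str) -> dict:
    """Copy an Ops config and set AMQP_URI in its Env map.
    Assumption: Ops passes the top-level "Env" map to the guest as environment variables."""
    cfg = copy.deepcopy(config)
    cfg.setdefault("Env", {})["AMQP_URI"] = uri
    return cfg


if __name__ == "__main__":
    # Hypothetical broker address: 10.0.2.2 is the host as seen from a guest on
    # QEMU's default user-mode network (the guest in the logs gets 10.0.2.15).
    base = {"RunConfig": {"Memory": "48G"}, "BaseVolumeSz": "32g"}
    print(json.dumps(with_amqp_uri(base, "amqp://10.0.2.2:5672"), indent=4))
```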

julianbraha commented 2 months ago

Awesome, yeah it works for me now with --nanos-version=328ec5c.

Thanks for the support! Closing this.