can you paste the output of:
* `ops profile`
* `cat /etc/lsb-release`
* `uname -a`
$ ops profile
Ops version: 0.1.42
Nanos version: 0.1.51
Qemu version: 9.0.2
OS: linux
Arch: amd64
Virtualized: false
$ cat /etc/lsb-release
DISTRIB_ID=cachyos
DISTRIB_RELEASE="rolling"
DISTRIB_DESCRIPTION="CachyOS"
$ uname -a
Linux framework-laptop 6.10.3-3-cachyos #1 SMP PREEMPT_DYNAMIC Sun, 04 Aug 2024 09:34:45 +0000 x86_64 GNU/Linux
looking at CachyOS's benefits/optimizations, you are most likely trying to execute instructions that the (virtual) CPU doesn't support; to figure out which one it is, you can get a core dump as shown in https://docs.ops.city/ops/hypervisors/debugging#core-dumps
(note: i had to use `ops run -c config.json main --nanos-version=0.1.47`, which is a separate issue)
running it through gdb, you should see the offending instruction it is hitting (perhaps AVX-512 related).
from there, you can disable the feature at compile time (https://doc.rust-lang.org/rustc/codegen-options/index.html#target-feature), or we could look into whether it's something we could toggle on where appropriate.
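Before recompiling, it can help to confirm which AVX-512 subsets the CPU actually advertises. A minimal sketch using Rust's standard `is_x86_feature_detected!` macro; the function name `avx512_report` and the choice of subsets to check are assumptions for illustration. Comparing the output on the host with the output inside the guest (once a build that runs under Nanos is available) would show whether the guest sees the same features as the host.

```rust
// Sketch: list which AVX-512 subsets the CPU this program runs on advertises.
// `avx512_report` is a hypothetical helper name; the subsets are illustrative.

#[cfg(target_arch = "x86_64")]
fn avx512_report() -> Vec<(&'static str, bool)> {
    vec![
        ("avx512f", is_x86_feature_detected!("avx512f")),   // foundation
        ("avx512bw", is_x86_feature_detected!("avx512bw")), // byte/word ops
        ("avx512vl", is_x86_feature_detected!("avx512vl")), // 128/256-bit forms
    ]
}

// On non-x86_64 targets there is nothing to report.
#[cfg(not(target_arch = "x86_64"))]
fn avx512_report() -> Vec<(&'static str, bool)> {
    Vec::new()
}

fn main() {
    for (name, present) in avx512_report() {
        println!("{name}: {}", if present { "yes" } else { "no" });
    }
}
```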
Hmm, not sure how to interpret the GDB output here. This is what I got:
$ rustc main.rs -o main
$ ops run -c config.json main --nanos-version=0.1.47
running local instance
booting /home/julian/.ops/images/main ...
en1: assigned 10.0.2.15
signal 4 (core dumped)
exit status 9
$ ops image cp main coredumps/core .
$ gdb -ex bt -ex quit main core
GNU gdb (GDB) 15.1
Reading symbols from main...
[New LWP 2]
Core was generated by `main'.
Program terminated with signal SIGILL, Illegal instruction.
#0 0x0000014f6c0265ad in ?? ()
#0 0x0000014f6c0265ad in ?? ()
#1 0x0000000000000000 in ?? ()
Since your executable is dynamically linked, Nanos applies address space layout randomization (ASLR) to it by default; that's why you can't map the addresses in the backtrace to program symbols. To disable randomization, you can add a `"noaslr": "t"` attribute to the "ManifestPassthrough" JSON object in your config.json file, as in:
"ManifestPassthrough": {
"coredumplimit": "150m",
"noaslr": "t"
}
Then re-run Ops, copy the core dump file to the host, and open it again with gdb; this time you should be able to see the program symbols in the backtrace. To pinpoint the exact instruction that caused the fault, type `disas /s *0x0000014f6c0265ad` at the gdb prompt (replacing that number with the actual address in the first line of your backtrace) and see which instruction is at that address.
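To see the effect of ASLR directly, a minimal sketch: print the run-time address of a function in the program. For a position-independent executable with ASLR active, this will typically change from run to run; with randomization disabled it should stay the same. The names `marker` and `code_address` are hypothetical, chosen for illustration.

```rust
// Sketch: print where a function in this binary ended up in memory.
// With ASLR on, a PIE binary typically prints a different address per run;
// with ASLR off, the address should be stable across runs.

fn marker() {} // hypothetical function whose address we observe

fn code_address() -> usize {
    // Coerce the fn item to a fn pointer, then cast it to an integer address.
    let f: fn() = marker;
    f as usize
}

fn main() {
    println!("marker() is loaded at {:#x}", code_address());
}
```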
Tried this, but it didn't seem to change anything:
$ cat config.json
{
"BaseVolumeSz": "200m",
"ManifestPassthrough": {
"coredumplimit": "150m",
"noaslr": "t"
}
}
$ ops run -c config.json main --nanos-version=0.1.47
running local instance
booting /home/julian/.ops/images/main ...
en1: assigned 10.0.2.15
signal 4 (core dumped)
$ ops image cp main coredumps/core .
$ gdb -ex bt -ex quit main core
Reading symbols from main...
[New LWP 2]
Core was generated by `main'.
Program terminated with signal SIGILL, Illegal instruction.
#0 0x00000001000255ad in ?? ()
#0 0x00000001000255ad in ?? ()
#1 0x0000000000000000 in ?? ()
Oops, I forgot that Nanos uses a static offset of 0x400000 when ASLR is disabled, so to get to the faulting instruction you have to subtract 0x400000 from the addresses in the backtrace. In your case, the command to type at the gdb prompt would be `disas /s *0xffc255ad`.
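The address arithmetic can be sketched as follows, assuming the fixed 0x400000 offset described in the comment above; the constant and function names are hypothetical. Using `checked_sub` avoids silently wrapping if an address smaller than the offset is passed in.

```rust
// Sketch: turn a backtrace address from a Nanos core dump (ASLR disabled)
// into the address to pass to gdb, by subtracting the fixed load offset.

const NANOS_NOASLR_OFFSET: u64 = 0x40_0000; // 0x400000, per the comment above

fn to_gdb_address(runtime_addr: u64) -> Option<u64> {
    // None if the address is below the offset (would otherwise underflow).
    runtime_addr.checked_sub(NANOS_NOASLR_OFFSET)
}

fn main() {
    let frame0: u64 = 0x1_0002_55ad; // frame #0 from the backtrace above
    match to_gdb_address(frame0) {
        Some(a) => println!("disas /s *{:#x}", a), // prints: disas /s *0xffc255ad
        None => eprintln!("address {:#x} is below the load offset", frame0),
    }
}
```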
Okay, it looks like the problematic instruction is `vmovdqu8`, which, sure enough, is AVX-512.
I think this must have something to do with the system libraries, because when I compile the Rust binary for a target without AVX-512 (e.g. nehalem), the instruction is still present:
rustc -C target-cpu=nehalem -C target-feature=+crt-static main.rs -o main
and again, in gdb:
#0 0x0000000000472c81 in _dl_aux_init ()
#0 0x0000000000472c81 in _dl_aux_init ()
#1 0x0000000000447f40 in __libc_start_main_impl ()
#2 0x00000000004104c5 in _start ()
(gdb) disas /s 0x0000000000472c81
Dump of assembler code for function _dl_aux_init:
0x0000000000472c60 <+0>: endbr64
0x0000000000472c64 <+4>: push %rbp
0x0000000000472c65 <+5>: vpxor %xmm0,%xmm0,%xmm0
0x0000000000472c69 <+9>: lea -0x627d0(%rip),%rax # 0x4104a0 <_start>
0x0000000000472c70 <+16>: mov %rsp,%rbp
0x0000000000472c73 <+19>: sub $0x1a0,%rsp
0x0000000000472c7a <+26>: mov %rdi,0xe00a7(%rip) # 0x552d28 <_dl_auxv>
=> 0x0000000000472c81 <+33>: vmovdqu8 %zmm0,0x40(%rsp)
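This fits how `-C target-cpu` works: it limits what rustc generates for the crate itself, not prebuilt code linked in from the system C library, such as the static glibc's `_dl_aux_init` above. A minimal sketch to confirm what the compiler was allowed to emit, using the standard `cfg!(target_feature = ...)` macro. It checks the long-stable `avx`/`avx2` names rather than AVX-512 ones, on the reasoning that AVX-512 implies AVX2, so a build with `avx2` off cannot have AVX-512 enabled. `compiled_features` is a hypothetical name.

```rust
// Sketch: report which x86 features rustc was allowed to assume when it
// compiled THIS crate. These are fixed at build time, not detected at run
// time, and say nothing about prebuilt code linked in from the system libc.

fn compiled_features() -> [(&'static str, bool); 3] {
    [
        ("sse4.2", cfg!(target_feature = "sse4.2")),
        ("avx", cfg!(target_feature = "avx")),
        ("avx2", cfg!(target_feature = "avx2")), // AVX-512 would imply this
    ]
}

fn main() {
    // With `-C target-cpu=nehalem`, expect sse4.2 on and avx/avx2 off.
    for (name, enabled) in compiled_features() {
        println!("{name}: {enabled}");
    }
}
```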
I installed the `x86_64-unknown-linux-musl` target in rustup, and after compiling with:
rustc --target=x86_64-unknown-linux-musl main.rs -o main
it works!
Thanks for your help everyone. Closing.
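Presumably the musl build works because it links against the musl libc shipped with the rustup target instead of the host's AVX-512-optimized glibc. A quick sanity check that a binary really is the musl build, as a minimal sketch using the standard `target_env` cfg; `libc_flavor` is a hypothetical name.

```rust
// Sketch: report which C library environment this binary was compiled for.
// Built with `rustc --target=x86_64-unknown-linux-musl main.rs -o main`,
// this should print "musl".

fn libc_flavor() -> &'static str {
    if cfg!(target_env = "musl") {
        "musl"
    } else if cfg!(target_env = "gnu") {
        "gnu"
    } else {
        "other"
    }
}

fn main() {
    println!("built for target_env = {}", libc_flavor());
}
```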
I'm trying to test out ops on the most basic Rust example, but it crashes.
Here's my `main.rs`:
Which I compiled with
`rustc main.rs -o main`
And then when I ran
`ops run main`:
Here are my package versions: