Closed vicuna closed 8 years ago
Comment author: Richard Jones
Apparently brasl %r14,0x20080995e22 is a branch to a subroutine. However gdb thinks that is an unmapped address:
(gdb) disassemble 0x20080995e22 No function contains specified address. (gdb) disassemble *0x20080995e22 Cannot access memory at address 0x20080995e22
That explains why it crashes, although not why it's jumping to a random place.
Comment author: Richard Jones
The function it is trying to call is probably caml_raise_exn, given other code in the main coqtop program like this:
80283d2c: c0 e5 00 38 90 7b brasl %r14,80995e22 <caml_raise_exn>
g_extraction.ml4 seems like it is part of a native dynlink library (confirmed by examining the strace output, showing that it is loading theories/extraction_plugin.cmxs). My suspicion would be that relocation of symbols in that library is going wrong somehow.
Comment author: Richard Jones
open("/home/rjones/d/fedora/coq/master/coq-8.5pl3/plugins/extraction/extraction_plugin.cmxs", O_RDONLY|O_CLOEXEC) = 7 read(7, "\177ELF\2\2\1\0\0\0\0\0\0\0\0\0\0\3\0\26\0\0\0\1\0\0\0\0\0\10$\330"..., 832) = 832 fstat(7, {st_mode=S_IFREG|0755, st_size=1443584, ...}) = 0 mmap(NULL, 1238464, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 7, 0) = 0x3ff6ca80000 mmap(0x3ff6cb8f000, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 7, 0x10e000) = 0x3ff6cb8f000 close(7) = 0 mprotect(0x3ff6ca80000, 1110016, PROT_READ|PROT_WRITE) = 0 mprotect(0x3ff6ca80000, 1110016, PROT_READ|PROT_EXEC) = 0 mprotect(0x3ff6cb8f000, 4096, PROT_READ) = 0 brk(NULL) = 0x8d7ac000 brk(0x8d894000) = 0x8d894000 --- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPC, si_addr=0x3ff80995e22} --- +++ killed by SIGILL (core dumped) +++
Comment author: Richard Jones
A simple reproducer is:
ocamlfind opt -g -package dynlink -linkpkg main.ml -o test ocamlfind opt -g -shared -o lib.cmxs lib.ml ./test
--------------- main.c ---- let () = Dynlink.init (); (try Dynlink.loadfile "lib.cmxs" with | Dynlink.Error err -> print_endline ("dynlink error: " ^ Dynlink.error_message err) | exn -> failwith ("unknown error loading plugin: " ^ Printexc.to_string exn) )
--------------- lib.c ---- let () = print_endline "loading the plugin ..."; raise Not_found
Note that it's debugging (-g) which causes the problem. If that is disabled then the test case works fine for me.
Comment author: Richard Jones
This patch seems to fix it:
diff --git a/asmcomp/s390x/emit.mlp b/asmcomp/s390x/emit.mlp index 5d233a3..a099bdb 100644 --- a/asmcomp/s390x/emit.mlp +++ b/asmcomp/s390x/emit.mlp @@ -611,7 +611,7 @@ let emit_instr i = | Lraise k -> begin match k with | Cmm.Raise_withtrace ->
brasl %r14, {emit_symbol "caml_raise_exn"}\n
;brasl %r14, {emit_symbol "caml_raise_exn"}@PLT\n
;
let lbl = record_frame Reg.Set.empty true i.dbg in
{emit_label lbl}:\n
| Cmm.Raise_notrace ->Comment author: Richard Jones
Applied to Fedora, and this fixes the Coq build.
Comment author: @gasche
Very nice! Would you be willing to submit a PR on github? If not I can take care of it -- reusing your commit.
Comment author: Richard Jones
Please submit one, I find the github workflow very tedious.
Comment author: @gasche
Done, thanks: https://github.com/ocaml/ocaml/pull/903
(Note: there are command-line tools to submit github pull-requests, see https://github.com/github/hub )
Comment author: @gasche
Xavier Leroy suggested a change to the patch, which is to use emit_call "caml_raise_exn";
instead. I updated my upstream patch proposal, the new version is available at:
https://patch-diff.githubusercontent.com/raw/ocaml/ocaml/pull/903.patch
Comment author: @xavierleroy
Seems to be fixed on trunk, see GPR 903. Should it be backported to 4.04.1?
Tentatively marking this MPR resolved. Yell if you disagree.
Comment author: Richard Jones
I do believe this should be backported to the 4.04 branch, for the benefit of Debian and other distros.
Comment author: @gasche
Fine with me (also un-invasive for other arches), I'll backport to 4.04.
Comment author: @gasche
4.04 commit: 7020a0b682524b4a63bdeb395db12b72389d01da
https://github.com/ocaml/ocaml/commit/7020a0b682524b4a63bdeb395db12b72389d01da
Original bug ID: 7405 Reporter: Richard Jones Status: resolved (set by @xavierleroy on 2016-11-11T18:47:57Z) Resolution: fixed Priority: normal Severity: crash Platform: s390x OS: Fedora OS Version: 26 Version: 4.04.0 Fixed in version: 4.05.0 +dev/beta1/beta2/beta3/rc1 Category: back end (clambda to assembly) Monitored by: @gasche
Bug description
bin/coqtop -boot -native-compiler -compile theories/Init/Prelude.v -noinit -R theories Coq Makefile.build:590: recipe for target 'theories/Init/Prelude.vo' failed make[1]: *** [theories/Init/Prelude.vo] Segmentation fault (core dumped)
I'm afraid I don't have very much detail, but I am able to get a core file.
Here is the stack trace according to gdb:
Core was generated by `bin/coqtop -boot -native-compiler -compile theories/Init/Prelude.v -noinit -R t'. Program terminated with signal SIGSEGV, Segmentation fault.
0 0x0000020080995e22 in ?? ()
[Current thread is 1 (Thread 0x2003db32870 (LWP 4231))] (gdb) bt
0 0x0000020080995e22 in ?? ()
1 0x000002005b1ccfdc in camlG_extraction__entry ()
2 0x000003ffd81f91c0 in ?? ()
PC not saved (gdb) frame 1
1 0x000002005b1ccfdc in camlG_extraction__entry ()
22 ARGUMENT EXTEND mlname
The disassembly is below. Current program counter is marked with "=>"
Dump of assembler code for function camlG_extractionentry: 0x000002005b1ccf30 <+0>: lay %r15,-40(%r15) 0x000002005b1ccf36 <+6>: stg %r14,32(%r15) 0x000002005b1ccf3c <+12>: lgrl %r3,0x2005b214fa0 0x000002005b1ccf42 <+18>: lgrl %r4,0x2005b2159a8 0x000002005b1ccf48 <+24>: stg %r3,0(%r4) 0x000002005b1ccf4e <+30>: brasl %r14,0x2005b1ccf7a <camlG_extraction__entry+74> 0x000002005b1ccf54 <+36>: lgrl %r9,0x2005b2152d0 0x000002005b1ccf5a <+42>: lg %r12,16(%r9) 0x000002005b1ccf60 <+48>: cgr %r2,%r12 0x000002005b1ccf64 <+52>: jgne 0x2005b1ccf74 <camlG_extractionentry+68> 0x000002005b1ccf6a <+58>: lghi %r6,1 0x000002005b1ccf6e <+62>: jg 0x2005b1cd00a <camlG_extractionentry+218> 0x000002005b1ccf74 <+68>: brasl %r14,0x20080995e22 0x000002005b1ccf7a <+74>: lay %r15,-16(%r15) 0x000002005b1ccf80 <+80>: stg %r14,0(%r15) 0x000002005b1ccf86 <+86>: stg %r13,8(%r15) 0x000002005b1ccf8c <+92>: lgr %r13,%r15 0x000002005b1ccf90 <+96>: lgrl %r5,0x2005b215140 0x000002005b1ccf96 <+102>: lg %r2,32(%r5) 0x000002005b1ccf9c <+108>: brasl %r14,0x2005b182198 <camlGenargdefault_empty_value_2063@plt> 0x000002005b1ccfa2 <+114>: cgfi %r2,1 0x000002005b1ccfa8 <+120>: jge 0x2005b1ccfba <camlG_extractionentry+138> 0x000002005b1ccfae <+126>: lg %r8,0(%r2) 0x000002005b1ccfb4 <+132>: jg 0x2005b1ccfdc <camlG_extraction__entry+172> 0x000002005b1ccfba <+138>: lgrl %r9,0x2005b215458 0x000002005b1ccfc0 <+144>: lghi %r12,0 0x000002005b1ccfc4 <+148>: sty %r12,0(%r9) 0x000002005b1ccfca <+154>: lgrl %r2,0x2005b2152d0 0x000002005b1ccfd0 <+160>: lg %r2,16(%r2) 0x000002005b1ccfd6 <+166>: brasl %r14,0x20080995e22 => 0x000002005b1ccfdc <+172>: lay %r11,-16(%r11) 0x000002005b1ccfe2 <+178>: clgr %r11,%r10 0x000002005b1ccfe6 <+182>: jgl 0x2005b1d09a2 <camlG_extractionentry+14962>
Steps to reproduce
Compile coq from source on s390x with OCaml 4.04.0
Additional information
Fedora has some patches on top of OCaml 4.04.0, but not many and I don't believe any of them are relevant to this bug. However you can find the patches here in case:
https://git.fedorahosted.org/cgit/fedora-ocaml.git/log/?h=fedora-26-4.04.0