ocaml / ocaml

The core OCaml system: compilers, runtime system, base libraries
https://ocaml.org
Other
5.47k stars 1.11k forks source link

Illegal instruction (core dumped) when running coqtop on s390x #7405

Closed vicuna closed 8 years ago

vicuna commented 8 years ago

Original bug ID: 7405 Reporter: Richard Jones Status: resolved (set by @xavierleroy on 2016-11-11T18:47:57Z) Resolution: fixed Priority: normal Severity: crash Platform: s390x OS: Fedora OS Version: 26 Version: 4.04.0 Fixed in version: 4.05.0 +dev/beta1/beta2/beta3/rc1 Category: back end (clambda to assembly) Monitored by: @gasche

Bug description

bin/coqtop -boot -native-compiler -compile theories/Init/Prelude.v -noinit -R theories Coq Makefile.build:590: recipe for target 'theories/Init/Prelude.vo' failed make[1]: *** [theories/Init/Prelude.vo] Segmentation fault (core dumped)

I'm afraid I don't have very much detail, but I am able to get a core file.

Here is the stack trace according to gdb:

Core was generated by `bin/coqtop -boot -native-compiler -compile theories/Init/Prelude.v -noinit -R t'. Program terminated with signal SIGSEGV, Segmentation fault.

0 0x0000020080995e22 in ?? ()

[Current thread is 1 (Thread 0x2003db32870 (LWP 4231))] (gdb) bt

0 0x0000020080995e22 in ?? ()

1 0x000002005b1ccfdc in camlG_extraction__entry ()

at plugins/extraction/g_extraction.ml4:22

2 0x000003ffd81f91c0 in ?? ()

PC not saved (gdb) frame 1

1 0x000002005b1ccfdc in camlG_extraction__entry ()

at plugins/extraction/g_extraction.ml4:22

22 ARGUMENT EXTEND mlname

The disassembly is below. Current program counter is marked with "=>"

Dump of assembler code for function camlG_extractionentry: 0x000002005b1ccf30 <+0>: lay %r15,-40(%r15) 0x000002005b1ccf36 <+6>: stg %r14,32(%r15) 0x000002005b1ccf3c <+12>: lgrl %r3,0x2005b214fa0 0x000002005b1ccf42 <+18>: lgrl %r4,0x2005b2159a8 0x000002005b1ccf48 <+24>: stg %r3,0(%r4) 0x000002005b1ccf4e <+30>: brasl %r14,0x2005b1ccf7a <camlG_extraction__entry+74> 0x000002005b1ccf54 <+36>: lgrl %r9,0x2005b2152d0 0x000002005b1ccf5a <+42>: lg %r12,16(%r9) 0x000002005b1ccf60 <+48>: cgr %r2,%r12 0x000002005b1ccf64 <+52>: jgne 0x2005b1ccf74 <camlG_extractionentry+68> 0x000002005b1ccf6a <+58>: lghi %r6,1 0x000002005b1ccf6e <+62>: jg 0x2005b1cd00a <camlG_extractionentry+218> 0x000002005b1ccf74 <+68>: brasl %r14,0x20080995e22 0x000002005b1ccf7a <+74>: lay %r15,-16(%r15) 0x000002005b1ccf80 <+80>: stg %r14,0(%r15) 0x000002005b1ccf86 <+86>: stg %r13,8(%r15) 0x000002005b1ccf8c <+92>: lgr %r13,%r15 0x000002005b1ccf90 <+96>: lgrl %r5,0x2005b215140 0x000002005b1ccf96 <+102>: lg %r2,32(%r5) 0x000002005b1ccf9c <+108>: brasl %r14,0x2005b182198 <camlGenargdefault_empty_value_2063@plt> 0x000002005b1ccfa2 <+114>: cgfi %r2,1 0x000002005b1ccfa8 <+120>: jge 0x2005b1ccfba <camlG_extractionentry+138> 0x000002005b1ccfae <+126>: lg %r8,0(%r2) 0x000002005b1ccfb4 <+132>: jg 0x2005b1ccfdc <camlG_extraction__entry+172> 0x000002005b1ccfba <+138>: lgrl %r9,0x2005b215458 0x000002005b1ccfc0 <+144>: lghi %r12,0 0x000002005b1ccfc4 <+148>: sty %r12,0(%r9) 0x000002005b1ccfca <+154>: lgrl %r2,0x2005b2152d0 0x000002005b1ccfd0 <+160>: lg %r2,16(%r2) 0x000002005b1ccfd6 <+166>: brasl %r14,0x20080995e22 => 0x000002005b1ccfdc <+172>: lay %r11,-16(%r11) 0x000002005b1ccfe2 <+178>: clgr %r11,%r10 0x000002005b1ccfe6 <+182>: jgl 0x2005b1d09a2 <camlG_extractionentry+14962>

Steps to reproduce

Compile coq from source on s390x with OCaml 4.04.0

Additional information

Fedora has some patches on top of OCaml 4.04.0, but not many and I don't believe any of them are relevant to this bug. However you can find the patches here in case:

https://git.fedorahosted.org/cgit/fedora-ocaml.git/log/?h=fedora-26-4.04.0

vicuna commented 8 years ago

Comment author: Richard Jones

Apparently brasl %r14,0x20080995e22 is a branch to a subroutine. However gdb thinks that is an unmapped address:

(gdb) disassemble 0x20080995e22 No function contains specified address. (gdb) disassemble *0x20080995e22 Cannot access memory at address 0x20080995e22

That explains why it crashes, although not why it's jumping to a random place.

vicuna commented 8 years ago

Comment author: Richard Jones

The function it is trying to call is probably caml_raise_exn, given other code in the main coqtop program like this:

80283d2c:   c0 e5 00 38 90 7b       brasl   %r14,80995e22 <caml_raise_exn>

g_extraction.ml4 seems like it is part of a native dynlink library (confirmed by examining the strace output, showing that it is loading theories/extraction_plugin.cmxs). My suspicion would be that relocation of symbols in that library is going wrong somehow.

vicuna commented 8 years ago

Comment author: Richard Jones

open("/home/rjones/d/fedora/coq/master/coq-8.5pl3/plugins/extraction/extraction_plugin.cmxs", O_RDONLY|O_CLOEXEC) = 7 read(7, "\177ELF\2\2\1\0\0\0\0\0\0\0\0\0\0\3\0\26\0\0\0\1\0\0\0\0\0\10$\330"..., 832) = 832 fstat(7, {st_mode=S_IFREG|0755, st_size=1443584, ...}) = 0 mmap(NULL, 1238464, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 7, 0) = 0x3ff6ca80000 mmap(0x3ff6cb8f000, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 7, 0x10e000) = 0x3ff6cb8f000 close(7) = 0 mprotect(0x3ff6ca80000, 1110016, PROT_READ|PROT_WRITE) = 0 mprotect(0x3ff6ca80000, 1110016, PROT_READ|PROT_EXEC) = 0 mprotect(0x3ff6cb8f000, 4096, PROT_READ) = 0 brk(NULL) = 0x8d7ac000 brk(0x8d894000) = 0x8d894000 --- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPC, si_addr=0x3ff80995e22} --- +++ killed by SIGILL (core dumped) +++

vicuna commented 8 years ago

Comment author: Richard Jones

A simple reproducer is:

ocamlfind opt -g -package dynlink -linkpkg main.ml -o test ocamlfind opt -g -shared -o lib.cmxs lib.ml ./test

--------------- main.c ---- let () = Dynlink.init (); (try Dynlink.loadfile "lib.cmxs" with | Dynlink.Error err -> print_endline ("dynlink error: " ^ Dynlink.error_message err) | exn -> failwith ("unknown error loading plugin: " ^ Printexc.to_string exn) )

--------------- lib.c ---- let () = print_endline "loading the plugin ..."; raise Not_found

Note that it's debugging (-g) which causes the problem. If that is disabled then the test case works fine for me.

vicuna commented 8 years ago

Comment author: Richard Jones

This patch seems to fix it:

diff --git a/asmcomp/s390x/emit.mlp b/asmcomp/s390x/emit.mlp index 5d233a3..a099bdb 100644 --- a/asmcomp/s390x/emit.mlp +++ b/asmcomp/s390x/emit.mlp @@ -611,7 +611,7 @@ let emit_instr i = | Lraise k -> begin match k with | Cmm.Raise_withtrace ->

vicuna commented 8 years ago

Comment author: Richard Jones

Applied to Fedora, and this fixes the Coq build.

https://git.fedorahosted.org/cgit/fedora-ocaml.git/commit/?h=fedora-26-4.04.0&id=e732c39340e86939530a087744caa8d8f1247878

vicuna commented 8 years ago

Comment author: @gasche

Very nice! Would you be willing to submit a PR on github? If not I can take care of it -- reusing your commit.

vicuna commented 8 years ago

Comment author: Richard Jones

Please submit one, I find the github workflow very tedious.

vicuna commented 8 years ago

Comment author: @gasche

Done, thanks: https://github.com/ocaml/ocaml/pull/903

(Note: there are command-line tools to submit github pull-requests, see https://github.com/github/hub )

vicuna commented 8 years ago

Comment author: @gasche

Xavier Leroy suggested a change to the patch, which is to use emit_call "caml_raise_exn"; instead. I updated my upstream patch proposal, the new version is available at:

https://patch-diff.githubusercontent.com/raw/ocaml/ocaml/pull/903.patch

vicuna commented 8 years ago

Comment author: @xavierleroy

Seems to be fixed on trunk, see GPR 903. Should it be backported to 4.04.1?

Tentatively marking this MPR resolved. Yell if you disagree.

vicuna commented 8 years ago

Comment author: Richard Jones

I do believe this should be backported to the 4.04 branch, for the benefit of Debian and other distros.

vicuna commented 8 years ago

Comment author: @gasche

Fine with me (also un-invasive for other arches), I'll backport to 4.04.

vicuna commented 7 years ago

Comment author: @gasche

4.04 commit: 7020a0b682524b4a63bdeb395db12b72389d01da

https://github.com/ocaml/ocaml/commit/7020a0b682524b4a63bdeb395db12b72389d01da