ocaml / dune

A composable build system for OCaml.
https://dune.build/
MIT License

linking error compiling a specific bytecode executable #3240

Closed mseri closed 4 years ago

mseri commented 4 years ago

Expected Behavior

dune build builds everything and does not fail.

Actual Behavior

The compilation of one of the artifacts fails with:

dune build
      ocamlc examples/van_der_pol_odepack.bc (exit 2)
(cd _build/default && HOME/.opam/4.10.0/bin/ocamlc.opt -w @1..3@5..28@30..39@43@46..47@49..57@61..62-40 -strict-sequence -strict-formats -short-paths -keep-locs -g -o examples/van_der_pol_odepack.bc HOME/.opam/4.10.0/lib/ocaml/unix.cma HOME/.opam/4.10.0/lib/ocaml/bigarray.cma HOME/.opam/4.10.0/lib/integers/integers.cma HOME/.opam/4.10.0/lib/ctypes/ctypes.cma HOME/.opam/4.10.0/lib/ocaml/str.cma HOME/.opam/4.10.0/lib/ctypes/cstubs.cma HOME/.opam/4.10.0/lib/eigen/cpp/eigen_cpp_stubs.cma HOME/.opam/4.10.0/lib/eigen/eigen.cma HOME/.opam/4.10.0/lib/stdlib-shims/stdlib_shims.cma HOME/.opam/4.10.0/lib/owl-base/owl_base.cma HOME/.opam/4.10.0/lib/zip/zip.cma HOME/.opam/4.10.0/lib/npy/npy.cma HOME/.opam/4.10.0/lib/owl/owl.cma src/base/owl_ode_base.cma src/ode/owl_ode.cma HOME/.opam/4.10.0/lib/odepack/fortran/odepack_fortran.cma HOME/.opam/4.10.0/lib/odepack/odepack.cma src/odepack/owl_ode_odepack.cma HOME/.opam/4.10.0/lib/plplot/plplot.cma HOME/.opam/4.10.0/lib/owl-plplot/owl_plplot.cma examples/.van_der_pol_odepack.eobjs/byte/dune__exe__Van_der_pol_odepack.cmo)
ld: library not found for -lplplot_stubs
clang: error: linker command failed with exit code 1 (use -v to see invocation)
File "_none_", line 1:
Error: Error while building custom runtime system

Reproduction

  1. Create a new ocaml 4.10.0 switch
  2. Pin the dune port of plplot available here: https://github.com/mseri/ocaml-plplot/tree/port-to-dune (I could reproduce without this step; it seems to be a difference in behaviour between OCaml 4.10 and previous versions.)
  3. opam install -t --deps-only owl-ode-sundials owl-ode-odepack
  4. git clone https://github.com/owlbarn/owl_ode
  5. cd owl_ode && git checkout v0.2.0
  6. dune build fails with the error shown above.

Specifications

For some reason, this does not happen if I do the same on OCaml 4.09.0. In that case I need to explicitly call dune exec examples/van_der_pol_odepack.bc to see the failure.

ghost commented 4 years ago

I'm trying to reproduce the bug but I'm having issues installing some of the external dependencies.

For some reason, this does not happen if I do the same on OCaml 4.09.0. In that case I need to explicitly call dune exec examples/van_der_pol_odepack.bc to see the failure.

That's very odd. Just to confirm, with 4.09.0:

$ dune build examples/van_der_pol_odepack.bc # it works!
$ dune exec examples/van_der_pol_odepack.bc
dune build
      ocamlc examples/van_der_pol_odepack.bc (exit 2)
(cd _build/default && HOME/.opam/4.10.0/bin/ocamlc.opt -w @1..3@5..28@30..39@43@46..47@49..57@61..62-40 -strict-sequence -strict-formats -short-paths -keep-locs -g -o examples/van_der_pol_odepack.bc HOME/.opam/4.10.0/lib/ocaml/unix.cma HOME/.opam/4.10.0/lib/ocaml/bigarray.cma HOME/.opam/4.10.0/lib/integers/integers.cma HOME/.opam/4.10.0/lib/ctypes/ctypes.cma HOME/.opam/4.10.0/lib/ocaml/str.cma HOME/.opam/4.10.0/lib/ctypes/cstubs.cma HOME/.opam/4.10.0/lib/eigen/cpp/eigen_cpp_stubs.cma HOME/.opam/4.10.0/lib/eigen/eigen.cma HOME/.opam/4.10.0/lib/stdlib-shims/stdlib_shims.cma HOME/.opam/4.10.0/lib/owl-base/owl_base.cma HOME/.opam/4.10.0/lib/zip/zip.cma HOME/.opam/4.10.0/lib/npy/npy.cma HOME/.opam/4.10.0/lib/owl/owl.cma src/base/owl_ode_base.cma src/ode/owl_ode.cma HOME/.opam/4.10.0/lib/odepack/fortran/odepack_fortran.cma HOME/.opam/4.10.0/lib/odepack/odepack.cma src/odepack/owl_ode_odepack.cma HOME/.opam/4.10.0/lib/plplot/plplot.cma HOME/.opam/4.10.0/lib/owl-plplot/owl_plplot.cma examples/.van_der_pol_odepack.eobjs/byte/dune__exe__Van_der_pol_odepack.cmo)
ld: library not found for -lplplot_stubs
clang: error: linker command failed with exit code 1 (use -v to see invocation)
File "_none_", line 1:
Error: Error while building custom runtime system

Is that what you are observing?

mseri commented 4 years ago

Not exactly. I do the following:

$ dune build # in 4.09 it works, but it fails if I later make the explicit exec call
$ dune build # in 4.10 fails as below
      ocamlc examples/van_der_pol_odepack.bc (exit 2)
(cd _build/default && HOME/.opam/4.10.0/bin/ocamlc.opt -w @1..3@5..28@30..39@43@46..47@49..57@61..62-40 -strict-sequence -strict-formats -short-paths -keep-locs -g -o examples/van_der_pol_odepack.bc HOME/.opam/4.10.0/lib/ocaml/unix.cma HOME/.opam/4.10.0/lib/ocaml/bigarray.cma HOME/.opam/4.10.0/lib/integers/integers.cma HOME/.opam/4.10.0/lib/ctypes/ctypes.cma HOME/.opam/4.10.0/lib/ocaml/str.cma HOME/.opam/4.10.0/lib/ctypes/cstubs.cma HOME/.opam/4.10.0/lib/eigen/cpp/eigen_cpp_stubs.cma HOME/.opam/4.10.0/lib/eigen/eigen.cma HOME/.opam/4.10.0/lib/stdlib-shims/stdlib_shims.cma HOME/.opam/4.10.0/lib/owl-base/owl_base.cma HOME/.opam/4.10.0/lib/zip/zip.cma HOME/.opam/4.10.0/lib/npy/npy.cma HOME/.opam/4.10.0/lib/owl/owl.cma src/base/owl_ode_base.cma src/ode/owl_ode.cma HOME/.opam/4.10.0/lib/odepack/fortran/odepack_fortran.cma HOME/.opam/4.10.0/lib/odepack/odepack.cma src/odepack/owl_ode_odepack.cma HOME/.opam/4.10.0/lib/plplot/plplot.cma HOME/.opam/4.10.0/lib/owl-plplot/owl_plplot.cma examples/.van_der_pol_odepack.eobjs/byte/dune__exe__Van_der_pol_odepack.cmo)
ld: library not found for -lplplot_stubs
clang: error: linker command failed with exit code 1 (use -v to see invocation)
File "_none_", line 1:
Error: Error while building custom runtime system

I’ll try to do what you wrote above as soon as I can, and maybe try to produce an example without owl.

ghost commented 4 years ago

Ok, that looks like a regression in the compiler. But just to check it's not in dune, could you diff the log file (_build/log) between 4.09 and 4.10 to see if dune is doing something different?

mseri commented 4 years ago

I have pinned all the packages to an identical intermediate version and tried again, with only one switch (4.09.0 or 4.10.0) available at a time. Now both compiler versions seem to behave the same, as one would hope.

I think the behaviour changed between owl_ode v0.2.0 and master, and something got messed up in the many repro attempts. The main difference between the two that affects the example executables is:

-(executable
+(test
+ (package owl-ode-odepack)
  (name van_der_pol_odepack)
  (libraries owl owl-ode owl-ode-odepack owl-plplot)
- (modules van_der_pol_odepack))
-
-(alias
- (name runtest)
+ (modules van_der_pol_odepack)
  (action
-  (run ./van_der_pol_odepack.exe))
- (package owl-ode-odepack))
+  (run %{test})))

Is it possible that the dune build behaviour between test stanzas and executable + runtest alias is different? It looks like the version using executable was only building the native versions and only when calling dune runtest, while the test one builds both of them but only calls the native one during the dune runtest call. In fact, if I add (mode native) to that executable in the dune file, the issue disappears.

The reason for the linking failure is still a mystery to me, but at least it does not seem to be a regression.

ghost commented 4 years ago

That's odd indeed. Do you have a file dllplplot_stubs.so in your stublibs directory?

Is it possible that the dune build behaviour between test stanzas and executable + runtest alias is different? It looks like the version using executable was only building the native versions and only when calling dune runtest, while the test one builds both of them but only calls the native one during the dune runtest call. In fact, if I add (mode native) to that executable in the dune file, the issue disappears.

The test one building both versions is odd, especially if it doesn't run them both. Are you sure it's building the .bc file? We do compile the .ml file to bytecode just for merlin, but we shouldn't link the .bc if we don't need it. I just checked on a small example, and indeed the .ml files are compiled in both modes but only the .exe file is created.

BTW, note that with lang dune 1.x, the default modes field for executables is (modes byte exe) and with 2.x it is (modes exe). If you migrated to lang dune 2.x, that could explain things.
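To make the difference between the two defaults concrete, here is a minimal sketch of an executable stanza that spells out its modes field instead of relying on the (lang dune ...) default. The stanza reuses the names from the owl_ode example quoted earlier in the thread; it is an illustration, not the project's actual dune file.

```dune
; Sketch only: names are taken from the diff quoted earlier in this thread.
(executable
 (name van_der_pol_odepack)
 (libraries owl owl-ode owl-ode-odepack owl-plplot)
 (modules van_der_pol_odepack)
 ; "exe" alone means the .bc file is not linked;
 ; writing (modes byte exe) instead would reproduce the lang dune 1.x default.
 (modes exe))
```

Stating the field explicitly should keep the set of linked artifacts the same regardless of which lang dune version the project declares.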

mseri commented 4 years ago

That's odd indeed. Do you have a file dllplplot_stubs.so in your stublibs directory?

Yes, the library is there and other bytecode executables (also in that folder) that link it are working fine

mseri commented 4 years ago

BTW, note that with lang dune 1.x, the default modes field for executables is (modes byte exe) and with 2.x it is (modes exe). If you migrated to lang dune 2.x, that could explain things.

We migrated to dune 2 last December but we never observed this issue until recently, so I don’t know what to say. In fact, Travis, which tests on Linux, is still green: https://travis-ci.org/owlbarn/owl_ode/jobs/658131914

My repros are on Mac. Maybe it has something to do with it

hongchangwu commented 4 years ago

We are seeing a very similar issue when building utop in custom linking mode with newer versions of dune.

To reproduce:

$ git clone https://github.com/hongchangwu/utop.git -b custom && cd utop && dune build
...
/usr/bin/ld: cannot find -llambda_term_stubs
/usr/bin/ld: cannot find -llwt_unix_stubs
collect2: error: ld returned 1 exit status
File "_none_", line 1:

I have tried different versions of dune from opam and this issue first happens with version 2.1.1. I have also compared the build command issued by dune with an older version (2.0.0) which doesn't have this issue: https://gist.github.com/hongchangwu/1173e5caf3f65ac59e693fa2cfdb5971

As can be seen, the build command from 2.6.2 doesn't have any -I arguments, so this does look like a regression in dune.

ghost commented 4 years ago

@hongchangwu you shouldn't add -custom to the flag list manually. Dune needs to know that the user is requesting custom linking, but it doesn't know how to parse OCaml command line flags. So if you pass -custom manually, the assumptions Dune makes about how the compiler behaves will be wrong.

Instead, you should modify the modes field and add byte_complete. Note also that -custom has been deprecated in recent versions of OCaml, which is another reason not to pass -custom manually. It has been replaced by another flag with slightly different (and better) behaviour.
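As a rough sketch of that suggestion, the stanza below requests a self-contained bytecode executable through the modes field rather than through a manual -custom flag. The executable name my_toplevel and the library list are hypothetical placeholders, not taken from the utop repository.

```dune
; Sketch only: "my_toplevel" and the libraries are hypothetical placeholders.
(executable
 (name my_toplevel)
 (libraries lambda-term lwt.unix)
 ; Ask Dune for a standalone bytecode executable.
 ; Do not also pass -custom through link_flags.
 (modes byte_complete))
```

With this mode the resulting artifact carries the .bc.exe extension, so under these assumed names it would be built with dune build ./my_toplevel.bc.exe.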

ghost commented 4 years ago

BTW, I'm closing this issue as so far it doesn't look like a Dune bug.

hongchangwu commented 4 years ago

@jeremiedimino byte_complete worked for me. So the old versions of dune only worked by accident with -custom I guess?

ghost commented 4 years ago

Indeed. We were making the wrong assumptions, but passing many useless flags. The two put together allowed it to work-ish. I say ish because incremental compilation was probably broken in some cases.

vthemelis commented 1 year ago

byte_complete also solves my building issue but it seems like the .bc.exe file that is produced is not valid input to ocamldebug. Is that expected?

Does this mean that it's impossible to run libraries with C-dependencies under the debugger?

nojb commented 1 year ago

the .bc.exe file that is produced is not valid input to ocamldebug. Is that expected?

Yes, that's right; byte_complete executables cannot be passed to ocamldebug, nor do they include debugging information (i.e. there are no exception backtraces). This is an issue that needs to be fixed at the level of the compiler.