ocaml-community / utop

Universal toplevel for OCaml

Segfaults on startup, ocaml 4.02.0 and ubuntu14, async installed #105

Status: Closed (closed by struktured 9 years ago)

struktured commented 10 years ago

───────────────────────────────────────────────────────────────┬────────────────────────────────────────────────────────────┬───────────────────────────────────────────────────────────────
                                                               │ Welcome to utop version 1.15 (using OCaml version 4.02.0)! │
                                                               └────────────────────────────────────────────────────────────┘
* Error in `/home/carm/.opam/4.02.0/bin/ocamlrun': double free or corruption (out): 0x00000000008de980 *

Type #utop_help for help about using utop.

* Error in `/home/carm/.opam/4.02.0/bin/ocamlrun': double free or corruption (out): 0x00000000008de980 *
Lost connection with process 4733 (active process)
between time 560000 and time 570000
Aborted (core dumped)
Restart from time 560000 and try to get closer of the problem ? (y or n)

It hangs from here. Let me know what else I should try to get more useful debugging output. It's a 64-bit Kubuntu 14 system with OCaml 4.02.0, async, ppx, core, ppx_protobufs, oasis, bitstring, omake, and some other less notable packages. I've seen the same behavior on vanilla 64-bit Ubuntu as well.

What's weird is that it works when I first install it but eventually ends up in the aforementioned state. I'm not sure whether it's a particular opam package or a corrupt config file somewhere that I haven't deleted. (In the past I noticed that killing utop at the wrong moment sometimes leaves it in a corrupted state.)

Thanks

whitequark commented 10 years ago

I believe this is a problem with Core/Async. You could get a notification earlier by running it under valgrind, e.g. as valgrind --log-file=utop.log utop and repeating everything you do; when valgrind reports corruption, you've found the issue.

This may be related to some changes in 4.02 data representation; I think Core does some Obj.magic and the latest version of Core is required for full 4.02 compatibility. What version are you using?

struktured commented 10 years ago

Under valgrind, it now crashes after I start typing. I'm using core straight from opam, no pinning (version 112.01.00). The utop.log seems to suggest it's in the middle of an lwt_unix execution path. Do you suggest I opam pin the latest core or lwt to resolve this?

Std out/error:

utop # Fatal error: exception Invalid_argument("Zed_utf8.singleton")
carm@mandelbrot:~$

utop.log file:

==8682== Memcheck, a memory error detector
==8682== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==8682== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==8682== Command: /home/carm/.opam/4.02.0/bin/utop
==8682== Parent PID: 7003
==8682==
==8682== Invalid write of size 8
==8682==    at 0x4C2F793: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8682==    by 0x6D73661: lwt_unix_blit_string_bytes (in /home/carm/.opam/4.02.0/lib/stublibs/dlllwt-unix_stubs.so)
==8682==    by 0x41C491: caml_interprete (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41DCDF: caml_main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41AE29: main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==  Address 0x6348130 is 0 bytes after a block of size 0 alloc'd
==8682==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8682==    by 0x6B63FA2: caml_ba_alloc (in /usr/local/lib/ocaml/stublibs/dllbigarray.so)
==8682==    by 0x6B6413D: caml_ba_create (in /usr/local/lib/ocaml/stublibs/dllbigarray.so)
==8682==    by 0x41C3CF: caml_interprete (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41DCDF: caml_main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41AE29: main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==
==8682== Invalid write of size 2
==8682==    at 0x4C2F7E3: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8682==    by 0x6D73661: lwt_unix_blit_string_bytes (in /home/carm/.opam/4.02.0/lib/stublibs/dlllwt-unix_stubs.so)
==8682==    by 0x41C491: caml_interprete (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41DCDF: caml_main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41AE29: main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==  Address 0x6348580 is 720 bytes inside a block of size 14,482 free'd
==8682==    at 0x4C2BDEC: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8682==    by 0x40B3A9: caml_stat_free (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x413C43: caml_input_val (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x413D00: caml_input_value (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41C31E: caml_interprete (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41DCDF: caml_main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41AE29: main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==
==8682== Invalid write of size 1
==8682==    at 0x4C2F953: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8682==    by 0x6D73661: lwt_unix_blit_string_bytes (in /home/carm/.opam/4.02.0/lib/stublibs/dlllwt-unix_stubs.so)
==8682==    by 0x41C491: caml_interprete (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41DCDF: caml_main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41AE29: main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==  Address 0x6348586 is 726 bytes inside a block of size 14,482 free'd
==8682==    at 0x4C2BDEC: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8682==    by 0x40B3A9: caml_stat_free (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x413C43: caml_input_val (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x413D00: caml_input_value (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41C31E: caml_interprete (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41DCDF: caml_main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41AE29: main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==
==8682== Thread 2:
==8682== Syscall param write(buf) points to unaddressable byte(s)
==8682==    at 0x557935D: ??? (syscall-template.S:81)
==8682==    by 0x6D6F1D0: worker_bytes_write (in /home/carm/.opam/4.02.0/lib/stublibs/dlllwt-unix_stubs.so)
==8682==    by 0x6D741BE: execute_job (in /home/carm/.opam/4.02.0/lib/stublibs/dlllwt-unix_stubs.so)
==8682==    by 0x6D746EB: worker_loop (in /home/carm/.opam/4.02.0/lib/stublibs/dlllwt-unix_stubs.so)
==8682==    by 0x5572181: start_thread (pthread_create.c:312)
==8682==    by 0x5882FBC: clone (clone.S:111)
==8682==  Address 0x6348130 is 0 bytes after a block of size 0 alloc'd
==8682==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8682==    by 0x6B63FA2: caml_ba_alloc (in /usr/local/lib/ocaml/stublibs/dllbigarray.so)
==8682==    by 0x6B6413D: caml_ba_create (in /usr/local/lib/ocaml/stublibs/dllbigarray.so)
==8682==    by 0x41C3CF: caml_interprete (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41DCDF: caml_main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41AE29: main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==
==8682== Syscall param read(buf) points to unaddressable byte(s)
==8682==    at 0x55793BD: ??? (syscall-template.S:81)
==8682==    by 0x6D6EF34: worker_bytes_read (in /home/carm/.opam/4.02.0/lib/stublibs/dlllwt-unix_stubs.so)
==8682==    by 0x6D741BE: execute_job (in /home/carm/.opam/4.02.0/lib/stublibs/dlllwt-unix_stubs.so)
==8682==    by 0x6D746EB: worker_loop (in /home/carm/.opam/4.02.0/lib/stublibs/dlllwt-unix_stubs.so)
==8682==    by 0x5572181: start_thread (pthread_create.c:312)
==8682==    by 0x5882FBC: clone (clone.S:111)
==8682==  Address 0x63480f0 is 0 bytes after a block of size 0 alloc'd
==8682==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8682==    by 0x6B63FA2: caml_ba_alloc (in /usr/local/lib/ocaml/stublibs/dllbigarray.so)
==8682==    by 0x6B6413D: caml_ba_create (in /usr/local/lib/ocaml/stublibs/dllbigarray.so)
==8682==    by 0x41C3CF: caml_interprete (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41DCDF: caml_main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41AE29: main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==
==8682== Thread 1:
==8682== Invalid read of size 4
==8682==    at 0x6B641BF: caml_ba_get_N (in /usr/local/lib/ocaml/stublibs/dllbigarray.so)
==8682==    by 0x6B642FC: caml_ba_get_1 (in /usr/local/lib/ocaml/stublibs/dllbigarray.so)
==8682==    by 0x41C374: caml_interprete (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41DCDF: caml_main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41AE29: main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==  Address 0x63480f0 is 0 bytes after a block of size 0 alloc'd
==8682==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8682==    by 0x6B63FA2: caml_ba_alloc (in /usr/local/lib/ocaml/stublibs/dllbigarray.so)
==8682==    by 0x6B6413D: caml_ba_create (in /usr/local/lib/ocaml/stublibs/dllbigarray.so)
==8682==    by 0x41C3CF: caml_interprete (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41DCDF: caml_main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==    by 0x41AE29: main (in /home/carm/.opam/4.02.0/bin/ocamlrun)
==8682==
==8682==
==8682== HEAP SUMMARY:
==8682==     in use at exit: 10,697,801 bytes in 105 blocks
==8682==   total heap usage: 1,536 allocs, 1,431 frees, 15,185,833 bytes allocated
==8682==
==8682== LEAK SUMMARY:
==8682==    definitely lost: 70 bytes in 2 blocks
==8682==    indirectly lost: 0 bytes in 0 blocks
==8682==      possibly lost: 3,936,968 bytes in 10 blocks
==8682==    still reachable: 6,760,763 bytes in 93 blocks
==8682==         suppressed: 0 bytes in 0 blocks
==8682== Rerun with --leak-check=full to see details of leaked memory
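
A log of this size is easier to triage once the errors are tallied by kind. The following is a minimal sketch and not part of the original report: `summarize_valgrind_log` is a hypothetical helper name, and the three patterns are simply the Memcheck error headings that appear in this log.

```shell
#!/bin/sh
# Hypothetical helper: count Memcheck error headings in a valgrind log.
# Assumes the log was produced with `valgrind --log-file=utop.log utop`.
summarize_valgrind_log() {
    log="$1"
    for kind in "Invalid write" "Invalid read" "Syscall param"; do
        # grep -o emits one line per match, so occurrences are counted
        # correctly even if several log entries share a single line.
        count=$( { grep -o "$kind" "$log" || true; } | wc -l | tr -d ' ' )
        printf '%s: %s\n' "$kind" "$count"
    done
}
```

Calling `summarize_valgrind_log utop.log` prints one `kind: count` line per pattern, which makes it quicker to compare runs with different package sets.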

whitequark commented 10 years ago

Does it crash if you don't load Core or Async? Try to find the minimal set of packages that causes the crash.

struktured commented 10 years ago

Same error in valgrind. I pared my opam installation down to the following:

Installed packages for 4.02.0:

base-threads  base      Threads library distributed with the OCaml compiler
base-unix     base      Unix library distributed with the OCaml compiler
camlp4        4.02.0+2  Camlp4 is a system for writing extensible parsers for programming languages
camomile      0.8.5     A comprehensive Unicode library
lambda-term   1.6       Terminal manipulation library for OCaml
lwt           2.4.5     A cooperative threads library for OCaml
ocamlfind     1.5.3     A library manager for OCaml
react         1.2.0     Declarative events and signals for OCaml
utop          1.15      Universal toplevel for OCaml
zed           1.3       Abstract engine for text edition in OCaml

whitequark commented 10 years ago

So it is a bug in, probably, lwt. Very interesting. What platform do you have?

struktured commented 10 years ago

It's a bottom-end Asus/AMD PC.

carm@mandelbrot:~$ uname -a
Linux mandelbrot 3.13.0-37-generic #64-Ubuntu SMP Mon Sep 22 21:28:38 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Chipset is:

AMD A8-5500 APU (4 cores)

I also saw this error before on Ubuntu 12 x86_64 with an Intel i7. I made a completely new test user to start an opam repo from scratch, just to be sure there are no stale files lying around causing it to crash. Will report results later.

whitequark commented 10 years ago

When you do, please upload an archive of ~/.opam somewhere. I will investigate it locally then.

struktured commented 10 years ago

A test user worked fine. I uploaded both .opam repositories as bzipped tarballs to google drive. If you need an alternative link, let me know. The suffix "working" is the test user, which does not crash, and the suffix "notworking" is my personal user for which it does crash. If there are any other relevant config files outside the .opam folder, please let me know so I can check those out too.

https://drive.google.com/folderview?id=0B8QxjG1MVy-MNUN2cXA1MkJuZnc&usp=sharing
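
Once both archives are unpacked, a recursive diff is one way to narrow down what differs between the working and non-working trees. This is a sketch under assumptions: the directory names passed in are hypothetical, and `compare_opam_roots` is not an opam command, just a thin wrapper around `diff -rq`.

```shell
#!/bin/sh
# Hypothetical helper: list files that differ between two unpacked ~/.opam trees.
compare_opam_roots() {
    working="$1"
    broken="$2"
    # -r recurses into subdirectories; -q reports only which files differ.
    # diff exits with status 1 when differences exist, so that status is
    # not treated as a failure here; status 2 (real trouble) still is.
    LC_ALL=C diff -rq "$working" "$broken" || [ $? -eq 1 ]
}
```

For example, `compare_opam_roots opam-working opam-notworking` would list every file present in only one tree or differing between them. Since the valgrind trace points at C stub libraries, entries under `stublibs` might be a reasonable place to look first.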

whitequark commented 10 years ago

I... cannot actually reproduce this. I can load utop under valgrind and do basic actions like Lwt_io.printl without valgrind ever complaining. O_o

I've also verified that I'm launching the right utop with the right ocamlrun.

struktured commented 10 years ago

Here's maybe a relevant detail? I remember getting utop to sometimes hang after putting in some ppx deriving protobuf expressions. So I would press CTRL-C or CTRL-D and kill the process uncleanly. I believe when I restarted utop is when I started to see it crash.

Is there state on the file system that utop manages which would not be deleted via opam remove utop ? In any event, I'll see if I can find a sequence of steps to reproduce it with my clean test user.

Thanks

whitequark commented 10 years ago

No, utop does not store any state on the filesystem.

rjnn commented 10 years ago

I can confirm that I get the exact same error as OP. I am running a fresh install of Ubuntu 14.04 64-bit LTS, and followed the installation instructions given in https://github.com/realworldocaml/book/wiki/Installation-Instructions and came to the same error.

Here are the steps that fixed it for me:

  1. opam switch 4.01.0
  2. opam pin lwt 2.4.4
  3. rerun the installation packages from the above link (i.e. opam install core utop async yojson core_extended core_bench cohttp async_graphics cryptokit menhir merlin ocp-indent)

Then utop works fine. This seems to be an issue with lwt 2.4.6, which is fixed by reverting to 2.4.4. I suspect that pinning lwt to 2.4.4 would solve the problem even with ocaml 4.02.1, but I haven't tried it and thus cannot confirm that.
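
The three steps above can be collected into a small script. This is only a sketch of that workaround: the `run` wrapper, the `DRY_RUN` variable, and the `downgrade_workaround` name are hypothetical conveniences, the commands are copied from the list as written (a later comment in this thread spells the pin as `opam pin add lwt 2.4.4`), and by default nothing is executed.

```shell
#!/bin/sh
# Sketch of the workaround; prints the commands unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

downgrade_workaround() {
    # Step 1: switch back to the older compiler.
    run opam switch 4.01.0
    # Step 2: pin lwt to the version reported to work.
    run opam pin lwt 2.4.4
    # Step 3: reinstall the package set from the installation instructions.
    run opam install core utop async yojson core_extended core_bench \
        cohttp async_graphics cryptokit menhir merlin ocp-indent
}
```

Running `downgrade_workaround` previews the commands; `DRY_RUN=0 downgrade_workaround` would actually execute them, so check the preview against your opam version first.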

struktured commented 10 years ago

Thanks for the insight, @arjunravinarayan , but unfortunately it depends on an older version of camlp4.

$ opam pin add lwt 2.4.4
lwt is now version-pinned to 2.4.4

[lwt.2.4.4] Downloading https://github.com/ocsigen/lwt/archive/2.4.4.tar.gz

lwt needs to be reinstalled.
The following dependencies couldn't be met:

No solution found, exiting
[NOTE] Pinning command successful, but your installed packages may be out of sync.

rjnn commented 10 years ago

I think you need to do the opam switch 4.01.0 then. That should fix it (for certain suboptimal values of "fix").

ghost commented 10 years ago

I had a quick look at the diff, and the only change that could be related is, I think, this one:

ocsigen/lwt@d77e4b67c67f47d2bd3b680a04604c9552d0f14d

Could someone try to revert it in lwt 2.4.6 and see if it fixes the issue?

struktured commented 9 years ago

I can't reproduce this problem anymore, at least not with 4.02.1.