pyston / pyston_v1

The previous version of Pyston, a faster implementation of the Python programming language. Please use this link for the new repository:
https://github.com/pyston/pyston/
4.89k stars 289 forks source link

segfault #133

Closed tjhance closed 9 years ago

tjhance commented 10 years ago

I got this weird bug while developing the __future__ imports.

A commit demonstrating the issue is at https://github.com/tjhance/pyston/tree/weird_segfault

When I run make dbg_map ARGS=-csrq, I get the following backtrace from gdb:

Program received signal SIGSEGV, Segmentation fault.
0x00002aaaac3b5e08 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#0  0x00002aaaac3b5e08 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00002aaaac3b6b89 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00000000007b4a60 in __gnu_cxx::new_allocator<long>::deallocate (this=0x1c5aba0 <pyston::Stats::getStatId(std::string const&)::counts>, __p=0x1d83de0) at /home/tjhance/pyston_deps/gcc-4.8.2-install/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/../../../../include/c++/4.8.2/ext/new_allocator.h:110
#3  0x00000000007b4a2e in std::_Vector_base<long, std::allocator<long> >::_M_deallocate (this=0x1c5aba0 <pyston::Stats::getStatId(std::string const&)::counts>, __p=0x1d83de0, __n=64) at /home/tjhance/pyston_deps/gcc-4.8.2-install/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/../../../../include/c++/4.8.2/bits/stl_vector.h:174
#4  0x00000000007b6259 in std::_Vector_base<long, std::allocator<long> >::~_Vector_base (this=0x1c5aba0 <pyston::Stats::getStatId(std::string const&)::counts>) at /home/tjhance/pyston_deps/gcc-4.8.2-install/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/../../../../include/c++/4.8.2/bits/stl_vector.h:160
#5  0x00000000007b15c8 in std::vector<long, std::allocator<long> >::~vector (this=0x1c5aba0 <pyston::Stats::getStatId(std::string const&)::counts>) at /home/tjhance/pyston_deps/gcc-4.8.2-install/lib/gcc/x86_64-unknown-linux-gnu/4.8.2/../../../../include/c++/4.8.2/bits/stl_vector.h:416
#6  0x00002aaaac372901 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00002aaaac372985 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x00002aaaac358774 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#9  0x00000000006535e9 in _start ()

Looks like it's segfaulting when it tries to destruct the local static variable

static std::vector<long> counts;

from core/stats.cpp

I valgrind'ed and gdb'ed but couldn't figure this out - it doesn't seem like the destructor is being called more than once. My best guess is that we're running into some undefined behavior here, but I can't figure out where it is coming from or how it's manifesting.

For my future diff, I resolved the issue by moving around some totally unrelated stuff (namely I moved the definitions of FutureOption and future_options from codegen/irgen/future.h to codegen/irgen/future.cpp. Notably, core/stats.cpp does not include (not even indirectly) future.h!) So this is more evidence that there is some undefined behavior going on.

But I have no idea where it's coming from. The use of of the variable counts, while hacky, looks fine as far as I can tell. I'm just throwing this issue up here in case anybody has any idea what's going on, because I'd like to know, and it might be an issue that shows up again later.

kmod commented 10 years ago

I decided I had a some extra sanity today so I could spare some in debugging this...

I think this is a problem with the fact that we lazily load the stdlib llvm::Module, and my theory is that if its still somewhat lazily loaded in the end it will do a double-free or some such badness. I haven't found a smoking gun, but it explains why you need the -r and -q flags: if you don't include them, you will essentially defeat the lazy loading and force most? of it into memory. For example, you can take out the -q flag and the problem disappears, but if you put a return statement at the beginning of dumpPrettyIR(), and the problem will reappear. But then if you put the return statement after the first line (the one including CloneModule()), the problem reappears. We can also force the loading of the module, which makes the problem go away, by doing m->materializeAll() in loadStdlib(). It's all circumstantial, and it's possible that llvm::Module* is just as much of a victim as the Stats class is, but that's what my hypothesis is for now (memory management bug in llvm at cleanup).

kmod commented 10 years ago

Ok, I think I found a double-free coming from LLVM and am investigating. We're seven thousand commits behind llvm trunk, so perhaps it's even solved in a later commit but we can't search through all of those.

I think the undefined behavior in this case is the order of the static initializers, which I assume determines the order of the static destructors. So my guess is that in some cases the LLVMContext will get constructed towards the end, and its double-free won't hurt anyone, and in some cases it will get constructed towards the beginning and then someone else will run into the corrupted memory.

kmod commented 10 years ago

So I'm not having much luck debugging the double-free -- I suspect it might be benign and happening all the time. I didn't notice that you put a object definition in the .h file -- I'm surprised that worked at all, though I can't explain why that would work yet still cause the segfault. Anyway, I'm going to move on now, but if this comes back up again I know way more about tracing malloc issues :P

kmod commented 10 years ago

I'm tempted to say that this is from putting an object definition (not just an "extern" declaration) in a header file... what do you think?