Closed tjhance closed 9 years ago
I decided I had some extra sanity today so I could spare some in debugging this...
I think this is a problem with the fact that we lazily load the stdlib llvm::Module, and my theory is that if it's still somewhat lazily loaded in the end it will do a double-free or some such badness. I haven't found a smoking gun, but it explains why you need the `-r` and `-q` flags: if you don't include them, you will essentially defeat the lazy loading and force most(?) of it into memory. For example, you can take out the `-q` flag and the problem disappears, but if you put a return statement at the beginning of `dumpPrettyIR()`, the problem will reappear. But then if you put the return statement after the first line (the one including `CloneModule()`), the problem reappears. We can also force the loading of the module, which makes the problem go away, by doing `m->materializeAll()` in `loadStdlib()`. It's all circumstantial, and it's possible that llvm::Module* is just as much of a victim as the Stats class is, but that's what my hypothesis is for now (memory management bug in llvm at cleanup).
Ok, I think I found a double-free coming from LLVM and am investigating. We're seven thousand commits behind llvm trunk, so perhaps it's even solved in a later commit but we can't search through all of those.
I think the undefined behavior in this case is the order of the static initializers, which I assume determines the order of the static destructors. So my guess is that in some cases the LLVMContext will get constructed towards the end, and its double-free won't hurt anyone, and in some cases it will get constructed towards the beginning and then someone else will run into the corrupted memory.
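To make the ordering point concrete, here's a minimal sketch (all names are hypothetical, none of this is Pyston or LLVM code). Within a single translation unit, statics are initialized in definition order and destroyed in reverse; across translation units the initialization order is unspecified, which is the part that could move the LLVMContext around relative to everything else.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Construction log. Heap-allocated and intentionally never freed, so it
// stays valid no matter when static destructors run.
std::vector<int>& ctorLog() {
    static std::vector<int>* log = new std::vector<int>();
    return *log;
}

int liveCount = 0; // constant-initialized before any dynamic initialization

struct Tracked {
    int id;
    explicit Tracked(int i) : id(i) {
        ctorLog().push_back(id);
        ++liveCount;
    }
    ~Tracked() {
        --liveCount;
        // Objects defined after this one must already be destroyed.
        assert(liveCount == id - 1);
        std::printf("destroying %d, %d still alive\n", id, liveCount);
    }
};

// Same translation unit: initialized in definition order,
// destroyed in reverse order at program exit.
Tracked first(1);
Tracked second(2);

// Returns true if the objects above were constructed in definition order.
bool constructedInDefinitionOrder() {
    const std::vector<int>& log = ctorLog();
    return log.size() == 2 && log[0] == 1 && log[1] == 2;
}
```

If `first` and `second` lived in different .cpp files, nothing in the standard would pin down which one is constructed (and so destroyed) first, so an unrelated change to the link order or to a header could flip it.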
So I'm not having much luck debugging the double-free -- I suspect it might be benign and happening all the time. I didn't notice that you put an object definition in the .h file -- I'm surprised that worked at all, though I can't explain why that would work yet still cause the segfault. Anyway, I'm going to move on now, but if this comes back up again I know way more about tracing malloc issues :P
I'm tempted to say that this is from putting an object definition (not just an "extern" declaration) in a header file... what do you think?
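For reference, a rough sketch of the pattern I mean, squeezed into one file with hypothetical names (`Option`, `options`, `findOption`; I'm not claiming this matches what future.h actually contained). A non-const object defined in a header that several .cpp files include normally fails to link with a duplicate symbol. A `const` or `static` one has internal linkage, so it links fine, but every including translation unit gets its own copy with its own static constructor and destructor.

```cpp
#include <cassert>
#include <map>
#include <string>

// ---- what would go in the header (a hypothetical options.h) ----
struct Option { // type definitions are fine in a header
    int min_version;
};
// Declaration only: no storage, no static initializer, no destructor here.
extern const std::map<std::string, Option> options;

// ---- what would go in exactly one .cpp file ----
// The single definition: one object, one initializer, one destructor.
const std::map<std::string, Option> options = {
    {"alpha", {1}},
    {"beta", {2}},
};

// Looks up an option by name; returns nullptr if it is not present.
const Option* findOption(const std::string& name) {
    auto it = options.find(name);
    return it == options.end() ? nullptr : &it->second;
}
```

With the `extern` declaration in the header there is exactly one object in the whole program, so adding or removing an `#include` no longer changes how many static constructors and destructors run.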
I got this weird bug while developing the `__future__` imports. A commit demonstrating the issue is at https://github.com/tjhance/pyston/tree/weird_segfault
When I run `make dbg_map ARGS=-csrq`, I get the following backtrace from `gdb`:
Looks like it's segfaulting when it tries to destruct the local static variable `static std::vector<long> counts;` from `core/stats.cpp`.
I valgrind'ed and gdb'ed but couldn't figure this out - it doesn't seem like the destructor is being called more than once. My best guess is that we're running into some undefined behavior here, but I can't figure out where it is coming from or how it's manifesting.
For my future diff, I resolved the issue by moving around some totally unrelated stuff (namely I moved the definitions of `FutureOption` and `future_options` from `codegen/irgen/future.h` to `codegen/irgen/future.cpp`. Notably, `core/stats.cpp` does not include (not even indirectly) `future.h`!) So this is more evidence that there is some undefined behavior going on. But I have no idea where it's coming from. The use of the variable `counts`, while hacky, looks fine as far as I can tell. I'm just throwing this issue up here in case anybody has any idea what's going on, because I'd like to know, and it might be an issue that shows up again later.