madscientist / google-coredumper

Google coredumper library
BSD 3-Clause "New" or "Revised" License

Fails compiled with -O2 / GCC 7.x #1

Closed: madscientist closed this issue 2 years ago

madscientist commented 6 years ago

When compiling with GCC 7.2 / 7.3 and -O2, the coredumper library doesn't work properly. The unit test dumps core immediately. When using the library in a real program, I've also seen the coredumper library hang inside system calls.

If you compile with -O1, then everything works. It also works with -O2 and older compilers (GCC 4.x and 5.x for sure... I'm not sure about 6.x). I don't know if this represents a bug in the coredumper code or an issue with the optimizer in GCC (most likely the former).

To reproduce, check out the repository and build it the usual way with GCC 7.x:

./preconfig
./configure
make
make check

will dump core. If you clean up and then build with -O1, it will work:

git clean -fdX
./preconfig
./configure CFLAGS=-O1
make
make check

madscientist commented 6 years ago

OK, something is very wonky here: in GCC 8.1 even -O1 causes the unit test to fail. Using -O0 allows it to work.

madscientist commented 6 years ago

After much investigation I discovered that compilation at -O1 and -Og can be made to succeed by avoiding GCC's builtin memcpy() in one location. I poked around for an entirely unjustifiable (to me) amount of time but could not figure out why. If I had better assembly-fu maybe I would have better luck.
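To illustrate what "avoiding GCC's builtin memcpy()" can look like, here is a minimal sketch of a hand-written byte copy. The name plain_copy is hypothetical and this is not the change that was made in coredumper; it only shows the general technique.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from coredumper: copies n bytes one at a
 * time. The volatile qualifiers force each byte to be loaded and stored
 * individually, so the compiler cannot turn the loop back into a memcpy
 * call or a builtin expansion. */
static void *plain_copy(void *dst, const void *src, size_t n) {
  volatile unsigned char *d = dst;
  const volatile unsigned char *s = src;
  while (n--) {
    *d++ = *s++;
  }
  return dst;
}
```

A coarser alternative is GCC's -fno-builtin-memcpy flag, which disables the builtin for a whole translation unit rather than for one call site. Whether either approach matches the workaround described above is an assumption.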

However, I'm leaving this open because compiling with -O2 still causes the unit tests to crash. I spent much less time on this one, but the closest I could get was that when the test invokes the waitpid() system call (via sys_waitpid, not the libc wrapper), it appears to crash before the call returns.
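For readers unfamiliar with the distinction between a direct system call and the libc wrapper, here is a small Linux-only sketch. It assumes glibc's syscall() and the SYS_wait4 number rather than coredumper's own sys_waitpid, and the names raw_waitpid and run_child_and_reap are hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: waits for a child through the raw wait4 system call
 * instead of the libc waitpid() wrapper. Assumes Linux and glibc's syscall(). */
static pid_t raw_waitpid(pid_t pid, int *status, int options) {
  return (pid_t)syscall(SYS_wait4, pid, status, options, NULL);
}

/* Forks a child that exits with `code`, reaps it with raw_waitpid(), and
 * returns the child's exit status, or -1 on any failure. */
static int run_child_and_reap(int code) {
  pid_t pid = fork();
  if (pid < 0) {
    return -1;
  }
  if (pid == 0) {
    _exit(code); /* child: exit immediately without running atexit handlers */
  }
  int status = 0;
  if (raw_waitpid(pid, &status, 0) != pid) {
    return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_child_and_reap(7) exercises the same kernel path as waitpid() while bypassing the libc wrapper, which can help narrow down whether a crash comes from the wrapper or from the code around the raw call.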

Maybe someday I'll get back to this. Or, maybe someone much more knowledgeable than me about these issues will help figure it out.

madscientist commented 2 years ago

I believe that this was caused by the stack alignment issue fixed by #3.