uwplse / herbgrind

A Valgrind tool for Herbie
GNU General Public License v3.0

Segfault on NAS benchmark MultiGrid (MG) #56

Open hbrunie opened 4 years ago

hbrunie commented 4 years ago

Hello,

I tried to run Herbgrind on MG from NAS benchmarks (C++ serial version: NAS-bench) and I got a segfault:

$HOME/utils/tools/herbgrind/valgrind/herbgrind-install/bin/valgrind --tool=herbgrind ../bin/mg.S 
==28521== Herbgrind, a valgrind tool for Herbie
==28521== Copyright (C) 2016-2017, and GNU GPL'd, by Alex Sanchez-Stern
==28521== Using Valgrind-3.15.0.GIT and LibVEX; rerun with -h for copyright info
==28521== Command: ../bin/mg.S
==28521== 
NAS Parallel Benchmarks 4.0 OpenMP C++ version - MG Benchmark
Developed by: Dalvan Griebler <dalvan.griebler@acad.pucrs.br> & Júnior Löff <loffjh@gmail.com>

 No input file. Using compiled defaults
 Size:  32x 32x 32 (class_npb S)
 Iterations:   4
==28521== 
==28521== Process terminating with default action of signal 11 (SIGSEGV)
==28521==  Access not within mapped region at address 0x1207CB43F8
==28521==    at 0x404CB1: resid(double***, double***, double***, int, int, int, double*, int) (in /global/u1/h/hbrunie/benchmarks/NPB-CPP/NPB-SER/bin/mg.S)
==28521==    by 0x400EEA: main (in /global/u1/h/hbrunie/benchmarks/NPB-CPP/NPB-SER/bin/mg.S)
==28521==  If you believe this happened as a result of a stack
==28521==  overflow in your program's main thread (unlikely but
==28521==  possible), you can try to increase the size of the
==28521==  main thread stack using the --main-stacksize= flag.
==28521==  The main thread stack size used in this run was 16777216.
==28521== 
Didn't find any marks!
Segmentation fault

Could you help me debug this?

Thanks, Hugo Brunie

HazardousPeach commented 3 years ago

Hey Hugo, thanks for getting in touch!

It looks like the segfault is happening in client code, not in Herbgrind code, so if the program doesn't segfault when run uninstrumented, then Herbgrind is somehow interfering with the client state (which it's really not supposed to do).

My best bet for debugging something like this is to first test it under the Nullgrind Valgrind tool, which does all the normal Valgrind decompiling and JITing but doesn't do any instrumentation. If that still segfaults, then the problem is in Valgrind code. If not, the next step would be to go into the Herbgrind source and start removing sections of the instrumentation, to see if you can narrow down where the segfault is being caused. That work can be a little harrowing, so stay in touch on the issue, and I'll try to help out when I have free cycles.
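For reference, Nullgrind is selected with --tool=none and is normally built alongside the other tools in a Valgrind tree, so a comparison run might look roughly like the sketch below. It assumes the Herbgrind-bundled install from the report above also ships the none tool; the VALGRIND and CLIENT variable names are only illustrative and are not part of Herbgrind.

```shell
# Assumed paths, taken from the original report; override via the environment.
VALGRIND="${VALGRIND:-$HOME/utils/tools/herbgrind/valgrind/herbgrind-install/bin/valgrind}"
CLIENT="${CLIENT:-../bin/mg.S}"

if [ -x "$VALGRIND" ] && [ -x "$CLIENT" ]; then
    # 1. Nullgrind: Valgrind core (translation and JIT) with no instrumentation.
    "$VALGRIND" --tool=none "$CLIENT"
    echo "nullgrind exit status: $?"

    # 2. Herbgrind, for comparison against the run above.
    "$VALGRIND" --tool=herbgrind "$CLIENT"
    echo "herbgrind exit status: $?"
else
    echo "skipping: set VALGRIND and CLIENT to executable paths" >&2
fi
```

If the first run completes and only the second one segfaults, that points at Herbgrind's instrumentation rather than at the Valgrind core.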

Cheers, Alex

hbrunie commented 3 years ago

So I don't know how to build nullgrind, but I tested memcheck from the same valgrind install, and it worked well. I guess memcheck exercises at least as much as nullgrind does, so we can move on to the second step of the debugging.

Maybe I can use gdb on herbgrind?

hbrunie commented 3 years ago

GDB did not help much. Note that herbgrind is compiled with -g, as is NAS MG.

Starting program: /global/u1/h/hbrunie/utils/tools/herbgrind/valgrind/herbgrind-install/bin/valgrind --tool=herbgrind ./bin/mg.S
Missing separate debuginfos, use: zypper install glibc-debuginfo-2.26-13.45.1.x86_64
process 38724 is executing new program: /global/u1/h/hbrunie/utils/tools/herbgrind/valgrind/herbgrind-install/lib/valgrind/herbgrind-amd64-linux
==38724== Herbgrind, a valgrind tool for Herbie
==38724== Copyright (C) 2016-2017, and GNU GPL'd, by Alex Sanchez-Stern
==38724== Using Valgrind-3.15.0.GIT and LibVEX; rerun with -h for copyright info
==38724== Command: ./bin/mg.S
==38724== 

Program received signal SIGSEGV, Segmentation fault.
0x0000001002c06f6c in ?? ()
(gdb) bt
#0  0x0000001002c06f6c in ?? ()
#1  0x0000001002a8df30 in ?? ()
#2  0x0000000000013347 in ?? ()
#3  0x000000100200d210 in ?? ()
#4  0x0000000000000000 in ?? ()

HazardousPeach commented 3 years ago

Yeah, unfortunately gdb doesn't play well with Valgrind; it seems to have issues emulating the crazy stuff Valgrind does, and it segfaults even when Herbgrind/Valgrind wouldn't otherwise. I did some digging myself, and it looks like you can remove most of the Herbgrind code and just leave the creation of Shadow Temporaries, and it'll still crash. My hunch is that something is weird with the allocator, and that it breaks when there are so many floating-point ops in the program that need to allocate shadow values.