uwplse / herbgrind

A Valgrind tool for Herbie
GNU General Public License v3.0

Not able to run spec gromacs #32

Closed sangeeta0201 closed 4 years ago

sangeeta0201 commented 6 years ago

Hi Alex,

I am trying to run Herbgrind on the SPEC benchmark gromacs and I am getting a seg fault. Let me know if you need any other info to reproduce this issue. I used the following commands to create a gromacs binary and then ran it manually under Herbgrind:

runspec --config=Sangeeta-linux.cfg --size=ref --noreportable --tune=base --iterations=1 gromacs
herbgrind.sh ./gromacs_base.gcc43-64bit -deffnm gromacs -nice 0> gromacs.ref.out 2 >gromacs.ref.err
==29987== Herbgrind, a valgrind tool for Herbie
==29987== Copyright (C) 2016-2017, and GNU GPL'd, by Alex Sanchez-Stern
==29987== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==29987== Command: ./gromacs_base.gcc43-64bit -deffnm gromacs -nice 2
==29987== 
==29987== 
==29987== Process terminating with default action of signal 11 (SIGSEGV)
==29987==  Access not within mapped region at address 0x3FF00000
==29987==    at 0x4033BB: flincs_ (in benchspec/CPU2006/435.gromacs/run/run_base_ref_gcc43-64bit.0000/gromacs_base.gcc43-64bit)
==29987==    by 0x43D01B: constrain_lincs.isra.3 (constr.c:333)
==29987==    by 0x43E6A4: constrain (constr.c:574)
==29987==    by 0x4B251B: update (update.c:759)
==29987==    by 0x49AC9C: do_shakefirst (sim_util.c:420)
==29987==    by 0x46AB9A: do_md (md.c:331)
==29987==    by 0x46B672: mdrunner (md.c:179)
==29987==    by 0x4024BF: main (mdrun.c:235)
==29987==  If you believe this happened as a result of a stack
==29987==  overflow in your program's main thread (unlikely but
==29987==  possible), you can try to increase the size of the
==29987==  main thread stack using the --main-stacksize= flag.
==29987==  The main thread stack size used in this run was 8388608.
==29987== 
Didn't find any marks!
herbgrind/herbgrind.sh: line 17: 29987 Segmentation fault      (core dumped) $HG "$@"
HazardousPeach commented 6 years ago

Hey @sangeeta0201 . I'm looking into this right now.

Is there a reason you're running Herbgrind on the "ref" dataset of Gromacs? That dataset appears to take a few minutes even uninstrumented, which is outside the scope for which we developed Herbgrind. During our tests we ran on the "test" dataset, which is significantly more manageable and, we think, gives sufficient insight into the numerical issues of Gromacs.

sangeeta0201 commented 6 years ago

I see. I will run with the test dataset and see if I encounter any issues. Thanks for the quick response.

jtarango commented 4 years ago

For Valgrind you need to increase the following limits:

herbgrind/valgrind/herbgrind-install/bin/valgrind --tool=herbgrind --main-stacksize=10485760 --max-stackframe=10485760 --valgrind-stacksize=10485760 --num-callers=500 yourProgram

pavpanchekha commented 4 years ago

@jtarango does gromacs run to completion with those flags?

jtarango commented 4 years ago

I have not checked that application; I suspect it hits the same problem I was encountering. To execute large applications with data files, the user has to increase "ulimit -s", the stack size limit for the execution. For such programs you will also need to increase swap space if you do not have enough runtime memory. For my virtual machine, I have 2 TB of NVMe swap and 128 GB of RAM, with the upper limits changed based on how large the application I am running is.

Determine and Increase Limits

You can modify this in /etc/security/limits.conf.

Operation                   sh/ksh/bash command    csh/tcsh command
View soft limits            ulimit -S -a           limit
View hard limits            ulimit -H -a           limit -h
Set stack size to 128 MB    ulimit -S -s 131072    limit stacksize 128m
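A minimal sketch of the bash column of the table, run in a single shell session (values are in kilobytes, and raising the soft limit only works if the hard limit permits it):

```shell
# Show the current soft stack limit for this shell.
ulimit -S -s
# Raise the soft stack limit to 128 MB for this session only.
ulimit -S -s 131072
# Confirm the new value.
ulimit -S -s
```

Note that the change applies only to the current shell and its children; a new login starts again from the limits.conf defaults.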

Reference: https://manpages.ubuntu.com/manpages/bionic/man5/limits.conf.5.html
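For a persistent change, entries in /etc/security/limits.conf look like the following sketch (the username and values here are only illustrative):

```
# /etc/security/limits.conf — example entries (username is a placeholder)
# <domain>    <type>    <item>    <value>
jtarango      soft      stack     131072
jtarango      hard      stack     unlimited
```

The soft value is what new sessions start with; the hard value is the ceiling a user may raise it to with ulimit.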

In my case, running the command I have:

[jtarango@dymensions] $ ulimit -H -a
core file size          (blocks, -c)     unlimited
data seg size           (kbytes, -d)     unlimited
scheduling priority             (-e)     0
file size               (blocks, -f)     unlimited
pending signals                 (-i)     128323
max locked memory       (kbytes, -l)     16384
max memory size         (kbytes, -m)     unlimited
open files                      (-n)     1048576
pipe size            (512 bytes, -p)     8
POSIX message queues     (bytes, -q)     819200
real-time priority              (-r)     0
stack size              (kbytes, -s)     unlimited
cpu time               (seconds, -t)     unlimited
max user processes              (-u)     128323
virtual memory          (kbytes, -v)     unlimited
file locks                      (-x)     unlimited

[jtarango@vm] $ ulimit -S -a
core file size          (blocks, -c)     0
data seg size           (kbytes, -d)     unlimited
scheduling priority             (-e)     0
file size               (blocks, -f)     unlimited
pending signals                 (-i)     128323
max locked memory       (kbytes, -l)     16384
max memory size         (kbytes, -m)     unlimited
open files                      (-n)     1024
pipe size            (512 bytes, -p)     8
POSIX message queues     (bytes, -q)     819200
real-time priority              (-r)     0
stack size              (kbytes, -s)     8192
cpu time               (seconds, -t)     unlimited
max user processes              (-u)     128323
virtual memory          (kbytes, -v)     unlimited
file locks                      (-x)     unlimited

Compiling Code Flags

When compiling with GCC, I compile my program with flags that ensure Herbgrind can actively track the execution. The -fsplit-stack flag allows programs with large stacks to split the stack at runtime. For example: -g -Wall -Werror -ansi -O0 -fsplit-stack -fstack-protector -m64 -v

Script Example

I use the following .sh script to automate this: runEvaluation.sh

#!/usr/bin/env bash

# Exit on error
set -e

# Determine physical directory of this script
src="${BASH_SOURCE[0]}"
while [ -L "$src" ]; do
  dir="$(cd -P "$(dirname "$src")" && pwd)"
  src="$(readlink "$src")"
  [[ $src != /* ]] && src="$dir/$src"
done

# Herbgrind program setup
MYDIR="$(cd -P "$(dirname "$src")" && pwd)"
VALPGM="$MYDIR/valgrind/herbgrind-install/bin/valgrind"
VALPARAM="--tool=herbgrind --error-threshold=1 --precision=1000 --main-stacksize=99999999 --max-stackframe=99999999 --valgrind-stacksize=10485760 --num-callers=500 --verbose"
HERBGRINDPARAM="--detailed-ranges --follow-real-execution --print-object-files --expr-colors --ignore-pure-zeroes --no-sound-simplify --shortmark-all-exprs"

# Program Details
PDIR="/home/jtarango"
PGMPARAM="$PDIR/data $PDIR/query"
TODAY="$(date +%Y-%m-%d.%H:%M:%S)"

# Instance name, execution details, and logs
# (tool options must come before the program, so HERBGRINDPARAM goes
# with VALPARAM; stdout and stderr are both captured in the log)
PA="dp"
PGMA="$PDIR/$PA"
HGA="$VALPGM $VALPARAM $HERBGRINDPARAM $PGMA $PGMPARAM"
$HGA "$@" 2>&1 | tee "$PDIR/$PA-Log-$HOSTNAME-$TODAY.log"

pavpanchekha commented 4 years ago

Thanks for the advice. I'm closing the issue unless someone reports that the bug is still present.

jtarango commented 4 years ago

Just for a point of reference: when tracking time series with Herbgrind, I switched some computation from double to float and then ran the computation on terabytes of data. For a data point today, my virtual machine peaked at 64 GB of DRAM and ~68 GB of swap space... Valgrind takes quite a bit of memory when tracking long computation chains. To enable this temporarily, you will have to do the items above with "ulimit -s unlimited" to set an unlimited stack.

To enable Valgrind with unlimited resources, I had to enable an unlimited stack.

[jtarango@] $ ulimit -a
core file size          (blocks, -c)     0
data seg size           (kbytes, -d)     unlimited
scheduling priority             (-e)     0
file size               (blocks, -f)     unlimited
pending signals                 (-i)     257347
max locked memory       (kbytes, -l)     16384
max memory size         (kbytes, -m)     unlimited
open files                      (-n)     1024
pipe size            (512 bytes, -p)     8
POSIX message queues     (bytes, -q)     819200
real-time priority              (-r)     0
stack size              (kbytes, -s)     unlimited
cpu time               (seconds, -t)     unlimited
max user processes              (-u)     unlimited
virtual memory          (kbytes, -v)     unlimited
file locks                      (-x)     unlimited

If anyone is doing machine learning or artificial intelligence big-data computation, I recommend switching to an Intel Purley platform with the maximum memory configuration, i.e. 12 TB of Optane DIMMs may be installed in a quad-socket system (3 TB per CPU + 1.5 TB of DRAM). Then, to get above the runtime memory limit, I recommend a large ZFS pool with a swap as large as the application needs. For applications such as genetics, you will need 1 petabyte of swap space to be able to do large sets of similarity search and deep neural network training.