nest / nest-simulator

The NEST simulator
http://www.nest-simulator.org
GNU General Public License v2.0

The invalid memory is freed. #132

Closed. ikitayama closed this issue 8 years ago.

ikitayama commented 8 years ago

While executing a test program from

testsuite/manualtests/ticket-458

on the K computer, I get:

jwe1050i-w The hardware barrier couldn't be used and continues processing using the software barrier.
taken to (standard) corrective action, execution continuing.
jwe1603i-w The invalid memory is freed.
(Address:0  Free(function:std::basic_ifstream<char, std::char_traits<char>>::~basic_ifstream()  line:0))
 error occurs at _ZNSt14basic_ifstreamIcSt11char_traitsIcEED1Ev loc 0000000000ae1610 offset 0000000000000090 
 _ZNSt14basic_ifstreamIcSt11char_traitsIcEED1Ev     at loc 0000000000ae1580 called from loc 0000000000d6c944 in _ZNK10SLIStartup9checkpathERKSsRSs      
 _ZNK10SLIStartup9checkpathERKSsRSs     at loc 0000000000d6c340 called from loc 0000000000d718fc in _ZN10SLIStartup4initEP14SLIInterpreter      
 _ZN10SLIStartup4initEP14SLIInterpreter     at loc 0000000000d70a00 called from loc 0000000000d58df4 in _ZN9SLIModule7installERSoP14SLIInterpreter      
 _ZN9SLIModule7installERSoP14SLIInterpreter     at loc 0000000000d58d80 called from loc 0000000000c1b40c in _ZN14SLIInterpreter9addmoduleEP9SLIModule      
 _ZN14SLIInterpreter9addmoduleEP9SLIModule     at loc 0000000000c1b3c0 called from loc 000000000011da38 in _Z11neststartupiPPcR14SLIInterpreterRPN4nest7NetworkE      
 _Z11neststartupiPPcR14SLIInterpreterRPN4nest7NetworkE     at loc 000000000011d880 called from loc 0000000000111ea0 in main          
 main         at loc 0000000000111e80 called from o.s.  
taken to (standard) corrective action, execution continuing.
--------------------------------------------------------------------------
[mpi::mpi-api::mpi-abort]
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode 126.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[i42-036:18488] /opt/FJSVtclang/GM-1.2.0-18/lib64/libmpi.so.0(orte_errmgr_base_error_abort+0x84) [0xffffffff008df684]
[i42-036:18488] /opt/FJSVtclang/GM-1.2.0-18/lib64/libmpi.so.0(ompi_mpi_abort+0x51c) [0xffffffff0068389c]
[i42-036:18488] /opt/FJSVtclang/GM-1.2.0-18/lib64/libmpi.so.0(MPI_Abort+0x6c) [0xffffffff0069b3ac]
[i42-036:18488] /opt/FJSVtclang/GM-1.2.0-18/lib64/libtrtmet_c.so.1(MPI_Abort+0x2c) [0xffffffff00159bf0]
[i42-036:18488] ./nest [0x992cac]
[i42-036:18488] ./nest [0x11dd04]
[i42-036:18488] ./nest(main+0x38) [0x111eb8]
[i42-036:18488] /lib64/libc.so.6(__libc_start_main+0x194) [0xffffffff0323381c]
[i42-036:18488] ./nest [0x111d2c]
[ERR.] PLE 0019 plexec One of MPI processes was aborted.(rank=0)(nid=0x210a0034)(CODE=1938,793745140674134016,32256)

Below is my submission script:

#!/bin/sh

#PJM -S
#PJM --rsc-list "elapse=10:00"
#PJM --rsc-list "rscgrp=micro"
#PJM --rsc-list "node=12"
#PJM --mpi "assign-online-node"
. /home/system/Env_base

export PARALLEL=1
export OMP_RUN_THREADS=1
export FLIB_FASTOMP=false

mpiexec -np 1 ./nest conf.cli run_benchmark_458.sli

Has anyone else seen a similar error on other supercomputers?

ikitayama commented 8 years ago

@JanneM pointed out that I was using incorrect .sli files.

ikitayama commented 8 years ago

Reopening: although this is platform-specific, I am still seeing this error.

jougs commented 8 years ago

@janhahne pointed out that this is caused by the same problem as #132, which is closed. Thus I am also closing this. @ikitayama, feel free to reopen if you find the issue persists.