Closed (shmilee closed 1 year ago)
Solved: run ulimit -s unlimited before mpirun -np 32 ./app.
See also the -heap-arrays n compiler option. ref
Debug info from core file:
Core was generated by `./gtc'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000000000663925 in lap2petsc () at poisson.F90:908
908 nindex0=nindexlap
(gdb) bt
#0 0x0000000000663925 in lap2petsc () at poisson.F90:908
Backtrace stopped: Cannot access memory at address 0x7ffe1af21138
(gdb) info locals
nindexlap = <error reading variable nindexlap (value requires 547524 bytes, which is more than max-value-size)>
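The backtrace stops at an unreadable stack address, and nindexlap alone needs 547524 bytes. That is consistent with a stack overflow on large local arrays, though this is an inference from the dump, not something gdb states. A minimal launcher sketch, assuming a POSIX shell on the node that starts the ranks; raise_stack_limit is a hypothetical helper name, and ./app and the rank count are taken from the fix quoted at the top:

```shell
#!/bin/sh
# raise_stack_limit is a hypothetical helper, not part of Intel MPI.
# Try to lift the stack limit entirely; if the hard limit forbids that,
# fall back to raising the soft limit as far as the hard limit allows.
raise_stack_limit() {
    ulimit -s unlimited 2>/dev/null || ulimit -S -s "$(ulimit -H -s)"
}

raise_stack_limit
echo "stack limit now: $(ulimit -S -s)"

# Launch only when both the launcher and the binary are present,
# so the sketch is harmless on a machine without MPI.
if command -v mpirun >/dev/null 2>&1 && [ -x ./app ]; then
    mpirun -np 32 ./app
fi
```

The limit is set in the same shell that calls mpirun, so locally started ranks inherit it. Ranks started on other nodes may need the limit raised there as well.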
BTW, Bus error info about MPIDI_POSIX_eager_init:
Core was generated by `./flc/test_flc'.
Program terminated with signal 7, Bus error.
#0 MPIDI_POSIX_eager_init (global_rank=1, num_global=177087) at ../../src/mpid/ch4/shm/posix/eager/include/intel_transport_init.h:2939
2939 ../../src/mpid/ch4/shm/posix/eager/include/intel_transport_init.h: No such file or directory.
Missing separate debuginfos, use: debuginfo-install glibc-2.17-317.el7.x86_64 libgcc-4.8.5-44.el7.x86_64 numactl-devel-2.0.12-5.el7.x86_64
(gdb) bt
#0 MPIDI_POSIX_eager_init (global_rank=1, num_global=177087) at ../../src/mpid/ch4/shm/posix/eager/include/intel_transport_init.h:2939
#1 0x00007f789007e520 in MPIDI_POSIX_eager_init (rank=<optimized out>, size=<optimized out>) at ../../src/mpid/ch4/shm/posix/eager/include/posix_eager_impl.h:25
#2 MPIDI_POSIX_mpi_init_hook (rank=1, size=177087, n_vcis_provided=0x0, tag_bits=0x2514e) at ../../src/mpid/ch4/shm/posix/posix_init.c:133
#3 0x00007f789018a676 in MPIDI_SHMI_mpi_init_hook (rank=1, size=177087, n_vcis_provided=0x0, tag_bits=0x2514e) at ../../src/mpid/ch4/shm/src/shm_init.c:28
#4 0x00007f788fb8a93a in MPID_Init (argc=0x1, argv=0x2b3bf, requested=0, provided=0x2514e) at ../../src/mpid/ch4/src/ch4_init.c:1293
#5 0x00007f788fea41a3 in MPIR_Init_thread (argc=0x1, argv=0x2b3bf, required=0, provided=0x2514e) at ../../src/mpi/init/initthread.c:142
#6 0x00007f788fea371b in PMPI_Init (argc=0x1, argv=0x2b3bf) at ../../src/mpi/init/init.c:140
#7 0x00007f789129b85b in pmpi_init_ (ierr=0x1) at ../../src/binding/fortran/mpif_h/initf.c:275
#8 0x000000000041201f in test () at ./flc/test.F90:10
#9 0x0000000000405ce2 in main ()
#10 0x00007f788eca4555 in __libc_start_main () from /lib64/libc.so.6
#11 0x0000000000405be3 in _start ()
(gdb)
I compiled a simulation application in an Intel HPC container, built as below:
Then I started the container with:
docker run --rm -i -t --name gtc_worker --shm-size=64gb XXX/image:tag bash
Here --shm-size= is used to solve the Bus error.
The app works well when the grid is small, like 50x300, but it crashes when the grid is 50x310.
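Docker gives a container a 64 MB /dev/shm by default, which is why --shm-size matters for shared-memory MPI transports. As a quick check, a sketch like the following, run inside the container, reports how much space is available there. avail_kb is a hypothetical helper name, and the sketch assumes a POSIX df and awk:

```shell
#!/bin/sh
# avail_kb is a hypothetical helper: print the available space, in KiB,
# of the filesystem that holds the given path (POSIX df output format).
avail_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# /dev/shm backs POSIX shared memory on Linux; skip quietly elsewhere.
if [ -d /dev/shm ]; then
    echo "/dev/shm available: $(avail_kb /dev/shm) KiB"
fi
```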
After setting I_MPI_DEBUG=10, I get some info:
Maybe changing this part, shm segment size (128 MB per rank), will solve the issue? If so, how can I do that?
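For a rough sense of scale, the per-rank figure in that log line can be multiplied by the number of ranks and compared with the --shm-size passed to docker run. A minimal sketch of that arithmetic, assuming all 32 ranks from the mpirun command share one container and that the 128 MB segment is the dominant consumer (other shared-memory users are ignored); shm_needed_mb is a hypothetical helper name:

```shell
#!/bin/sh
# shm_needed_mb RANKS PER_RANK_MB: total shared-memory segment size in MB.
# Hypothetical helper for illustration only.
shm_needed_mb() {
    echo $(( $1 * $2 ))
}

ranks=32                    # assumed, from: mpirun -np 32 ./app
per_rank_mb=128             # from: shm segment size (128 MB per rank)
limit_mb=$(( 64 * 1024 ))   # from: --shm-size=64gb

needed_mb=$(shm_needed_mb "$ranks" "$per_rank_mb")
echo "segments need about ${needed_mb} MB of ${limit_mb} MB"
if [ "$needed_mb" -gt "$limit_mb" ]; then
    echo "shm-size looks too small for these segments"
fi
```

Under these assumptions the segments fit well inside the 64 GB given to the container, which suggests the segment size may not be the limit being hit.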