Closed gthyagi closed 7 months ago
This is my fault. Fixing now.
Here is the fix. Once I make a test, I will get the MR merged.
@knepley I got this error with the latest main branch of PETSc.
[4]PETSC ERROR: ------------------------------------------------------------------------
[4]PETSC ERROR: [96]PETSC ERROR: [288]PETSC ERROR: ------------------------------------------------------------------------
[288]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[288]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[288]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[288]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[288]PETSC ERROR: to get more information on the crash.
[288]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
------------------------------------------------------------------------
Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[4]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[4]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[4]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[4]PETSC ERROR: to get more information on the crash.
[96]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[96]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[4]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
[160]PETSC ERROR: [96]PETSC ERROR: [224]PETSC ERROR: ------------------------------------------------------------------------
[160]PETSC ERROR: ------------------------------------------------------------------------
[224]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[96]PETSC ERROR: [160]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
Try option -start_in_debugger or -on_error_attach_debugger
configure using --with-debugging=yes, recompile, link, and run
[224]PETSC ERROR: [96]PETSC ERROR: [160]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
to get more information on the crash.
or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[96]PETSC ERROR: [160]PETSC ERROR: [224]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
Run with -malloc_debug to check if memory corruption is causing the crash.
configure using --with-debugging=yes, recompile, link, and run
[224]PETSC ERROR: [160]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
to get more information on the crash.
[224]PETSC ERROR: to get more information on the crash.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 288 in communicator MPI_COMM_WORLD
with errorcode 59.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
[160]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
[224]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
[gadi-cpu-clx-0154.gadi.nci.org.au:06772] PMIX ERROR: UNREACHABLE in file /jobfs/53639599.gadi-pbs/0/openmpi/4.1.4/source/openmpi-4.1.4/opal/mca/pmix/pmix3x/pmix/src/server/pmix_server.c at line 2198
[800]PETSC ERROR: ------------------------------------------------------------------------
[800]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[800]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[800]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[800]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[800]PETSC ERROR: to get more information on the crash.
[800]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
[gadi-cpu-clx-0154.gadi.nci.org.au:06772] 5 more processes have sent help message help-mpi-api.txt / mpi-abort
[gadi-cpu-clx-0154.gadi.nci.org.au:06772] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
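Because every rank writes to the same stream, the output above is interleaved, and it is hard to see at a glance which ranks actually crashed. A minimal sketch, assuming the log has been saved as plain text, that lists the ranks carrying a "[N]PETSC ERROR:" tag (the function name ranks_reporting_errors is hypothetical and not part of PETSc):

```python
import re

# One "[rank]PETSC ERROR:" prefix; several can be fused onto a single line
# when ranks write concurrently.
RANK_TAG = re.compile(r"\[(\d+)\]PETSC ERROR:")


def ranks_reporting_errors(log_text):
    """Return the sorted, de-duplicated ranks that emitted a PETSc error tag.

    Assumption: only the rank tags are trusted. Untagged fragments are
    ignored because their originating rank cannot be recovered.
    """
    return sorted({int(rank) for rank in RANK_TAG.findall(log_text)})


if __name__ == "__main__":
    # A few lines copied from the log above.
    sample = (
        "[4]PETSC ERROR: [96]PETSC ERROR: [288]PETSC ERROR: ----\n"
        "[288]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation\n"
        "Caught signal number 11 SEGV: Segmentation Violation\n"
    )
    print(ranks_reporting_errors(sample))  # prints [4, 96, 288]
```

Applied to the full log above, this kind of scan should give ranks 4, 96, 160, 224, 288 and 800, which matches the Open MPI note that 5 more processes sent the mpi-abort help message.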
I will recompile with --with-debugging=yes and try to see what is happening.
I recompiled with --with-debugging=yes, and now the following error occurs. The log file is attached.
[78]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[78]PETSC ERROR: Object is in wrong state
[78]PETSC ERROR: Difference in cached 2 norms: local 71184.9
Fixed in PETSc 3.21.0.
Hi @knepley,
I am trying to run the Spherical Benchmarks: Isoviscous Incompressible Stokes case from the Benchmark paper. I was able to run these models up to cellsize=1/32 on 64 CPUs. However, if I run the same job with cellsize=1/64 on 528 CPUs, the following error occurs.
Here are the input and log files: Ex_Stokes_Spherical_Benchmark_Kramer_RCS.py.txt jobscript_m18.sh.e111716764.txt jobscript_m18.sh.o111716764.txt