sandialabs / seacas

The Sandia Engineering Analysis Code Access System (SEACAS) is a suite of preprocessing, postprocessing, translation, and utility applications supporting finite element analysis software using the Exodus database file format.

ex_close fails if file mode is EX_READ or EX_CLOBBER in parallel #467

Open bourdin opened 4 months ago

bourdin commented 4 months ago

This sequence (trivial modification of testrd_par.c):

exoid = ex_open_par("test.exo",     /* filename path */
                    EX_READ,        /* access mode = READ */
                    &CPU_word_size, /* CPU word size */
                    &IO_word_size,  /* IO word size */
                    &version,       /* ExodusII library version */
                    mpi_comm, mpi_info);
error = ex_close(exoid);

produces the following error:

Exodus Library Warning/Error: [ex_close] in file 'test.exo'
        ERROR: failed to close file id 65536
        NetCDF: Write to read only

If the file is opened with ex_open instead, the code runs fine.
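A side note on the "file id 65536" in the error message: this value is probably not a corrupted handle. As far as I know, netCDF-style ids pack a file slot into the upper 16 bits and a group index into the lower 16 bits, so 65536 would simply be the first open file with its root group. The sketch below decodes an id under that assumption. The macro and function names are illustrative only and are not part of the Exodus or netCDF API.

```c
#include <stdio.h>

/* Assumed id layout (not taken from this thread): upper 16 bits hold the
 * file slot, lower 16 bits hold the group index within that file. */
#define SKETCH_FILE_ID_MASK 0xffff0000u
#define SKETCH_GRP_ID_MASK  0x0000ffffu
#define SKETCH_ID_SHIFT     16

/* Hypothetical helper: extract the file slot from a packed id. */
unsigned file_slot_of(int id)
{
  return ((unsigned)id & SKETCH_FILE_ID_MASK) >> SKETCH_ID_SHIFT;
}

/* Hypothetical helper: extract the group index from a packed id. */
unsigned group_index_of(int id)
{
  return (unsigned)id & SKETCH_GRP_ID_MASK;
}

/* Hypothetical helper: print both parts of a packed id. */
void describe_id(int id)
{
  printf("id %d: file slot %u, group index %u\n", id, file_slot_of(id),
         group_index_of(id));
}
```

Calling describe_id(65536) reports file slot 1 and group index 0, which is consistent with a valid handle for the first file opened, so the failure is more likely in the close path than in the open.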

gsjaardema commented 3 months ago

I am able to run this with no errors on 1, 2, 4, and 8 ranks. Can you provide more information about which version you are using and how you are compiling and running?

bourdin commented 3 months ago

Strange. I get this error for any processor count. The exodus libraries are compiled by PETSc. My initial test was with v2022-08-01, but I get the same error with the most recent tagged release. I am running on an ARM Mac with gcc-14 and mpich from Homebrew. The configure command for exodus is

-DCMAKE_INSTALL_PREFIX=/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g -DCMAKE_INSTALL_NAME_DIR:STRING="/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g/lib" -DCMAKE_INSTALL_LIBDIR:STRING="lib" -DCMAKE_VERBOSE_MAKEFILE=1 -DCMAKE_BUILD_TYPE=Debug -DCMAKE_AR="/usr/bin/ar" -DCMAKE_C_COMPILER="mpicc" -DMPI_C_COMPILER="/opt/homebrew/bin/mpicc" -DCMAKE_RANLIB=/usr/bin/ranlib -DCMAKE_C_FLAGS:STRING="-Wimplicit-function-declaration -Wunused -Wuninitialized -fPIC -g -O0" -DCMAKE_C_FLAGS_DEBUG:STRING="-Wimplicit-function-declaration -Wunused -Wuninitialized -fPIC -g -O0" -DCMAKE_C_FLAGS_RELEASE:STRING="-Wimplicit-function-declaration -Wunused -Wuninitialized -fPIC -g -O0" -DCMAKE_CXX_COMPILER="mpicxx" -DMPI_CXX_COMPILER="/opt/homebrew/bin/mpicxx" -DCMAKE_CXX_FLAGS:STRING="-fno-stack-check -g -O0 -fPIC" -DCMAKE_CXX_FLAGS_DEBUG:STRING="-fno-stack-check -g -O0 -fPIC" -DCMAKE_CXX_FLAGS_RELEASE:STRING="-fno-stack-check -g -O0 -fPIC" -DCMAKE_Fortran_COMPILER="mpif90" -DMPI_Fortran_COMPILER="/opt/homebrew/bin/mpif90" -DCMAKE_Fortran_FLAGS:STRING="-ffree-line-length-none -fallow-argument-mismatch -Wunused -Wuninitialized -fPIC -g -O0 -fallow-argument-mismatch" -DCMAKE_Fortran_FLAGS_DEBUG:STRING="-ffree-line-length-none -fallow-argument-mismatch -Wunused -Wuninitialized -fPIC -g -O0 -fallow-argument-mismatch" -DCMAKE_Fortran_FLAGS_RELEASE:STRING="-ffree-line-length-none -fallow-argument-mismatch -Wunused -Wuninitialized -fPIC -g -O0 -fallow-argument-mismatch" -DBUILD_SHARED_LIBS:BOOL=ON -DBUILD_STATIC_LIBS:BOOL=OFF -DPYTHON_EXECUTABLE:PATH=/opt/homebrew/opt/python@3.12/bin/python3.12 -DPythonInterp_FIND_VERSION:STRING=3.12 -DACCESSDIR:PATH=/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g -DCMAKE_INSTALL_RPATH:PATH=/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g/lib -DSeacas_ENABLE_SEACASExodus:BOOL=ON -DSeacas_ENABLE_Fortran:BOOL=ON -DSeacas_ENABLE_SEACASExoIIv2for32:BOOL=ON -DSeacas_ENABLE_SEACASExoIIv2for:BOOL=ON -DSeacas_ENABLE_SEACASExodus_for:BOOL=ON 
-DSEACASProj_SKIP_FORTRANCINTERFACE_VERIFY_TEST:BOOL=ON -DSeacas_ENABLE_SEACASExodiff:BOOL=OFF -DSeacas_ENABLE_SEACASExotxt:BOOL=OFF -DTPL_ENABLE_Matio:BOOL=OFF -DTPL_ENABLE_Netcdf:BOOL=ON -DTPL_ENABLE_Pnetcdf:BOOL=ON -DTPL_Netcdf_Enables_PNetcdf:BOOL=ON -DTPL_ENABLE_MPI:BOOL=ON -DTPL_ENABLE_Pamgen:BOOL=OFF -DTPL_ENABLE_CGNS:BOOL=OFF -DTPL_ENABLE_fmt=OFF -DNetCDF_DIR:PATH=/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g -DHDF5_DIR:PATH=/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g -DPnetcdf_LIBRARY_DIRS:PATH=/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g/lib -DPnetcdf_INCLUDE_DIRS:PATH=/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g/include -DSEACASExodus_ENABLE_SHARED:BOOL=ON -DCMAKE_SHARED_LINKER_FLAGS:STRING="-Wl,-rpath,/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g/lib -L/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g/lib -lnetcdf -Wl,-rpath,/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g/lib -L/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g/lib -lpnetcdf -Wl,-rpath,/opt/homebrew/Cellar/mpich/4.2.1/lib -Wl,-rpath,/opt/homebrew/Cellar/mpich/4.2.1/lib -L/opt/homebrew/Cellar/mpich/4.2.1/lib -lmpifort -lmpi -lpmpi -lgfortran -Wl,-rpath,/opt/homebrew/Cellar/gcc/14.1.0_1/lib/gcc/current/gcc/aarch64-apple-darwin23/14 -Wl,-rpath,/opt/homebrew/Cellar/gcc/14.1.0_1/lib/gcc/current/gcc/aarch64-apple-darwin23/14 -L/opt/homebrew/Cellar/gcc/14.1.0_1/lib/gcc/current/gcc/aarch64-apple-darwin23/14 -Wl,-rpath,/opt/homebrew/Cellar/gcc/14.1.0_1/lib/gcc/current/gcc -Wl,-rpath,/opt/homebrew/Cellar/gcc/14.1.0_1/lib/gcc/current/gcc -L/opt/homebrew/Cellar/gcc/14.1.0_1/lib/gcc/current/gcc -Wl,-rpath,/opt/homebrew/Cellar/gcc/14.1.0_1/lib/gcc/current -Wl,-rpath,/opt/homebrew/Cellar/gcc/14.1.0_1/lib/gcc/current -L/opt/homebrew/Cellar/gcc/14.1.0_1/lib/gcc/current -lemutls_w -lheapt_w -lgfortran -lquadmath -Wl,-rpath,/opt/homebrew/Cellar/gcc/14.1.0_1/lib/gcc/current/gcc/aarch64-apple-darwin23/14 -Wl,-rpath,/opt/homebrew/Cellar/gcc/14.1.0_1/lib/gcc/current/gcc -Wl,-rpath,/opt/homebrew/Cellar/gcc/14.1.0_1/lib/gcc/current 
-Wl,-rpath,/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g/lib -L/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g/lib -lhdf5_hl -lhdf5 -Wl,-rpath,/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g/lib -L/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g/lib -lz "

and the build command is

mpicc -Wl,-search_paths_first -Wl,-no_compact_unwind -Wl,-no_warn_duplicate_libraries -Wimplicit-function-declaration -Wunused -Wuninitialized -fPIC -g3 -O0  -I/opt/HPC/petsc-sarah/include -I/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g/include -I/opt/X11/include      test_close.c  -Wl,-rpath,/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g/lib -L/opt/HPC/petsc-sarah/sonoma-gcc14.1-arm64-g/lib -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -Wl,-rpath,/opt/homebrew/Cellar/mpich/4.2.1/lib -L/opt/homebrew/Cellar/mpich/4.2.1/lib -Wl,-rpath,/opt/homebrew/Cellar/gcc/14.1.0_1/lib/gcc/current/gcc/aarch64-apple-darwin23/14 -L/opt/homebrew/Cellar/gcc/14.1.0_1/lib/gcc/current/gcc/aarch64-apple-darwin23/14 -Wl,-rpath,/opt/homebrew/Cellar/gcc/14.1.0_1/lib/gcc/current/gcc -L/opt/homebrew/Cellar/gcc/14.1.0_1/lib/gcc/current/gcc -Wl,-rpath,/opt/homebrew/Cellar/gcc/14.1.0_1/lib/gcc/current -L/opt/homebrew/Cellar/gcc/14.1.0_1/lib/gcc/current -lpetsc -llapack -lblas -lexoIIv2for32 -lexodus -lnetcdf -lpnetcdf -lhdf5_hl -lhdf5 -lz -lX11 -lmpifort -lmpi -lpmpi -lgfortran -lemutls_w -lheapt_w -lgfortran -lquadmath -lc++ -o test_close

The full listing of my example is

#include "exodusII.h"
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
  MPI_Comm mpi_comm = MPI_COMM_WORLD;
  MPI_Info mpi_info = MPI_INFO_NULL;

  float  version;

  int CPU_word_size = 0; /* sizeof(float) */
  int IO_word_size  = 0; /* use what is stored in file */
  int exoid,error;

  ex_opts(EX_VERBOSE | EX_ABORT);

  /* Initialize MPI. */
  MPI_Init(&argc, &argv);

  exoid = ex_open("test.exo",                /* filename path */
                          EX_READ,           /* access mode = READ */
                          &CPU_word_size, /* CPU word size */
                          &IO_word_size,    /* IO word size */
                          &version);      /* ExodusII library version */

  error = ex_close(exoid);
  printf("\nafter ex_close, error = %3d\n", error);

  /* open EXODUS II files */
  exoid = ex_open_par("test.exo",        /* filename path */
                          EX_READ,           /* access mode = READ */
                          &CPU_word_size, /* CPU word size */
                          &IO_word_size,    /* IO word size */
                          &version,       /* ExodusII library version */
                          mpi_comm, mpi_info);
  error = ex_close(exoid);
  printf("\nafter ex_close, error = %3d\n", error);

  MPI_Finalize();
  return 0;
}

gsjaardema commented 3 months ago

Not sure what is wrong. I compiled and ran the code shown above and get this:

13:50 $ mpicc -I../include -L../lib test.c -lexodus -Wl,-rpath ../lib -o test-close

13:50 $ mpiexec -np 4 ./test-close
after ex_close, error =   0
after ex_close, error =   0
after ex_close, error =   0
after ex_close, error =   0
after ex_close, error =   0
after ex_close, error =   0
after ex_close, error =   0
after ex_close, error =   0

gsjaardema commented 3 months ago

I don't know if any of the extra libraries are conflicting somehow... You seem to be adding in some Fortran-related exodus libraries and X11, which aren't needed...

-I/opt/X11/include 
-Wl,-rpath,/opt/X11/lib 
-L/opt/X11/lib 
-lpetsc 
-llapack 
-lblas 
-lexoIIv2for32 
-lX11
-lmpifort 
-lgfortran 
-lemutls_w 
-lheapt_w 
-lgfortran 
-lquadmath 
-lc++ 

Those might be needed for your application in general and you are just trying to get a small example to show the bug you are seeing...

gsjaardema commented 3 months ago

I'm not sure what else to suggest trying here. Do the seacas tests pass, or are they not being built?

bourdin commented 3 months ago

I am quite lost here...

I have upgraded exodus and pnetcdf to their latest versions. I rebuilt exodusII with tests, and they all pass except for the Python ones. As far as I can see, however, none of the tests cover ex_open_par. Here is what I added to my cmake command: -DSEACASExodus_ENABLE_TESTS:BOOL=ON -DSeacas_ENABLE_TESTS:BOOL=ON

I also removed all extra libraries and compiled with mpicc -I ${PETSC_DIR}/${PETSC_ARCH}/include/ -Wl,-rpath,${PETSC_DIR}/${PETSC_ARCH}/lib -L${PETSC_DIR}/${PETSC_ARCH}/lib -lexodus -lnetcdf -lpnetcdf -lhdf5_hl -lhdf5 -lz -o testclose testclose.c

I can reproduce this behaviour on both a macOS and a Linux box.

gsjaardema commented 2 months ago

Not ignoring this issue, but I have not yet been able to reproduce the behavior...

bourdin commented 2 months ago

I also have not had a chance to get into this.