openPMD / openPMD-api

:floppy_disk: C++ & Python API for Scientific I/O
https://openpmd-api.readthedocs.io
GNU Lesser General Public License v3.0

ADIOS 1 interfering with WarpX mpi io #1195

Open RTSandberg opened 2 years ago

RTSandberg commented 2 years ago

If I run WarpX locally on my machine with MPI enabled, I get a segfault at shutdown. The backtrace points to the ADIOS1 backend that is built as part of openPMD-api.

To Reproduce

# fails with:
cmake -S . -B build -DWarpX_DIMS=RZ -DCMAKE_BUILD_TYPE=Debug
cmake --build build
cd build/bin
./warpx ../../Examples/Physics_applications/laser_acceleration/inputs_rz amrex.throw_exception = 1 amrex.signal_handling = 0

Configuration output:

(warpx-dev) ryansand@m-krasny05 WarpX % cmake -S . -B build -DWarpX_DIMS=RZ -DCMAKE_BUILD_TYPE=Debug
-- Found CCache: /opt/anaconda3/envs/warpx-dev/bin/ccache
-- Downloading AMReX ...
-- AMReX repository: https://github.com/AMReX-Codes/amrex.git (22.02)
-- CMake version: 3.21.3
-- AMReX installation directory: /usr/local
-- Build type set by user to 'Debug'.
-- Building AMReX with AMReX_SPACEDIM = 2
-- Configuring AMReX with the following options enabled: 
--    AMReX_PRECISION = DOUBLE
--    AMReX_MPI
--    AMReX_MPI_THREAD_MULTIPLE
--    AMReX_OMP
--    AMReX_LINEAR_SOLVERS
--    AMReX_PARTICLES
--    AMReX_PARTICLES_PRECISION = DOUBLE
--    AMReX_TINY_PROFILE
-- Found MPI: TRUE (found version "3.1") found components: C CXX 
-- AMReX configuration summary: 
--    Build type               = Debug
--    Install directory        = /usr/local
--    C++ compiler             = /opt/anaconda3/envs/warpx-dev/bin/x86_64-apple-darwin13.4.0-clang++
--    C++ defines              = 
--    C++ flags                = -g -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -stdlib=libc++ -fvisibility-inlines-hidden -std=c++14 -fmessage-length=0 -isystem /opt/anaconda3/envs/warpx-dev/include -fopenmp=libomp
--    C++ include paths        = -I/Users/ryansand/Documents/plasma_codes/WarpX/WarpX/build/_deps/fetchedamrex-src/Src/Base -I/Users/ryansand/Documents/plasma_codes/WarpX/WarpX/build/_deps/fetchedamrex-src/Src/Base/Parser -I/Users/ryansand/Documents/plasma_codes/WarpX/WarpX/build/_deps/fetchedamrex-src/Src/Boundary -I/Users/ryansand/Documents/plasma_codes/WarpX/WarpX/build/_deps/fetchedamrex-src/Src/AmrCore -I/Users/ryansand/Documents/plasma_codes/WarpX/WarpX/build/_deps/fetchedamrex-src/Src/LinearSolvers/MLMG -I/Users/ryansand/Documents/plasma_codes/WarpX/WarpX/build/_deps/fetchedamrex-src/Src/Particle -I/opt/anaconda3/envs/warpx-dev/include
--    Link line                = /opt/anaconda3/envs/warpx-dev/lib/libmpi.dylib /opt/anaconda3/envs/warpx-dev/lib/libomp.dylib
-- AMReX: Using version '22.02' (22.02)
-- Downloading PICSAR ...
-- PICSAR repository: https://github.com/ECP-WarpX/picsar.git (15651b072cd9a45a5a5061d8cf7b928d136e39f3)
-- Downloading openPMD-api ...
-- openPMD-api repository: https://github.com/openPMD/openPMD-api.git (0.14.3)
-- Found MPI: TRUE (found version "3.1") found components: CXX 
-- Using the single-header code from /Users/ryansand/Documents/plasma_codes/WarpX/WarpX/build/_deps/fetchedopenpmd-src/share/openPMD/thirdParty/json/single_include/
-- nlohmann-json: Using INTERNAL version '3.9.1'
-- HDF5 C compiler wrapper is unable to compile a minimal HDF5 program.
CMake Warning at /opt/anaconda3/envs/warpx-dev/share/cmake-3.21/Modules/FindHDF5.cmake:742 (message):
  HDF5 found for language C is not parallel but previously found language is
  parallel.
Call Stack (most recent call first):
  build/_deps/fetchedopenpmd-src/CMakeLists.txt:192 (find_package)
-- Found 'adios_config': /opt/anaconda3/envs/warpx-dev/bin/adios_config
-- ADIOS linker flags (unparsed): -L/opt/anaconda3/envs/warpx-dev/lib -ladios -L/opt/anaconda3/envs/warpx-dev/lib64 -L/opt/anaconda3/envs/warpx-dev/lib64 -L/opt/anaconda3/envs/warpx-dev/lib -lz -lbz2 -lblosc -Wl,-pie -Wl,-headerpad_max_install_names -Wl,-dead_strip_dylibs -Wl,-rpath,/opt/anaconda3/envs/warpx-dev/lib -L/opt/anaconda3/envs/warpx-dev/lib -Wl,-rpath,/opt/anaconda3/envs/warpx-dev/lib
-- ADIOS compiler flags (unparsed): -I/opt/anaconda3/envs/warpx-dev/include -DZLIB -I/opt/anaconda3/envs/warpx-dev/include -DBZIP2 -I/opt/anaconda3/envs/warpx-dev/include -DBLOSC -I/opt/anaconda3/envs/warpx-dev/include -I/opt/anaconda3/envs/warpx-dev/include
-- ADIOS DIRS to look for libs: /opt/anaconda3/envs/warpx-dev/lib;/opt/anaconda3/envs/warpx-dev/lib64;/opt/anaconda3/envs/warpx-dev/lib64;/opt/anaconda3/envs/warpx-dev/lib;/opt/anaconda3/envs/warpx-dev/lib
-- Found adios in /opt/anaconda3/envs/warpx-dev/lib/libadios.a
-- Found z in /opt/anaconda3/envs/warpx-dev/lib/libz.dylib
-- Found bz2 in /opt/anaconda3/envs/warpx-dev/lib/libbz2.dylib
-- Found blosc in /opt/anaconda3/envs/warpx-dev/lib/libblosc.dylib
-- ADIOS compile definitions: -DZLIB -DBZIP2 -DBLOSC
-- Found MPI: TRUE (found version "3.1")  
-- <variant> supported (C++17 or newer): TRUE
openPMD build configuration:
  library Version: 0.14.3
  openPMD Standard: 1.1.0
  C++ Compiler: Clang 11.1.0 
    /opt/anaconda3/envs/warpx-dev/bin/x86_64-apple-darwin13.4.0-clang++

  Installation: OFF

  Build Type: Debug
  Library: static
  CLI Tools: OFF
  Examples: OFF
  Testing: OFF
  Invasive Tests: OFF
  Internal VERIFY: ON
  Build Options:
    MPI: ON
    HDF5: ON
    ADIOS1: ON
    ADIOS2: ON
    PYTHON: OFF

WarpX build configuration:
  Version: 22.02 (22.02-3-ge7c7d3f2bb85)
  C++ Compiler: Clang 11.1.0 
    /opt/anaconda3/envs/warpx-dev/bin/x86_64-apple-darwin13.4.0-clang++

  Installation prefix: /usr/local
        bin: bin
        lib: lib
    include: include
      cmake: lib/cmake/WarpX

  Build type: Debug
  Build options:
    APP: ON
    ASCENT: OFF
    COMPUTE: OMP
    DIMS: RZ
    Embedded Boundary: OFF
    GPU clock timers: OFF
    IPO/LTO: OFF
    LIB: OFF
    MPI: ON
    PSATD: OFF
    PRECISION: DOUBLE
    OPENPMD: ON
    QED: ON
    QED table generation: OFF
    SENSEI: OFF

-- Configuring done
-- Generating done
-- Build files have been written to: <WarpX root>/WarpX/build

Output:

MPI initialized with 1 MPI processes
MPI initialized with thread support level 3
OMP initialized with 16 OMP threads
AMReX (22.02) initialized
WarpX (22.02-3-ge7c7d3f2bb85)
PICSAR (15651b072cd9)
Level 0: dt = 4.112304655e-16 ; dx = 4.6875e-07 ; dz = 1.328125e-07

Grids Summary:
  Level 0   8 grids  32768 cells  100 % of domain
            smallest grid: 64 x 64  biggest grid: 64 x 64

  Writing plotfile diags/diag100000

STEP 1 starts ...
...

STEP 10 starts ...
  Writing plotfile diags/diag100010
STEP 10 ends. TIME = 4.112304655e-15 DT = 4.112304655e-16
Evolve time = 0.562361577 s; This step = 0.160967616 s; Avg. per step = 0.0562361577 s

**** WARNINGS ******************************************************************
* GLOBAL warning list  after  [ THE END ]
*
* No recorded warnings.
********************************************************************************

Total Time                     : 0.75111822
[m-krasny05:49202] *** Process received signal ***
[m-krasny05:49202] Signal: Segmentation fault: 11 (11)
[m-krasny05:49202] Signal code: Address not mapped (1)
[m-krasny05:49202] Failing at address: 0x1
[m-krasny05:49202] [ 0] 0   libsystem_platform.dylib            0x00007ff80d1d2e2d _sigtramp + 29
[m-krasny05:49202] [ 1] 0   ???                                 0x0000000000000002 0x0 + 2
[m-krasny05:49202] [ 2] 0   libopenPMD.ADIOS1.Serial.dylib      0x000000011030e062 MPI_Allreduce + 114
[m-krasny05:49202] [ 3] 0   warpx.RZ.MPI.OMP.DP.OPMD.QED.DEBUG  0x000000010f661bdc _ZN5amrex18ParallelDescriptor13ReduceBoolAndERb + 76
[m-krasny05:49202] [ 4] 0   warpx.RZ.MPI.OMP.DP.OPMD.QED.DEBUG  0x000000010f71b430 _ZN5amrex12TinyProfiler8FinalizeEb + 160
[m-krasny05:49202] [ 5] 0   warpx.RZ.MPI.OMP.DP.OPMD.QED.DEBUG  0x000000010f625b18 _ZN5amrex8FinalizeEPNS_5AMReXE + 40
[m-krasny05:49202] [ 6] 0   warpx.RZ.MPI.OMP.DP.OPMD.QED.DEBUG  0x000000010f2def42 main + 498
[m-krasny05:49202] [ 7] 0   dyld                                0x000000011dfbc4fe start + 462
[m-krasny05:49202] *** End of error message ***
zsh: segmentation fault  ./warpx ../../Examples/Physics_applications/laser_acceleration/inputs_rz  = 1

Expected behavior

If ADIOS1 is explicitly disabled,

cmake -S . -B build -DWarpX_DIMS=RZ -DCMAKE_BUILD_TYPE=Debug -DopenPMD_USE_ADIOS1=OFF

then this works.
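To confirm that the option actually reached the configured build, the cached value can be read back from `CMakeCache.txt`. This is a minimal sketch: the helper name `cmake_cache_value` is made up for illustration, and it only assumes the usual `KEY:TYPE=VALUE` layout of cache entries.

```shell
# Print the cached value of a CMake variable, whatever its declared type.
# $1: path to CMakeCache.txt, $2: variable name (hypothetical helper)
cmake_cache_value() {
  sed -n "s/^${2}:[A-Z]*=//p" "$1"
}

# Usage, from the WarpX source directory after configuring:
# cmake_cache_value build/CMakeCache.txt openPMD_USE_ADIOS1
```

If the flag was picked up, this should print `OFF`.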

Note that HDF5 and ADIOS2 are also found. The crash happens at shutdown, in the MPI_Allreduce call reached from amrex::Finalize, so this seems to be an MPI issue caused by ADIOS1 even when it is not used.

Software Environment

ax3l commented 2 years ago

It looks to me like the symbol hiding in libopenPMD.ADIOS1.Serial.dylib is not working for some reason, so AMReX picks up the mock MPI stub implementation from ADIOS1 (serial) instead of the external MPI from conda-forge.
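One way to check this hypothesis is to list the MPI_* symbols that the library itself defines and exports. The sketch below assumes `nm` is available and that the library path is adjusted to the local build tree. The helper name `filter_exported_mpi` is made up for illustration.

```shell
# Keep only MPI_* symbols that are defined and externally visible.
# Reads `nm -g` style lines on stdin: "<address> <type> <name>".
# Types T, D, B, S mark defined global symbols; undefined (imported)
# symbols have no address column and are skipped by the NF check.
filter_exported_mpi() {
  awk 'NF == 3 && $2 ~ /^[TDBS]$/ && $3 ~ /^_?MPI_/ { print $3 }'
}

# Usage (the path is a placeholder, not taken from the report):
# nm -g path/to/libopenPMD.ADIOS1.Serial.dylib | filter_exported_mpi
```

If `_MPI_Allreduce` shows up in the output, the serial stub is visible to the dynamic linker, which would be consistent with frame [ 2] of the backtrace resolving MPI_Allreduce inside libopenPMD.ADIOS1.Serial.dylib.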

ax3l commented 2 years ago

We are testing a fix in #1196.