scalable-matrix / CA3DMM

Communication-Avoiding 3D Matrix Multiplication

Segmentation fault #8

Open kicurry opened 10 months ago

kicurry commented 10 months ago

Brief

A segmentation fault occurred when attempting to reproduce some of the PGEMM test cases from the CA3DMM paper with example_AB.exe. I currently cannot tell whether the error is tied to my specific environment.

Compilation and Execution

Compilation

Compiled according to README.md, i.e., using the command make -f icc-mkl-anympi.make -j

I tried to recover the compilation information from the binary as follows:

$ strings example_AB.exe | grep -i -B 2 example_AB.c.o
example_AB.c
Intel(R) C Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.1.1.217 Build 20200306
-I../include -I/opt/app/openmpi/4.0.4/intel/2020/include -Wall -g -std=gnu11 -O3 -fPIC -DUSE_MKL -fopenmp -xHost -mkl -c -o example_AB.c.o -pthread -Wl,-rpath,/home/$USERNAME/intel/oneapi/mkl/2023.2.0/lib/intel64

Execution

Jobs were submitted to the Slurm cluster via an sbatch script (see below). The error occurred when executing example_AB.exe with the following parameters:

$ ./example_AB.exe 1200000 6000 6000 0 0 1 1 0

The same error occurred with M = N = 6000, K = 1200000. However, the runs with M = N = K = 50000 and with M = N = 100000, K = 5000 both completed fine.
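
For scale, the global matrix footprints differ noticeably between the failing and working runs. A rough back-of-the-envelope calculation, assuming double-precision elements (an assumption about example_AB.exe):

$ echo "$(( 1200000 * 6000 * 8 / 1000000000 )) GB"    # A in the failing case, ~57 GB
$ echo "$(( 6000 * 6000 * 8 / 1000000 )) MB"          # B in the failing case, ~288 MB
$ echo "$(( 50000 * 50000 * 8 / 1000000000 )) GB"     # A (and B) in the working M=N=K=50000 case, ~20 GB
$ echo "$(( 100000 * 5000 * 8 / 1000000000 )) GB"     # A in the working M=N=100000, K=5000 case, ~4 GB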

SBATCH Script

#!/bin/bash
#SBATCH --job-name=CA3D      # Job name
#SBATCH --output=CA3D-1200000-6000-6000-16-%j.out # Stdout (%j expands to jobId)
#SBATCH --error=CA3D-1200000-6000-6000-16-%j.err  # Stderr (%j expands to jobId)
#SBATCH --partition=cpu
#SBATCH --nodes=16                   # Number of nodes requested
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=52
#SBATCH --time=01:00:00             # walltime

export MPIRUN_OPTIONS="--bind-to none -report-bindings"
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export NUM_CORES=$(( ${SLURM_NTASKS} * ${SLURM_CPUS_PER_TASK} ))

module load app/intel/2020
module load mpi/openmpi/4.0.4/intel

EXEC_PATH=$USER/CA3DMM/examples/example_AB.exe
mpirun -n ${SLURM_NTASKS} ${MPIRUN_OPTIONS} ${EXEC_PATH} 1200000 6000 6000 0 0 1 1 0

Incidentally, when testing M = N = K = 32768 with the same sbatch script, the job did not complete within 15 minutes. Since compute time is billed, I had to cancel the job, so I could not collect any useful log output.

GDB Core Dumps

The error was reproduced using the method described in the Execution section above.
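
(Core files are typically disabled by default on compute nodes; something along the following lines may need to be added to the sbatch script before the mpirun line. This is only a sketch, and how the limit propagates to remote ranks is site-dependent.)

# Allow core files to be written by the launched processes
ulimit -c unlimited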

The backtrace shows that the segmentation fault occurs when mat_redist_engine_exec calls MPI_Neighbor_alltoallv to redistribute matrices A and B. Switching to the mat_redist_engine_exec frame and printing MPI_Neighbor_alltoallv's sendbuf_h argument shows "Address 0x1462ced84e50 out of bounds".

Core was generated by `$USER/CA3DMM/examples/example_AB.exe 1200000 6000 6000 0 0 1 1 0'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000041e2e4 in __intel_avx_rep_memcpy ()
(gdb) backtrace 
#0  0x000000000041e2e4 in __intel_avx_rep_memcpy ()
#1  0x00001469ca7de08f in mca_btl_self_get ()
   from /opt/app/openmpi/4.0.4/intel/2020/lib/openmpi/mca_btl_self.so
#2  0x00001469ca1aa174 in mca_pml_ob1_recv_request_get_frag ()
   from /opt/app/openmpi/4.0.4/intel/2020/lib/openmpi/mca_pml_ob1.so
#3  0x00001469ca1a9c5f in mca_pml_ob1_recv_request_progress_rget ()
   from /opt/app/openmpi/4.0.4/intel/2020/lib/openmpi/mca_pml_ob1.so
#4  0x00001469ca19e4aa in mca_pml_ob1_recv_frag_match_proc ()
   from /opt/app/openmpi/4.0.4/intel/2020/lib/openmpi/mca_pml_ob1.so
#5  0x00001469ca1a08f9 in mca_pml_ob1_recv_frag_callback_rget ()
   from /opt/app/openmpi/4.0.4/intel/2020/lib/openmpi/mca_pml_ob1.so
#6  0x00001469ca7dda30 in mca_btl_self_send ()
   from /opt/app/openmpi/4.0.4/intel/2020/lib/openmpi/mca_btl_self.so
#7  0x00001469ca1acf81 in mca_pml_ob1_send_request_start_rdma ()
   from /opt/app/openmpi/4.0.4/intel/2020/lib/openmpi/mca_pml_ob1.so
#8  0x00001469ca19aa81 in mca_pml_ob1_isend ()
   from /opt/app/openmpi/4.0.4/intel/2020/lib/openmpi/mca_pml_ob1.so
#9  0x00001469c935a3a0 in mca_coll_basic_neighbor_alltoallv ()
   from /opt/app/openmpi/4.0.4/intel/2020/lib/openmpi/mca_coll_basic.so
#10 0x00001469d7f82b1d in PMPI_Neighbor_alltoallv ()
   from /opt/app/openmpi/4.0.4/intel/2020/lib/libmpi.so.40
#11 0x000000000040ad2c in mat_redist_engine_exec (engine=0x1462ced84e50, src_blk=0x1462f844aa50, 
    src_ld=-129717664, dst_blk=0xd693a30, dst_ld=6769816) at mat_redist.c:357
#12 0x0000000000406397 in ca3dmm_engine_exec (engine=0x1462ced84e50, src_A=0x1462f844aa50, 
    ldA=-129717664, src_B=0xd693a30, ldB=6769816, dst_C=0x41e2e0 <__intel_avx_rep_memcpy+672>, 
    ldC=1200000) at ca3dmm.c:988
#13 0x0000000000404e3e in main (argc=375, argv=0x0) at example_AB.c:169
(gdb) frame 11
#11 0x000000000040ad2c in mat_redist_engine_exec (engine=0x1462ced84e50, src_blk=0x1462f844aa50, 
    src_ld=-129717664, dst_blk=0xd693a30, dst_ld=6769816) at mat_redist.c:357
357             MPI_Neighbor_alltoallv(
(gdb) l
352         int  *recv_displs = engine->recv_displs;
353         void *recvbuf_h   = engine->recvbuf_h;
354         void *recvbuf_d   = engine->recvbuf_d;
355         if (dev_type == DEV_TYPE_HOST)
356         {
357             MPI_Neighbor_alltoallv(
358                 sendbuf_h, send_sizes, send_displs, engine->dtype, 
359                 recvbuf_h, recv_sizes, recv_displs, engine->dtype, engine->graph_comm
360             );
361         }
(gdb) p sendbuf_h 
$1 = 0x1462ced84e50 <Address 0x1462ced84e50 out of bounds>
(gdb) p recvbuf_h
$2 = (void *) 0x1462c16f1410
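
As a possible follow-up, the per-neighbor send sizes and displacements could be dumped from the same core file. A hypothetical invocation (the core file name is a placeholder, and printing 8 entries is only an example; the real neighbor count depends on the process grid):

$ gdb -q -batch ./example_AB.exe core.<pid> \
      -ex 'frame 11' \
      -ex 'p send_sizes[0]@8' \
      -ex 'p send_displs[0]@8'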

Dynamic Library Check

Some sensitive information has been redacted; e.g., home directory paths are replaced with $USER.

$ ldd example_AB.exe 
        linux-vdso.so.1 =>  (0x00007fffb21a3000)
        libmkl_intel_lp64.so.2 => $USER/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_intel_lp64.so.2 (0x00001518454c0000)
        libmkl_intel_thread.so.2 => $USER/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_intel_thread.so.2 (0x0000151841e94000)
        libmkl_core.so.2 => $USER/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_core.so.2 (0x000015183db1c000)
        libiomp5.so => /opt/intel/compilers_and_libraries_2020.1.217/linux/compiler/lib/intel64_lin/libiomp5.so (0x000015183d71c000)
        libmpi.so.40 => /opt/app/openmpi/4.0.4/intel/2020/lib/libmpi.so.40 (0x000015183d3da000)
        libm.so.6 => /lib64/libm.so.6 (0x000015183d0d8000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000015183cec2000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x000015183cca6000)
        libc.so.6 => /lib64/libc.so.6 (0x000015183c8d8000)
        libdl.so.2 => /lib64/libdl.so.2 (0x000015183c6d4000)
        /lib64/ld-linux-x86-64.so.2 (0x00001518469bc000)
        libopen-rte.so.40 => /opt/app/openmpi/4.0.4/intel/2020/lib/libopen-rte.so.40 (0x000015183c40e000)
        libopen-pal.so.40 => /opt/app/openmpi/4.0.4/intel/2020/lib/libopen-pal.so.40 (0x000015183c0ce000)
        librt.so.1 => /lib64/librt.so.1 (0x000015183bec6000)
        libutil.so.1 => /lib64/libutil.so.1 (0x000015183bcc3000)
        libz.so.1 => /lib64/libz.so.1 (0x000015183baad000)
        libimf.so => /opt/intel/compilers_and_libraries_2020.1.217/linux/compiler/lib/intel64_lin/libimf.so (0x000015183b42a000)
        libsvml.so => /opt/intel/compilers_and_libraries_2020.1.217/linux/compiler/lib/intel64_lin/libsvml.so (0x0000151839978000)
        libirng.so => /opt/intel/compilers_and_libraries_2020.1.217/linux/compiler/lib/intel64_lin/libirng.so (0x000015183960e000)
        libintlc.so.5 => /opt/intel/compilers_and_libraries_2020.1.217/linux/compiler/lib/intel64_lin/libintlc.so.5 (0x0000151839396000)

My Environment

More detailed information about the dependencies used, which may be helpful:

  1. CPU: Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
    $ lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                52
    On-line CPU(s) list:   0-51
    Thread(s) per core:    1
    Core(s) per socket:    26
    Socket(s):             2
    NUMA node(s):          2
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 85
    Model name:            Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
  2. Scheduler: slurm
    $ slurmctld -V
    slurm 19.05.3-2
  3. MPI: Open MPI 4.0.4 (compiled with icc)
    $ ompi_info 
                 Package: Open MPI hpctest@n6 Distribution
                Open MPI: 4.0.4
    Open MPI repo revision: v4.0.4
    Open MPI release date: Jun 10, 2020
                Open RTE: 4.0.4
    Open RTE repo revision: v4.0.4
    Open RTE release date: Jun 10, 2020
                    OPAL: 4.0.4
      OPAL repo revision: v4.0.4
       OPAL release date: Jun 10, 2020
                 MPI API: 3.1.0
            Ident string: 4.0.4
                  Prefix: /opt/app/openmpi/4.0.4/intel/2020
    Configured architecture: x86_64-unknown-linux-gnu
          Configure host: n6
           Configured by: hpctest
           Configured on: Sat Aug 29 10:04:41 CST 2020
          Configure host: n6
    Configure command line: '--prefix=/opt/app/openmpi/4.0.4/intel/2020'
                Built by: hpctest
                Built on: Sat Aug 29 10:23:49 CST 2020
              Built host: n6
              C bindings: yes
            C++ bindings: no
             Fort mpif.h: yes (all)
            Fort use mpi: yes (full: ignore TKR)
       Fort use mpi size: deprecated-ompi-info-value
        Fort use mpi_f08: yes
    Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
                          limitations in the ifort compiler and/or Open MPI,
                          does not support the following: array subsections,
                          direct passthru (where possible) to underlying Open
                          MPI's C functionality
    Fort mpi_f08 subarrays: no
           Java bindings: no
    Wrapper compiler rpath: runpath
              C compiler: icc
     C compiler absolute: /opt/intel/compilers_and_libraries_2020.1.217/linux/bin/intel64/icc
    C compiler family name: INTEL
      C compiler version: 1910.20200306
            C++ compiler: g++
    C++ compiler absolute: /usr/bin/g++
           Fort compiler: ifort
       Fort compiler abs: /opt/intel/compilers_and_libraries_2020.1.217/linux/bin/intel64/ifort
         Fort ignore TKR: yes (!DEC$ ATTRIBUTES NO_ARG_CHECK ::)
    Fort 08 assumed shape: yes
      Fort optional args: yes
          Fort INTERFACE: yes
    Fort ISO_FORTRAN_ENV: yes
       Fort STORAGE_SIZE: yes
      Fort BIND(C) (all): yes
      Fort ISO_C_BINDING: yes
    Fort SUBROUTINE BIND(C): yes
       Fort TYPE,BIND(C): yes
    Fort T,BIND(C,name="a"): yes
            Fort PRIVATE: yes
          Fort PROTECTED: yes
           Fort ABSTRACT: yes
       Fort ASYNCHRONOUS: yes
          Fort PROCEDURE: yes
         Fort USE...ONLY: yes
           Fort C_FUNLOC: yes
    Fort f08 using wrappers: yes
         Fort MPI_SIZEOF: yes
             C profiling: yes
           C++ profiling: no
    Fort mpif.h profiling: yes
    Fort use mpi profiling: yes
    Fort use mpi_f08 prof: yes
          C++ exceptions: no
          Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
                          OMPI progress: no, ORTE progress: yes, Event lib:
                          yes)
           Sparse Groups: no
    Internal debug support: no
    MPI interface warnings: yes
     MPI parameter check: runtime
    Memory profiling support: no
    Memory debugging support: no
              dl support: yes
    Heterogeneous support: no
    mpirun default --prefix: no
       MPI_WTIME support: native
     Symbol vis. support: yes
    Host topology support: yes
            IPv6 support: no
      MPI1 compatibility: no
          MPI extensions: affinity, cuda, pcollreq
    FT Checkpoint support: no (checkpoint thread: no)
    C/R Enabled Debugging: no
    MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
    MPI_MAX_DATAREP_STRING: 128
           MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v4.0.4)
           MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v4.0.4)
           MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                 MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.0.4)
                 MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.0.4)
                 MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.0.4)
            MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v4.0.4)
            MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                 MCA crs: none (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                  MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v4.0.4)
               MCA event: libevent2022 (MCA v2.1.0, API v2.0.0, Component
                          v4.0.4)
               MCA hwloc: hwloc201 (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                  MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
                          v4.0.4)
                  MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
                          v4.0.4)
         MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v4.0.4)
         MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v4.0.4)
              MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v4.0.4)
               MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v4.0.4)
             MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
                          v4.0.4)
                MCA pmix: flux (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                MCA pmix: pmix3x (MCA v2.1.0, API v2.0.0, Component v4.0.4)
               MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v4.0.4)
              MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v4.0.4)
           MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v4.0.4)
               MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v4.0.4)
               MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v4.0.4)
               MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v4.0.4)
               MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v4.0.4)
              MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component
                          v4.0.4)
              MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component
                          v4.0.4)
              MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component
                          v4.0.4)
              MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component
                          v4.0.4)
                 MCA ess: env (MCA v2.1.0, API v3.0.0, Component v4.0.4)
                 MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component
                          v4.0.4)
                 MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v4.0.4)
                 MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v4.0.4)
                 MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v4.0.4)
                 MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v4.0.4)
               MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v4.0.4)
             MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v4.0.4)
                 MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                 MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                 MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                MCA odls: pspawn (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                MCA odls: default (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                 MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                 MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                 MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                 MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                 MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component
                          v4.0.4)
                 MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                MCA regx: naive (MCA v2.1.0, API v1.0.0, Component v4.0.4)
                MCA regx: reverse (MCA v2.1.0, API v1.0.0, Component v4.0.4)
                MCA regx: fwd (MCA v2.1.0, API v1.0.0, Component v4.0.4)
               MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component
                          v4.0.4)
               MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v4.0.4)
               MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component
                          v4.0.4)
               MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v4.0.4)
               MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component
                          v4.0.4)
               MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                 MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v4.0.4)
              MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v4.0.4)
              MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v4.0.4)
              MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v4.0.4)
                 MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v4.0.4)
              MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v4.0.4)
              MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v4.0.4)
              MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v4.0.4)
              MCA schizo: singularity (MCA v2.1.0, API v1.0.0, Component
                          v4.0.4)
              MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v4.0.4)
               MCA state: tool (MCA v2.1.0, API v1.0.0, Component v4.0.4)
               MCA state: orted (MCA v2.1.0, API v1.0.0, Component v4.0.4)
               MCA state: novm (MCA v2.1.0, API v1.0.0, Component v4.0.4)
               MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v4.0.4)
               MCA state: app (MCA v2.1.0, API v1.0.0, Component v4.0.4)
                 MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                MCA coll: monitoring (MCA v2.1.0, API v2.0.0, Component
                          v4.0.4)
                MCA coll: self (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v4.0.4)
               MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
                          v4.0.4)
               MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v4.0.4)
               MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
                          v4.0.4)
               MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v4.0.4)
               MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component
                          v4.0.4)
                  MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                  MCA io: romio321 (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                  MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                 MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component
                          v4.0.4)
                 MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v4.0.4)
                 MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v4.0.4)
                 MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v4.0.4)
                 MCA pml: v (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                 MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                 MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component
                          v4.0.4)
                 MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v4.0.4)
                 MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v4.0.4)
            MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v4.0.4)
            MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
                          v4.0.4)
            MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
                          v4.0.4)
                MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v4.0.4)
                MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
                          v4.0.4)
           MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
                          v4.0.4)
  4. C Compiler: Intel ICC
    $ icc --version
    icc (ICC) 19.1.1.217 20200306
    Copyright (C) 1985-2020 Intel Corporation.  All rights reserved.
  5. GEMM library: Intel OneAPI MKL
    $ spack find -v intel-oneapi-mkl@2023.2.0 
    -- linux-centos7-cascadelake / gcc@12.2.0 -----------------------
    intel-oneapi-mkl@2023.2.0+cluster+envmods~ilp64+shared build_system=generic mpi_family=openmpi threads=openmp
    ==> 1 installed package
huanghua1994 commented 10 months ago

Thank you for trying CA3DMM and reporting the error. Based on your description, I think you could first try another MPI library. I have little experience with Open MPI; my impression is that it usually needs some extra arguments to run over InfiniBand, Omni-Path, or other high-speed networks, so you may want to check that as well.
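
For example, the transport selection can be pinned and logged explicitly; the component names below are taken from the ompi_info output above and may need adjusting for your site:

$ mpirun -n ${SLURM_NTASKS} --mca pml ob1 --mca btl self,vader,tcp \
         --mca btl_base_verbose 100 \
         ${EXEC_PATH} 1200000 6000 6000 0 0 1 1 0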