pmodels / mpich

Official MPICH Repository
http://www.mpich.org

Spack MPICH build error when +vci #6896

Closed JiakunYan closed 8 months ago

JiakunYan commented 8 months ago

Spack spec: mpich+vci netmod=ofi
Platform: polaris
Spack version: 0.21.0 (65d3221a9c436e77af9e802e543c71bd702ff4e2)

I am getting a series of errors in the build phase, such as:

     383685    ./src/mpid/ch4/src/ch4_types.h:260:34: error: 'MPIDI_CH4_Global_t' has no member named 'per_vci'
     383686      260 | #define MPIDI_VCI(i) MPIDI_global.per_vci[i]
     383687          |                                  ^
     383688    ./src/mpid/ch4/src/ch4_wait.h:77:66: note: in expansion of macro 'MPIDI_VCI'
     383689       77 |         state->progress_counts[i] = MPL_atomic_relaxed_load_int(&MPIDI_VCI(vci).progress_count);
     383690          |                                                                  ^~~~~~~~~
     383691    ./src/mpid/ch4/include/mpidpost.h: In function 'MPID_Request_free_hook':
  >> 383692    ./src/mpid/ch4/src/ch4_types.h:260:34: error: 'MPIDI_CH4_Global_t' has no member named 'per_vci'
     383693      260 | #define MPIDI_VCI(i) MPIDI_global.per_vci[i]
     383694          |                                  ^
     383695    ./src/mpid/ch4/include/mpidpost.h:31:46: note: in expansion of macro 'MPIDI_VCI'
     383696       31 |     int count = MPL_atomic_relaxed_load_int(&MPIDI_VCI(vci).progress_count);
     383697          |                                              ^~~~~~~~~
  >> 383698    ./src/mpid/ch4/src/ch4_types.h:260:34: error: 'MPIDI_CH4_Global_t' has no member named 'per_vci'
     383699      260 | #define MPIDI_VCI(i) MPIDI_global.per_vci[i]
     383700          |                                  ^
     383701    ./src/mpid/ch4/include/mpidpost.h:32:35: note: in expansion of macro 'MPIDI_VCI'
     383702       32 |     MPL_atomic_relaxed_store_int(&MPIDI_VCI(vci).progress_count, count + 1);
     383703          |                                   ^~~~~~~~~

Spack reported that it used the following configure options:

Configuring MPICH version 4.1.2 with  '--prefix=/home/jiakuny/workspace/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/mpich-4.1.2-psjjnrs7mjetz4jx4mzqwgr4wnra3ojl' '--disable-maintainer-mode' '--disable-silent-rules' '--enable-shared' '--with-pm=hydra' '--enable-romio' '--without-ibverbs' '--enable-wrapper-rpath=yes' '--with-yaksa=/home/jiakuny/workspace/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/yaksa-0.3-x3ebjrmbc6dnaiqmqrqdt5lpb7mtdfs3' '--with-hwloc=/home/jiakuny/workspace/spack/opt/spack/linux-sles15-zen3/gcc-11.2.0/hwloc-2.9.1-rkwf3yzrljw6azip5dzm4jewv3v3vf2q' '--with-slurm=no' '--with-pmi=simple' '--without-cuda' '--without-hip' '--with-device=ch4:ofi' '--with-libfabric=/opt/cray/libfabric/1.15.2.0' '--enable-libxml2' '--enable-thread-cs=per-vci' '--with-ch4-max-vcis=default' '--with-datatype-engine=auto' 'CC=/home/jiakuny/workspace/spack/lib/spack/env/gcc/gcc' 'CXX=/home/jiakuny/workspace/spack/lib/spack/env/gcc/g++' 'FC=/home/jiakuny/workspace/spack/lib/spack/env/gcc/gfortran' 'F77=/home/jiakuny/workspace/spack/lib/spack/env/gcc/gfortran'

Full spack concretization

Input spec
--------------------------------
 -   mpich+vci netmod=ofi

Concretized
--------------------------------
 -   mpich@4.1.2%gcc@11.2.0~argobots~cuda+fortran+hwloc+hydra+libxml2+pci~rocm+romio~slurm+vci~verbs+wrapperrpath build_system=autotools datatype-engine=auto device=ch4 netmod=ofi pmi=pmi arch=linux-sles15-zen3
[e]      ^findutils@4.8.0%gcc@11.2.0 build_system=autotools patches=440b954 arch=linux-sles15-zen3
[e]      ^gmake@4.2.1%gcc@11.2.0~guile build_system=generic patches=ca60bd9,fe5b60d arch=linux-sles15-zen3
[+]      ^hwloc@2.9.1%gcc@11.2.0~cairo~cuda~gl~libudev+libxml2~netloc~nvml~oneapi-level-zero~opencl+pci~rocm build_system=autotools libs=shared,static arch=linux-sles15-zen3
[+]          ^ncurses@6.4%gcc@11.2.0~symlinks+termlib abi=none build_system=autotools arch=linux-sles15-zen3
[e]      ^libfabric@1.15.2%gcc@11.2.0~debug~kdreg build_system=autotools fabrics=cxi,sockets,tcp,udp arch=linux-sles15-zen3
[+]      ^libpciaccess@0.17%gcc@11.2.0 build_system=autotools arch=linux-sles15-zen3
[e]          ^libtool@2.4.6%gcc@11.2.0 build_system=autotools arch=linux-sles15-zen3
[+]          ^util-macros@1.19.3%gcc@11.2.0 build_system=autotools arch=linux-sles15-zen3
[+]      ^libxml2@2.10.3%gcc@11.2.0+pic~python+shared build_system=autotools arch=linux-sles15-zen3
[+]          ^libiconv@1.17%gcc@11.2.0 build_system=autotools libs=shared,static arch=linux-sles15-zen3
[+]          ^xz@5.4.1%gcc@11.2.0~pic build_system=autotools libs=shared,static arch=linux-sles15-zen3
[+]          ^zlib-ng@2.1.4%gcc@11.2.0+compat+opt build_system=autotools arch=linux-sles15-zen3
[+]      ^pkgconf@1.9.5%gcc@11.2.0 build_system=autotools arch=linux-sles15-zen3
[+]      ^yaksa@0.3%gcc@11.2.0~cuda~rocm build_system=autotools arch=linux-sles15-zen3
[e]          ^autoconf@2.69%gcc@11.2.0 build_system=autotools patches=7793209 arch=linux-sles15-zen3
[e]          ^automake@1.15.1%gcc@11.2.0 build_system=autotools arch=linux-sles15-zen3
[e]          ^m4@1.4.18%gcc@11.2.0+sigsegv build_system=autotools patches=3877ab5,fc9b616 arch=linux-sles15-zen3
[e]          ^python@3.9.13%gcc@11.2.0+bz2+crypt+ctypes+dbm~debug+libxml2+lzma+nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl+tix+tkinter+uuid+zlib build_system=generic patches=0d98e93,ebdca64,f2fd060 arch=linux-sles15-zen3

Any idea why this could happen?

hzhou commented 8 months ago

Is there an earlier error that prevented ch4_types.h from being parsed correctly?

JiakunYan commented 8 months ago

I found a more relevant error:

In file included from ./src/include/mpiimpl.h:40,
                 from src/glue/romio/glue_romio.c:10:
./src/include/mpichconf.h:683:28: error: expected expression before 'default'
  683 | #define MPIDI_CH4_MAX_VCIS default
      |                            ^~~~~~~
./src/mpid/ch4/include/../netmod/ofi/ofi_pre.h:291:40: note: in expansion of macro 'MPIDI_CH4_MAX_VCIS'
  291 |     fi_addr_t dest[MPIDI_OFI_MAX_NICS][MPIDI_CH4_MAX_VCIS];     /* [nic][vni] */
      |                                        ^~~~~~~~~~~~~~~~~~

In the MPICH Spack package.py, we have

if "+vci" in spec:
    config_args.append("--enable-thread-cs=per-vci")
    config_args.append("--with-ch4-max-vcis=default")

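but configure does not recognize default as a value for this option. The string is copied verbatim into mpichconf.h as MPIDI_CH4_MAX_VCIS, where it is then used as an array dimension, which produces the error above.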
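
A quick way to confirm this kind of problem in a generated header is to look at the value of the macro directly. The following is only a sketch: the function name non_integer_max_vcis is made up for illustration, and it assumes the macro is written as a single-token #define, as in the error message above.

```python
import re

# Matches lines such as: #define MPIDI_CH4_MAX_VCIS default
_DEFINE_RE = re.compile(
    r"^\s*#\s*define\s+MPIDI_CH4_MAX_VCIS\s+(\S+)", re.MULTILINE
)


def non_integer_max_vcis(header_text):
    """Return the macro value if it is defined but is not a plain integer.

    Returns None when the macro is absent or holds a decimal integer.
    (Hypothetical helper, not part of MPICH or Spack.)
    """
    match = _DEFINE_RE.search(header_text)
    if match is None:
        return None
    value = match.group(1)
    return None if value.isdigit() else value


if __name__ == "__main__":
    # The offending line from the build log above.
    sample = "#define MPIDI_CH4_MAX_VCIS default\n"
    print(non_integer_max_vcis(sample))
```

To check a real build, the contents of src/include/mpichconf.h from the build directory could be read and passed in as header_text.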

hzhou commented 8 months ago

Yes, it should be --with-ch4-max-vcis=64, or the option can simply be omitted.
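
As a rough illustration of that suggestion, the argument list could be built so that the option is either given an integer or left out entirely. This is a standalone sketch, not the actual change made in the Spack package: vci_configure_args and its parameters are hypothetical names, and only the two configure flags are taken from this thread.

```python
def vci_configure_args(enable_vci, max_vcis=None):
    """Build the VCI-related configure arguments.

    enable_vci: whether the +vci variant is requested.
    max_vcis:   an integer such as 64, or None to omit the option and
                let configure use its own default.
    (Hypothetical helper for illustration only.)
    """
    args = []
    if not enable_vci:
        return args

    args.append("--enable-thread-cs=per-vci")

    if max_vcis is not None:
        # Reject anything that is not a positive integer, so a string
        # like "default" can never reach configure.
        if isinstance(max_vcis, bool) or not isinstance(max_vcis, int) or max_vcis < 1:
            raise ValueError("max_vcis must be a positive integer or None")
        args.append("--with-ch4-max-vcis={}".format(max_vcis))

    return args


if __name__ == "__main__":
    print(vci_configure_args(True, 64))
    print(vci_configure_args(True))
```

With max_vcis left as None, only --enable-thread-cs=per-vci is passed, which corresponds to the "simply omit the option" variant.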

hzhou commented 8 months ago

tag @raffenet

raffenet commented 8 months ago

https://github.com/spack/spack/pull/42570 should fix the issue.

JiakunYan commented 8 months ago

@raffenet It works! Thanks!