Closed by nkoukpaizan 11 months ago
You can implement this with integers and MPI_SUM. Not as elegant, but it should work everywhere.
I implemented a fix using integers and created an MR for this.
@bjpalmer Thanks for the fix. Your PR has been merged and I filed a bug report for the underlying cray-mpich issue.
Summary
I was trying to build the current develop branch on Frontier, and saw several functionality tests failing (e.g., FUNCTIONALITY_TEST_PFLOW_TESTSUITE_1_proc). A backtrace shows:
Program received signal SIGABRT, Aborted.
0x00007fffe8371cbb in raise () from /lib64/libc.so.6
(gdb) backtrace
#0  0x00007fffe8371cbb in raise () from /lib64/libc.so.6
#1  0x00007fffe8373355 in abort () from /lib64/libc.so.6
#2  0x00007fffe8d2b5b9 in __gnu_cxx::__verbose_terminate_handler () at ../../../../cpe-gcc-12.2.0-202211182106.97b1815c41a72/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007fffe8d36bea in __cxxabiv1::__terminate (handler=) at ../../../../cpe-gcc-12.2.0-202211182106.97b1815c41a72/libstdc++-v3/libsupc++/eh_terminate.cc:48
#4  0x00007fffe8d36c55 in std::terminate () at ../../../../cpe-gcc-12.2.0-202211182106.97b1815c41a72/libstdc++-v3/libsupc++/eh_terminate.cc:58
#5  0x00007fffe8d36ea7 in __cxxabiv1::__cxa_throw (obj=, tinfo=0x216300 , dest=0x2999d0 <ExaGOError::~ExaGOError()>)
#6  0x0000000000299054 in is_true_somewhere (flag=false, comm=1140850688) at /lustre/orion/scratch/nkouk/csc359/ExaGO/tests/functionality/pflow/../toml_utils.h:75
#7  0x000000000029a993 in PflowFunctionalityTests::ensure_options_are_consistent (this=0x7fffffff5cc0, testcase=..., presets=...)
#8  0x0000000000299c6c in FunctionalityTestContext::run_all_test_cases (this=0x7fffffff5cc0)
#9  0x0000000000299371 in main (argc=2, argv=0x7fffffff61e8) at /lustre/orion/scratch/nkouk/csc359/ExaGO/tests/functionality/pflow/selfcheck.cpp:196
Fatal error in PMPI_Allreduce: Invalid MPI_Op, error stack:
PMPI_Allreduce(497).....: MPI_Allreduce(sbuf=0x7fffffff59e3, rbuf=0x7fffffff59e2, count=1, datatype=dtype=0x4c000133, op=MPI_LOR, comm=MPI_COMM_WORLD) failed
MPIR_LOR_check_dtype(92): MPI_Op MPI_LOR operation not defined for this datatype