nv-legate / cupynumeric

An Aspiring Drop-In Replacement for NumPy at Scale
https://docs.nvidia.com/cupynumeric
Apache License 2.0

[BUG] Hang with 5D arrays #1128

Open CharlelieLrt opened 8 months ago

CharlelieLrt commented 8 months ago

Software versions

Python      : 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 16:04:32) [GCC 12.3.0]
Platform    : Linux-4.14.0-115.35.1.3chaos.ch6a.ppc64le-ppc64le-with-glibc2.17
Legion      : legion-23.09.0-4871-g04ee5be1d
Legate      : 23.11.00.dev+57.gde1ad0f
Cunumeric   : 23.11.00.dev+33.g8693a3d6
Numpy       : 1.26.3
Scipy       : 1.12.0
Numba       : 0.58.1
CTK package : (failed to detect)
GPU driver  : 510.47.03
GPU devices :
  GPU 0: Tesla V100-SXM2-16GB
  GPU 1: Tesla V100-SXM2-16GB
  GPU 2: Tesla V100-SXM2-16GB
  GPU 3: Tesla V100-SXM2-16GB

Expected behavior

I have an application operating on 5D arrays of shape (M, N, K, K, K), where N is fixed. The application works on 1 node (4 GPUs). I attempt two types of scaling:

  1. Scaling w.r.t. K: I increase K such that the volume of the last three dimensions becomes 2*K**3, and run on two nodes. The code executes as expected.
  2. Scaling the first dimension M: I increase M to 2 * M. The code does not execute.

Observed behavior

Scenario 2 above does not produce any error, but the code seems to hang indefinitely, even before starting any computation.

Example code or instructions

The code is executed on a POWER9 (ppc64le) system with 4 V100 GPUs per node. It is launched with:

jsrun -n 2 -r 1 -a 1 -c ALL_CPUS -g ALL_GPUS -b none \
  /g/g92/laurent3/miniforge3/envs/legate_012024/bin/bind.sh --launcher jsrun -- \
  /g/g92/laurent3/miniforge3/envs/legate_012024/bin/legion_python \
  -ll:py 1 -ll:gpu 4 -cuda:skipbusy -ll:ocpu 4 -ll:othr 4 -ll:onuma 0 -ll:util 2 -ll:bgwork 2 \
  -ll:csize 200000 -ll:fsize 14500 -ll:zsize 512 -ll:rsize 512 \
  -level openmp=5,gpu=5 -logfile 2024/02/22/133402/legate_%.log -errlevel 4 \
  -lg:eager_alloc_percentage 5 hit.py

Stack traceback or browser console output

None.

lightsighter commented 8 months ago

Can you add the following flags to your command line: -ll:force_kthreads -lg:inorder -lg:safe_ctrlrepl 1? Then attach a debugger to the process on each node and report the results of thread apply all bt from each node.
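For readers following along, the debugger step can be scripted. The sketch below is only an illustration, assuming gdb is installed on the compute nodes and that you are allowed to attach to the running process; the helper name `gdb_backtrace_command` and the PID are hypothetical and not part of Legate or Legion.

```python
import shlex
from typing import List


def gdb_backtrace_command(pid: int) -> List[str]:
    """Build a gdb invocation that prints the backtrace of every thread.

    Hypothetical helper for illustration only.
    -p attaches to a running process, -batch makes gdb exit once the
    commands have run, and -ex executes the given gdb command.
    """
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


if __name__ == "__main__":
    # 12345 is a placeholder; use the PID of the legion_python process
    # on each node (one invocation per node).
    print(shlex.join(gdb_backtrace_command(12345)))
```

Redirecting the output of the printed command to one file per node gives backtrace files that can be attached to the issue.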

lightsighter commented 8 months ago

If possible, build Legate core with --debug before doing that so we can get line numbers in the backtraces.

CharlelieLrt commented 8 months ago

Trying to build Legate core with --debug gives me the error below. Without the debug option, it builds without problems.

  FAILED: _deps/legion-build/lib/liblegion.so.1
  : && /usr/tce/packages/gcc/gcc-8.3.1/bin/c++ -fPIC -mcpu=native -maltivec -mabi=altivec -mvsx -O0 -g   -shared -Wl,-soname,liblegion.so.1 -o _deps/legion-build/lib/liblegion.so.1 _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/mappers/default_mapper.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/mappers/mapping_utilities.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/mappers/shim_mapper.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/mappers/test_mapper.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/mappers/null_mapper.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/mappers/replay_mapper.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/mappers/debug_mapper.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/mappers/wrapper_mapper.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/mappers/forwarding_mapper.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/mappers/logging_wrapper.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/garbage_collection.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/index_space_value.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion_analysis.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion_c.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion_constraint.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion_context.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion_instances.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion_mapping.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion_ops.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion_profiling.cc.o 
_deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion_profiling_serializer.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion_replication.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion_spy.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion_tasks.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion_trace.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion_views.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion_redop.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/mapper_manager.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/runtime.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/legion_redop.cu.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_1.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_1_1.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_1_2.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_1_3.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_1_4.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_1_5.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_2.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_2_1.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_2_2.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_2_3.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_2_4.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_2_5.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_3.cc.o 
_deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_3_1.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_3_2.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_3_3.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_3_4.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_3_5.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_4.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_4_1.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_4_2.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_4_3.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_4_4.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_4_5.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_1.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_2.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_3.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_4.cc.o _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o -L/usr/tce/packages/cuda/cuda-12.0.0/nvidia/lib64   -L/usr/tce/packages/cuda/cuda-12.0.0/nvidia/targets/ppc64le-linux/lib/stubs   -L/usr/tce/packages/cuda/cuda-12.0.0/nvidia/targets/ppc64le-linux/lib 
-Wl,-rpath,"\$ORIGIN:/g/g92/laurent3/miniforge3/envs/legate_01302024_DEBUG/lib:/usr/tce/packages/cuda/cuda-12.0.0/nvidia/lib64:/usr/WS1/laurent3/Codes/LEGATE/legate_01302024_DEBUG.core/_skbuild/linux-ppc64le-3.10/cmake-build/_deps/legion-build/lib:/usr/tce/packages/cuda/cuda-12.0.0/lib64:/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/lib:/usr/tce/packages/cuda/cuda-12.0.0/nvidia/targets/ppc64le-linux/lib:"  _deps/legion-build/lib/librealm.so.1  /g/g92/laurent3/miniforge3/envs/legate_01302024_DEBUG/lib/libz.so  _deps/legion-build/embed-gasnet/install/lib/libgasnet-ibv-par.a  _deps/legion-build/embed-gasnet/install/lib/libgasnet-ibv-par.a  /usr/lib64/libibverbs.so  /usr/lib64/libhwloc.so  /usr/tce/packages/cuda/cuda-12.0.0/lib64/libcuda.so  -lpthread  /usr/lib64/librt.so  /usr/tce/packages/gcc/gcc-8.3.1/rh/usr/lib/gcc/ppc64le-redhat-linux/8/libgcc.a  /usr/lib64/libm.so  /usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/lib/libmpiprofilesupport.so  /usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/lib/libmpi_ibm.so  /g/g92/laurent3/miniforge3/envs/legate_01302024_DEBUG/lib/libcudart.so  /usr/tce/packages/cuda/cuda-12.0.0/nvidia/targets/ppc64le-linux/lib/libcuda.so  -lcudadevrt  -lcudart && :
  _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o: In function `Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Rect<5, long long> >* std::__uninitialized_move_if_noexcept_a<Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Rect<5, long long> >*, Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Rect<5, long long> >*, std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Rect<5, long long> > > >(Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Rect<5, long long> >*, Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Rect<5, long long> >*, Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Rect<5, long long> >*, std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Rect<5, long long> > >&)':
  /usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/stl_uninitialized.h:311:(.text._ZSt34__uninitialized_move_if_noexcept_aIPN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5EjEENS0_4RectILi5ExEEEES7_SaIS6_EET0_T_SA_S9_RT1_[_ZSt34__uninitialized_move_if_noexcept_aIPN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5EjEENS0_4RectILi5ExEEEES7_SaIS6_EET0_T_SA_S9_RT1_]+0x34): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `std::move_iterator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Rect<5, long long> >*> std::__make_move_if_noexcept_iterator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Rect<5, long long> >, std::move_iterator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Rect<5, long long> >*> >(Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Rect<5, long long> >*)' defined in .text._ZSt32__make_move_if_noexcept_iteratorIN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5EjEENS0_4RectILi5ExEEEESt13move_iteratorIPS6_EET0_PT_[_ZSt32__make_move_if_noexcept_iteratorIN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5EjEENS0_4RectILi5ExEEEESt13move_iteratorIPS6_EET0_PT_] section in _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o
  /usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/stl_uninitialized.h:311:(.text._ZSt34__uninitialized_move_if_noexcept_aIPN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5EjEENS0_4RectILi5ExEEEES7_SaIS6_EET0_T_SA_S9_RT1_[_ZSt34__uninitialized_move_if_noexcept_aIPN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5EjEENS0_4RectILi5ExEEEES7_SaIS6_EET0_T_SA_S9_RT1_]+0x44): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `std::move_iterator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Rect<5, long long> >*> std::__make_move_if_noexcept_iterator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Rect<5, long long> >, std::move_iterator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Rect<5, long long> >*> >(Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Rect<5, long long> >*)' defined in .text._ZSt32__make_move_if_noexcept_iteratorIN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5EjEENS0_4RectILi5ExEEEESt13move_iteratorIPS6_EET0_PT_[_ZSt32__make_move_if_noexcept_iteratorIN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5EjEENS0_4RectILi5ExEEEESt13move_iteratorIPS6_EET0_PT_] section in _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o
  _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o: In function `std::vector<Realm::FieldDataDescriptor<Realm::IndexSpace<5, long long>, Realm::Point<5, unsigned int> >, std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, long long>, Realm::Point<5, unsigned int> > > >::vector(std::vector<Realm::FieldDataDescriptor<Realm::IndexSpace<5, long long>, Realm::Point<5, unsigned int> >, std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, long long>, Realm::Point<5, unsigned int> > > > const&)':
  /usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/stl_vector.h:460:(.text._ZNSt6vectorIN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5ExEENS0_5PointILi5EjEEEESaIS6_EEC2ERKS8_[_ZNSt6vectorIN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5ExEENS0_5PointILi5EjEEEESaIS6_EEC5ERKS8_]+0x38): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `std::vector<Realm::FieldDataDescriptor<Realm::IndexSpace<5, long long>, Realm::Point<5, unsigned int> >, std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, long long>, Realm::Point<5, unsigned int> > > >::size() const' defined in .text._ZNKSt6vectorIN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5ExEENS0_5PointILi5EjEEEESaIS6_EE4sizeEv[_ZNKSt6vectorIN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5ExEENS0_5PointILi5EjEEEESaIS6_EE4sizeEv] section in _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o
  _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o: In function `std::vector<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, unsigned int> >, std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, unsigned int> > > >::begin() const':
  /usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/stl_vector.h:708:(.text._ZNKSt6vectorIN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5EiEENS0_4RectILi5EjEEEESaIS6_EE5beginEv[_ZNKSt6vectorIN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5EiEENS0_4RectILi5EjEEEESaIS6_EE5beginEv]+0x3c): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `__gnu_cxx::__normal_iterator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, unsigned int> > const*, std::vector<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, unsigned int> >, std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, unsigned int> > > > >::__normal_iterator(Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, unsigned int> > const* const&)' defined in .text._ZN9__gnu_cxx17__normal_iteratorIPKN5Realm19FieldDataDescriptorINS1_10IndexSpaceILi5EiEENS1_4RectILi5EjEEEESt6vectorIS7_SaIS7_EEEC2ERKS9_[_ZN9__gnu_cxx17__normal_iteratorIPKN5Realm19FieldDataDescriptorINS1_10IndexSpaceILi5EiEENS1_4RectILi5EjEEEESt6vectorIS7_SaIS7_EEEC5ERKS9_] section in _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o
  _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o: In function `std::vector<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, unsigned int> >, std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, unsigned int> > > >::end() const':
  /usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/stl_vector.h:726:(.text._ZNKSt6vectorIN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5EiEENS0_4RectILi5EjEEEESaIS6_EE3endEv[_ZNKSt6vectorIN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5EiEENS0_4RectILi5EjEEEESaIS6_EE3endEv]+0x3c): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `__gnu_cxx::__normal_iterator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, unsigned int> > const*, std::vector<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, unsigned int> >, std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, unsigned int> > > > >::__normal_iterator(Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, unsigned int> > const* const&)' defined in .text._ZN9__gnu_cxx17__normal_iteratorIPKN5Realm19FieldDataDescriptorINS1_10IndexSpaceILi5EiEENS1_4RectILi5EjEEEESt6vectorIS7_SaIS7_EEEC2ERKS9_[_ZN9__gnu_cxx17__normal_iteratorIPKN5Realm19FieldDataDescriptorINS1_10IndexSpaceILi5EiEENS1_4RectILi5EjEEEESt6vectorIS7_SaIS7_EEEC5ERKS9_] section in _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o
  _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o: In function `std::allocator_traits<std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, long long>, Realm::Rect<5, int> > > >::allocate(std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, long long>, Realm::Rect<5, int> > >&, unsigned long)':
  /usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/alloc_traits.h:436:(.text._ZNSt16allocator_traitsISaIN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5ExEENS0_4RectILi5EiEEEEEE8allocateERS7_m[_ZNSt16allocator_traitsISaIN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5ExEENS0_4RectILi5EiEEEEEE8allocateERS7_m]+0x30): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `__gnu_cxx::new_allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, long long>, Realm::Rect<5, int> > >::allocate(unsigned long, void const*)' defined in .text._ZN9__gnu_cxx13new_allocatorIN5Realm19FieldDataDescriptorINS1_10IndexSpaceILi5ExEENS1_4RectILi5EiEEEEE8allocateEmPKv[_ZN9__gnu_cxx13new_allocatorIN5Realm19FieldDataDescriptorINS1_10IndexSpaceILi5ExEENS1_4RectILi5EiEEEEE8allocateEmPKv] section in _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o
  _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o: In function `__gnu_cxx::__alloc_traits<std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Point<5, unsigned int> > >, Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Point<5, unsigned int> > >::_S_select_on_copy(std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Point<5, unsigned int> > > const&)':
  /usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/ext/alloc_traits.h:95:(.text._ZN9__gnu_cxx14__alloc_traitsISaIN5Realm19FieldDataDescriptorINS1_10IndexSpaceILi5EjEENS1_5PointILi5EjEEEEES7_E17_S_select_on_copyERKS8_[_ZN9__gnu_cxx14__alloc_traitsISaIN5Realm19FieldDataDescriptorINS1_10IndexSpaceILi5EjEENS1_5PointILi5EjEEEEES7_E17_S_select_on_copyERKS8_]+0x30): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `std::allocator_traits<std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Point<5, unsigned int> > > >::select_on_container_copy_construction(std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, unsigned int>, Realm::Point<5, unsigned int> > > const&)' defined in .text._ZNSt16allocator_traitsISaIN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5EjEENS0_5PointILi5EjEEEEEE37select_on_container_copy_constructionERKS7_[_ZNSt16allocator_traitsISaIN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5EjEENS0_5PointILi5EjEEEEEE37select_on_container_copy_constructionERKS7_] section in _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o
  _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o: In function `Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, int> >* std::__uninitialized_copy<false>::__uninit_copy<std::move_iterator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, int> >*>, Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, int> >*>(std::move_iterator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, int> >*>, std::move_iterator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, int> >*>, Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, int> >*)':
  /usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/stl_uninitialized.h:83:(.text._ZNSt20__uninitialized_copyILb0EE13__uninit_copyISt13move_iteratorIPN5Realm19FieldDataDescriptorINS3_10IndexSpaceILi5EiEENS3_4RectILi5EiEEEEESA_EET0_T_SD_SC_[_ZNSt20__uninitialized_copyILb0EE13__uninit_copyISt13move_iteratorIPN5Realm19FieldDataDescriptorINS3_10IndexSpaceILi5EiEENS3_4RectILi5EiEEEEESA_EET0_T_SD_SC_]+0x84): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `void std::_Construct<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, int> >, Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, int> > >(Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, int> >*, Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Rect<5, int> >&&)' defined in .text._ZSt10_ConstructIN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5EiEENS0_4RectILi5EiEEEEJS6_EEvPT_DpOT0_[_ZSt10_ConstructIN5Realm19FieldDataDescriptorINS0_10IndexSpaceILi5EiEENS0_4RectILi5EiEEEEJS6_EEvPT_DpOT0_] section in _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o
  _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o: In function `Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, int> >* std::__uninitialized_copy<false>::__uninit_copy<__gnu_cxx::__normal_iterator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, int> > const*, std::vector<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, int> >, std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, int> > > > >, Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, int> >*>(__gnu_cxx::__normal_iterator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, int> > const*, std::vector<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, int> >, std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, int> > > > >, __gnu_cxx::__normal_iterator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, int> > const*, std::vector<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, int> >, std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, int> > > > >, Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, int> >*)':
  /usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/stl_uninitialized.h:83:(.text._ZNSt20__uninitialized_copyILb0EE13__uninit_copyIN9__gnu_cxx17__normal_iteratorIPKN5Realm19FieldDataDescriptorINS4_10IndexSpaceILi5EiEENS4_5PointILi5EiEEEESt6vectorISA_SaISA_EEEEPSA_EET0_T_SJ_SI_[_ZNSt20__uninitialized_copyILb0EE13__uninit_copyIN9__gnu_cxx17__normal_iteratorIPKN5Realm19FieldDataDescriptorINS4_10IndexSpaceILi5EiEENS4_5PointILi5EiEEEESt6vectorISA_SaISA_EEEEPSA_EET0_T_SJ_SI_]+0x70): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `__gnu_cxx::__normal_iterator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, int> > const*, std::vector<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, int> >, std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, int> > > > >::operator*() const' defined in .text._ZNK9__gnu_cxx17__normal_iteratorIPKN5Realm19FieldDataDescriptorINS1_10IndexSpaceILi5EiEENS1_5PointILi5EiEEEESt6vectorIS7_SaIS7_EEEdeEv[_ZNK9__gnu_cxx17__normal_iteratorIPKN5Realm19FieldDataDescriptorINS1_10IndexSpaceILi5EiEENS1_5PointILi5EiEEEESt6vectorIS7_SaIS7_EEEdeEv] section in _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o
  _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o: In function `bool __gnu_cxx::operator!=<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, long long> > const*, std::vector<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, long long> >, std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, long long> > > > >(__gnu_cxx::__normal_iterator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, long long> > const*, std::vector<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, long long> >, std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, long long> > > > > const&, __gnu_cxx::__normal_iterator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, long long> > const*, std::vector<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, long long> >, std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, long long> > > > > const&)':
  /usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/stl_iterator.h:887:(.text._ZN9__gnu_cxxneIPKN5Realm19FieldDataDescriptorINS1_10IndexSpaceILi5EiEENS1_5PointILi5ExEEEESt6vectorIS7_SaIS7_EEEEbRKNS_17__normal_iteratorIT_T0_EESI_[_ZN9__gnu_cxxneIPKN5Realm19FieldDataDescriptorINS1_10IndexSpaceILi5EiEENS1_5PointILi5ExEEEESt6vectorIS7_SaIS7_EEEEbRKNS_17__normal_iteratorIT_T0_EESI_]+0x2c): relocation truncated to fit: R_PPC64_REL24 (stub) against symbol `__gnu_cxx::__normal_iterator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, long long> > const*, std::vector<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, long long> >, std::allocator<Realm::FieldDataDescriptor<Realm::IndexSpace<5, int>, Realm::Point<5, long long> > > > >::base() const' defined in .text._ZNK9__gnu_cxx17__normal_iteratorIPKN5Realm19FieldDataDescriptorINS1_10IndexSpaceILi5EiEENS1_5PointILi5ExEEEESt6vectorIS7_SaIS7_EEE4baseEv[_ZNK9__gnu_cxx17__normal_iteratorIPKN5Realm19FieldDataDescriptorINS1_10IndexSpaceILi5EiEENS1_5PointILi5ExEEEESt6vectorIS7_SaIS7_EEE4baseEv] section in _deps/legion-build/runtime/CMakeFiles/LegionRuntime.dir/legion/region_tree_5_5.cc.o
  /usr/tce/packages/gcc/gcc-8.3.1/rh/usr/include/c++/8/bits/stl_iterator.h:887:(.text._ZN9__gnu_cxxneIPKN5Realm19FieldDataDescriptorINS1_10IndexSpaceILi5EiEENS1_5PointILi5ExEEEESt6vectorIS7_SaIS7_EEEEbRKNS_17__normal_iteratorIT_T0_EESI_[_ZN9__gnu_cxxneIPKN5Realm19FieldDataDescriptorINS1_10IndexSpaceILi5EiEENS1_5PointILi5ExEEEESt6vectorIS7_SaIS7_EEEEbRKNS_17__normal_iteratorIT_T0_EESI_]+0x40): additional relocation overflows omitted from the output

lightsighter commented 8 months ago

That's mostly an issue with your linker trying to shoehorn a branch target that needs more than a 24-bit offset into a tiny 24-bit relocation field (the R_PPC64_REL24 in the error). You can try adding -mcmodel=large to your link flags, or you can try doing a --debug-release build.
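To make the "24-bit" limit concrete: a PowerPC relative branch stores a 24-bit word offset, which the hardware extends with two zero bits into a signed 26-bit byte displacement. The sketch below only works through that arithmetic; the names are illustrative, and it makes no claim about how large the Legion debug build actually is.

```python
# Reach of a PowerPC relative branch, the kind of instruction that an
# R_PPC64_REL24 relocation patches.
FIELD_BITS = 24   # width of the offset field in the branch instruction
ALIGN_BITS = 2    # instructions are 4-byte aligned, so two zero bits are implied

DISPLACEMENT_BITS = FIELD_BITS + ALIGN_BITS        # signed 26-bit byte offset
MAX_BACKWARD = -(1 << (DISPLACEMENT_BITS - 1))     # most negative offset
MAX_FORWARD = (1 << (DISPLACEMENT_BITS - 1)) - 4   # largest aligned positive offset


def fits_rel24(offset: int) -> bool:
    """Return True if a byte offset can be encoded in a 24-bit branch field."""
    return offset % 4 == 0 and MAX_BACKWARD <= offset <= MAX_FORWARD


if __name__ == "__main__":
    mib = 1 << 20
    print(f"backward reach: {MAX_BACKWARD // mib} MiB")
    print(f"forward reach : just under {(MAX_FORWARD + 4) // mib} MiB")
    print("40 MiB away fits:", fits_rel24(40 * mib))
```

A branch and its target must therefore sit within roughly 32 MiB of each other, which is why an unoptimized, heavily templated debug build is more likely to overflow than a smaller optimized one.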

CharlelieLrt commented 8 months ago

That helped, thanks! I was able to generate the two attached backtraces: node1_bt.txt and node0_bt.txt.

CharlelieLrt commented 8 months ago

I've done more tests and realized that the hang has nothing to do with the number of nodes. It hangs even on a single node when M > 1 (where M is the first dimension of my arrays of shape (M, N, K, K, K)), but runs when M = 1. I have generated an updated backtrace for a single-node run: bt_single_node.txt

lightsighter commented 8 months ago

This backtrace doesn't look like a hang to me. It just looks like it is running really slowly. Can you provide a reproducer program and a command line for us to play with? I suspect you'll see the issue on other GPU machines that are not PowerPC.

lightsighter commented 8 months ago

Also, what is the behavior if you run only with CPUs and no GPUs?

CharlelieLrt commented 8 months ago

I have been trying to make a smaller reproducer, but depending on which parts of the code I comment out, it either runs normally or triggers the very slow execution. So I can't really isolate a part of the code that is causing this issue.

After more tests, I've also noticed that it's probably not (completely) due to 5D arrays: if I decrease the volume of the arrays by decreasing K, I can run with M >= 2 on a single node. For example, with arrays of shape (2, N, 80, 80, 80), the code executes normally, but with arrays of shape (2, N, 96, 96, 96) I get this very slow execution. It's also not due to the total volume of the arrays, as I can run normally with arrays of shape (1, N, 256, 256, 256).
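The volume argument can be checked with a few lines of arithmetic. This sketch compares element counts for the three shapes quoted above with the fixed dimension N factored out, since its value is not given; the labels simply restate the reported behavior.

```python
from math import prod

# Shapes from the report with the fixed dimension N left out: (M, K, K, K).
cases = {
    "normal, M=2, K=80": (2, 80, 80, 80),
    "slow,   M=2, K=96": (2, 96, 96, 96),
    "normal, M=1, K=256": (1, 256, 256, 256),
}

# Element count per unit of N for each case.
volumes = {label: prod(shape) for label, shape in cases.items()}

if __name__ == "__main__":
    for label, volume in volumes.items():
        print(f"{label}: {volume:>12,d} elements per unit of N")
```

The slow case holds 1,769,472 elements per unit of N, about 9.5 times fewer than the 16,777,216 of the M = 1 case that runs normally, so total array volume alone does not explain the slowdown.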

When using only CPUs, the code executes normally in all cases.