[Open] Mittagskogel opened this issue 2 years ago
There is a similar issue here. This may be a configuration issue regarding the path to the .dacecache folder. Have you tried running something simple, e.g., samples/simple/axpy.py?
Also, I suggest you avoid the MKL+OpenMPI combination; if you want to try it anyway, you will need to set the PBLAS default library implementation accordingly (see the sketch below).
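For reference, a minimal sketch of how the PBLAS default implementation could be selected through DaCe's configuration API; the key hierarchy library.pblas.default_implementation and the implementation name 'MKLMPICH' are assumptions on my part, so check which names your DaCe version actually ships:

# Hypothetical configuration sketch -- the key path and the implementation
# name below are assumptions, not verified against this DaCe version.
import dace

# Equivalent to exporting DACE_library_pblas_default_implementation in the shell.
dace.Config.set('library', 'pblas', 'default_implementation', value='MKLMPICH')
print(dace.Config.get('library', 'pblas', 'default_implementation'))

The environment-variable form follows DaCe's usual DACE_<section>_<key> naming (like DACE_default_build_folder below), so it can also be set per run without touching the script.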
axpy.py seems to work fine:
(dace_env) [user@cluster dace]$ python ./samples/simple/axpy.py
Difference: 0.0
I've now switched to MKL+MPICH, but I don't think this is relevant to the issue. Also, setting the default_build_folder configuration option doesn't change anything:
(dace_env) [user@cluster dace]$ export DACE_default_build_folder=.dacecache
(dace_env) [user@cluster dace]$ python ./samples/distributed/polybench.py
===== atax =====
sizes: [20000, 25000]
adjusted sizes: (20000, 25000)
data initialized
-- The C compiler identification is GNU 11.3.0
-- The CXX compiler identification is GNU 11.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /cm/shared/apps/spack-stack/linux-rocky8-zen2/gcc-11.3.0/gcc-11.3.0-rkggaw2lju22imfhv77nqtu6uhcgyizv/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /cm/shared/apps/spack-stack/linux-rocky8-zen2/gcc-11.3.0/gcc-11.3.0-rkggaw2lju22imfhv77nqtu6uhcgyizv/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found MPI_C: /home/user/spack/opt/spack/linux-rocky8-zen2/gcc-11.3.0/mpich-4.0.2-j3plqofcp37hfnmsnd3brbszqmhgjppu/lib/libmpi.so (found version "4.0")
-- Found MPI_CXX: /home/user/spack/opt/spack/linux-rocky8-zen2/gcc-11.3.0/mpich-4.0.2-j3plqofcp37hfnmsnd3brbszqmhgjppu/lib/libmpicxx.so (found version "4.0")
-- Found MPI: TRUE (found version "4.0")
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/tmpsobb3aj9/build
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/tmpsobb3aj9/build
CMake Warning:
No source or binary directory provided. Both will be assumed to be the
same as the current working directory, but note that this warning will
become a fatal error in future CMake releases.
CMake Error: Generator implementation error, all generators must specify this->FindMakeProgramFile
-Wl,-rpath -Wl,/home/user/spack/opt/spack/linux-rocky8-zen2/gcc-11.3.0/mpich-4.0.2-j3plqofcp37hfnmsnd3brbszqmhgjppu/lib -L /home/user/spack/opt/spack/linux-rocky8-zen2/gcc-11.3.0/mpich-4.0.2-j3plqofcp37hfnmsnd3brbszqmhgjppu/lib -lmpicxx -L /home/user/spack/opt/spack/linux-rocky8-zen2/gcc-11.3.0/mpich-4.0.2-j3plqofcp37hfnmsnd3brbszqmhgjppu/lib -lmpi
CMake Error: The source directory "/tmp/tmpsobb3aj9/build" does not appear to contain CMakeLists.txt.
Specify --help for usage, or press the help button on the CMake GUI.
Abort(605670927): Fatal error in internal_Init: Other MPI error, error stack:
internal_Init(59)....: MPI_Init(argc=0x7fffffff8cd0, argv=0x7fffffff8cc8) failed
MPII_Init_thread(209):
MPID_Init(359).......:
MPIR_pmi_init(141)...: PMI2_Job_GetId returned 14
I believe this is because CMake isn't failing during an actual build step; it's failing while trying to detect the linker flags, so perhaps this configuration option doesn't apply.
That makes sense, then. Can you check that you are using an mpi4py build that is compatible with MPICH? If you are using the same one as above (py-mpi4py-3.1.2-gcc-11.3.0-openmpi-u3yz3iy), then this is the most likely cause of the error.
The issue with MKL+OpenMPI is that it works only with Intel's static BLACS library (see here), which may not even be installed on your system.
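If it helps with the check above, here is a small generic mpi4py snippet (not DaCe-specific) that prints which MPI library a given mpi4py build was compiled against and which one it loads at runtime:

# Generic mpi4py sanity check: which MPI was this build compiled against,
# and which MPI library does it load at runtime?
import mpi4py
from mpi4py import MPI

print(mpi4py.get_config())         # build-time configuration, e.g., the mpicc that was used
print(MPI.Get_library_version())   # runtime library banner (reports MPICH vs. Open MPI)
print(MPI.Get_version())           # MPI standard version, e.g., (3, 1)

Note that importing mpi4py.MPI initializes MPI, so on systems where PMI errors like the one above appear, it may need to be run under mpiexec -n 1.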
Indeed, I've finally gotten the benchmark to run with MKL+MPICH. As you suggested, the py-mpi4py module was still the incorrect one.
Thus, the CMake error message
CMake Error: Generator implementation error, all generators must specify this->FindMakeProgramFile
-Wl,-rpath -Wl,/home/user/spack/opt/spack/linux-rocky8-zen2/gcc-11.3.0/mpich-4.0.2-j3plqofcp37hfnmsnd3brbszqmhgjppu/lib -L /home/user/spack/opt/spack/linux-rocky8-zen2/gcc-11.3.0/mpich-4.0.2-j3plqofcp37hfnmsnd3brbszqmhgjppu/lib -lmpicxx -L /home/user/spack/opt/spack/linux-rocky8-zen2/gcc-11.3.0/mpich-4.0.2-j3plqofcp37hfnmsnd3brbszqmhgjppu/lib -lmpi
CMake Error: The source directory "/tmp/tmpv_4lp_8g/build" does not exist.
Specify --help for usage, or press the help button on the CMake GUI.
seems to be a huge red herring; we've been trying to fix the wrong problem all along. How can we force CMake to give more verbose output about linking errors in this step?
I understand what is going on now. We have a separate CMake script that indeed creates a pseudo-project in a tmp directory to get the correct paths and names of the MPICH libraries, e.g., on Cray machines. It seems that the script is not complete, which is probably the reason for these errors. Does the program run now (regardless of whether the error still appears)? Could you tell me your CMake version, so I can try to reproduce and fix it?
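To illustrate the idea (this is not the actual script in compilers.py, just a self-contained sketch of the pseudo-project technique, with the assumptions noted in the comments): configure a throwaway project that does find_package(MPI) and read the detected libraries back from the CMake cache. The same invocation is also where the usual CMake verbosity switches such as --debug-trycompile or --trace-expand would go, which is the closest general answer I know to the question above about more verbose output.

# Illustration only: NOT DaCe's compilers.py code. A sketch of probing MPI link
# information by configuring a minimal throwaway CMake project in a temp directory.
import subprocess
import tempfile
from pathlib import Path

CMAKELISTS = """
cmake_minimum_required(VERSION 3.15)
project(mpi_flag_probe CXX)
find_package(MPI REQUIRED)
"""

def probe_mpi_cache(verbose: bool = False) -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        src, build = Path(tmp) / "src", Path(tmp) / "build"
        src.mkdir()
        build.mkdir()
        (src / "CMakeLists.txt").write_text(CMAKELISTS)

        cmd = ["cmake", "-S", str(src), "-B", str(build)]
        if verbose:
            # Standard CMake switches for more configure-time detail.
            cmd += ["--debug-trycompile", "--trace-expand"]
        subprocess.run(cmd, check=True, capture_output=not verbose)

        # Collect the MPI-related entries from the generated cache.
        cache = (build / "CMakeCache.txt").read_text()
        return {line.split("=", 1)[0]: line.split("=", 1)[1]
                for line in cache.splitlines()
                if line.startswith("MPI_") and "=" in line}

if __name__ == "__main__":
    for key, value in probe_mpi_cache().items():
        print(key, "=", value)

Whether DaCe exposes a way to pass such flags to its internal invocation is a separate question, but this is roughly the shape of the step that is failing here.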
Yes, the distributed benchmark is running regardless of the CMake errors. I'm using CMake 3.24.2, but I've tested multiple versions and they all triggered the error.
We seem to have the same problem, even though I can confirm that I have set the correct default build folder. In addition, we observe that the .dacecache folder is created correctly in the current directory and contains the atax subdirectory. I have tried to replicate this on other machines: on one of them, when I use MPICH, the CMakeLists are still written to the tmp directory, but a complete polybench run nevertheless succeeds.
We will fix this issue. In the meantime, if you are from the Student Cluster Competition, I made a post that describes the issue in more detail, in case you want/need to make any amendments to the relevant files.
@alexnick83 Was this fixed?
Running samples/distributed/polybench.py causes CMake to crash. The problem seems to occur in compilers.py when calling the env.cmake_compile_flags function in line 297.
Steps to reproduce the problem:
Output for python ./samples/distributed/polybench.py: