Open pierre-24 opened 1 month ago
After #3, `aocl+ilp64` fails in `gridinit`:
[fv-az888-613:03218] *** Process received signal ***
[fv-az888-613:03218] Signal: Segmentation fault (11)
[fv-az888-613:03218] Signal code: Address not mapped (1)
[fv-az888-613:03218] Failing at address: (nil)
[fv-az888-613:03218] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f601d642520]
[fv-az888-613:03218] [ 1] /opt/AMD/aocl/aocl-linux-gcc-4.2.0/gcc/lib/libscalapack.so(blacs_gridinit_+0xf1)[0x7f601ce3e313]
[fv-az888-613:03218] [ 2] libscalapacke.so(SCALAPACKE_blacs_gridinit+0x34)[0x7f601da52a26]
[fv-az888-613:03218] [ 3] tests/test_pdgemm(+0x146e)[0x5566716c446e]
[fv-az888-613:03218] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f601d629d90]
[fv-az888-613:03218] [ 5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f601d629e40]
[fv-az888-613:03218] [ 6] tests/test_pdgemm(+0x1285)[0x5566716c4285]
[fv-az888-613:03218] *** End of error message ***
`mkl` + `openmpi` fails in `pdgemm_`:
[fv-az1121-872:04622] *** Process received signal ***
[fv-az1121-872:04622] Signal: Segmentation fault (11)
[fv-az1121-872:04622] Signal code: Address not mapped (1)
[fv-az1121-872:04622] Failing at address: 0x23f0dde8
[fv-az1121-872:04621] *** Process received signal ***
[fv-az1121-872:04621] Signal: Segmentation fault (11)
[fv-az1121-872:04621] Signal code: Address not mapped (1)
[fv-az1121-872:04621] Failing at address: 0x3589abb8
[fv-az1121-872:04622] [ 0]
[fv-az1121-872:04621] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f551bc42520]
[fv-az1121-872:04621] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f6c5e442520]
[fv-az1121-872:04622] [ 1] /lib/x86_64-linux-gnu/libmpi.so.40(MPI_Comm_size+0x3b)[0x7f551bf3686b]
[fv-az1121-872:04621] [ 2] /lib/x86_64-linux-gnu/libmpi.so.40(MPI_Comm_size+0x3b)[0x7f6c5e73686b]
[fv-az1121-872:04622] [ 2] /opt/intel/oneapi/mkl/2024.2/lib/libmkl_blacs_openmpi_lp64.so.2(MKLMPI_Comm_size+0x28)[0x7f551e081308]
[fv-az1121-872:04621] [ 3] /opt/intel/oneapi/mkl/2024.2/lib/libmkl_blacs_openmpi_lp64.so.2(MKLMPI_Comm_size+0x28)[0x7f6c608f0308]
[fv-az1121-872:04622] [ 3] /opt/intel/oneapi/mkl/2024.2/lib/libmkl_scalapack_lp64.so.2(PB_CpgemmMPI+0x12f)[0x7f6c5e133aff]
[fv-az1121-872:04622] [ 4] /opt/intel/oneapi/mkl/2024.2/lib/libmkl_scalapack_lp64.so.2(PB_CpgemmMPI+0x12f)[0x7f551b933aff]
[fv-az1121-872:04621] [ 4] /opt/intel/oneapi/mkl/2024.2/lib/libmkl_scalapack_lp64.so.2(pdgemm_+0xf7f)[0x7f551b987f0f]
[fv-az1121-872:04621] [ 5] /opt/intel/oneapi/mkl/2024.2/lib/libmkl_scalapack_lp64.so.2(pdgemm_+0xf7f)[0x7f6c5e187f0f]
[fv-az1121-872:04622] [ 5] /home/runner/work/scalapacke/scalapacke/_build/tests/../libscalapacke.so(SCALAPACKE_pdgemm+0x85)[0x7f551c0a4097]
[fv-az1121-872:04621] [ 6] tests/test_pdgemm(+0x18bc)[0x55e233cbc8bc]
[fv-az1121-872:04621] [ 7] /home/runner/work/scalapacke/scalapacke/_build/tests/../libscalapacke.so(SCALAPACKE_pdgemm+0x85)[0x7f6c5e8a4097]
[fv-az1121-872:04622] [ 6] tests/test_pdgemm(+0x18bc)[0x557e239fc8bc]
[fv-az1121-872:04622] [ 7] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f551bc29d90]
[fv-az1121-872:04621] [ 8] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f6c5e429d90]
[fv-az1121-872:04622] [ 8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f551bc29e40]
[fv-az1121-872:04621] [ 9] tests/test_pdgemm(+0x1285)[0x55e233cbc285]
[fv-az1121-872:04621] *** End of error message ***
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f6c5e429e40]
[fv-az1121-872:04622] [ 9] tests/test_pdgemm(+0x1285)[0x557e239fc285]
[fv-az1121-872:04622] *** End of error message ***
After a test on a local supercomputer, I cannot reproduce the second one (?!?). Also, the AOCL install is a mess on every cluster that I know.
In short, after #3 the `aocl+ilp64` test sometimes fails on `gridinit` with an address not mapped. The `mkl` + `openmpi` test (but only this one?!?) fails on a `pdgemm_` call, with an error at the MPI level that looks like an uninitialized communicator or so.
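Because both ranks write to stderr at once, the `pdgemm_` trace is interleaved and hard to read. As a rough aid, the sketch below pulls the named frames out of such a trace and lists each symbol once, in first-seen order. The helper name `named_frames` and the regular expression are illustrative assumptions, not part of scalapacke or Open MPI; the excerpt is a shortened copy of frames from the trace in this issue.

```python
import re

# Matches a backtrace frame of the form  path(symbol+0xOFFSET)[0xADDRESS].
# Frames without a symbol name, e.g. tests/test_pdgemm(+0x18bc)[...], do not match.
FRAME = re.compile(r"(\S+?)\((\w+)\+0x[0-9a-f]+\)\[0x[0-9a-f]+\]")


def named_frames(trace: str) -> list[tuple[str, str]]:
    """Return (library file name, symbol) pairs in first-seen order, without repeats.

    Hypothetical helper for reading the trace; not an existing tool.
    """
    seen: list[tuple[str, str]] = []
    for path, symbol in FRAME.findall(trace):
        entry = (path.rsplit("/", 1)[-1], symbol)  # keep only the file name
        if entry not in seen:
            seen.append(entry)
    return seen


# Shortened excerpt of the interleaved trace (frames from both ranks).
excerpt = (
    "/lib/x86_64-linux-gnu/libmpi.so.40(MPI_Comm_size+0x3b)[0x7f551bf3686b] "
    "/lib/x86_64-linux-gnu/libmpi.so.40(MPI_Comm_size+0x3b)[0x7f6c5e73686b] "
    "/opt/intel/oneapi/mkl/2024.2/lib/libmkl_blacs_openmpi_lp64.so.2(MKLMPI_Comm_size+0x28)[0x7f551e081308] "
    "/opt/intel/oneapi/mkl/2024.2/lib/libmkl_scalapack_lp64.so.2(PB_CpgemmMPI+0x12f)[0x7f6c5e133aff] "
    "/opt/intel/oneapi/mkl/2024.2/lib/libmkl_scalapack_lp64.so.2(pdgemm_+0xf7f)[0x7f551b987f0f] "
    "tests/test_pdgemm(+0x18bc)[0x55e233cbc8bc]"
)

for lib, symbol in named_frames(excerpt):
    print(f"{symbol:20s} {lib}")
```

Since the innermost frame comes first, the list reads from the crash site outward: `MPI_Comm_size`, called from `MKLMPI_Comm_size`, called from `PB_CpgemmMPI`, called from `pdgemm_`. That is at least consistent with the guess of a bad communicator, though it does not prove it.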