pierre-24 / scalapacke

Create the missing C headers (as well as some wrappers) for scaLAPACK
https://pierre-24.github.io/scalapacke/
MIT License

Fails with `test_pdgemm` #8

Open pierre-24 opened 1 month ago

pierre-24 commented 1 month ago

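After #3, `test_pdgemm` segfaults. With AOCL 4.2.0, the crash is inside `blacs_gridinit_`, at a null address: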

[fv-az888-613:03218] *** Process received signal ***
[fv-az888-613:03218] Signal: Segmentation fault (11)
[fv-az888-613:03218] Signal code: Address not mapped (1)
[fv-az888-613:03218] Failing at address: (nil)
[fv-az888-613:03218] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f601d642520]
[fv-az888-613:03218] [ 1] /opt/AMD/aocl/aocl-linux-gcc-4.2.0/gcc/lib/libscalapack.so(blacs_gridinit_+0xf1)[0x7f601ce3e313]
[fv-az888-613:03218] [ 2] libscalapacke.so(SCALAPACKE_blacs_gridinit+0x34)[0x7f601da52a26]
[fv-az888-613:03218] [ 3] tests/test_pdgemm(+0x146e)[0x5566716c446e]
[fv-az888-613:03218] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f601d629d90]
[fv-az888-613:03218] [ 5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f601d629e40]
[fv-az888-613:03218] [ 6] tests/test_pdgemm(+0x1285)[0x5566716c4285]
[fv-az888-613:03218] *** End of error message ***
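
With Intel MKL 2024.2, both processes crash in `MPI_Comm_size`, reached from `pdgemm_` (their output is interleaved):

[fv-az1121-872:04622] *** Process received signal ***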
[fv-az1121-872:04622] Signal: Segmentation fault (11)
[fv-az1121-872:04622] Signal code: Address not mapped (1)
[fv-az1121-872:04622] Failing at address: 0x23f0dde8
[fv-az1121-872:04621] *** Process received signal ***
[fv-az1121-872:04621] Signal: Segmentation fault (11)
[fv-az1121-872:04621] Signal code: Address not mapped (1)
[fv-az1121-872:04621] Failing at address: 0x3589abb8
[fv-az1121-872:04622] [ 0] [fv-az1121-872:04621] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f551bc42520]
[fv-az1121-872:04621] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f6c5e442520]
[fv-az1121-872:04622] [ 1] /lib/x86_64-linux-gnu/libmpi.so.40(MPI_Comm_size+0x3b)[0x7f551bf3686b]
[fv-az1121-872:04621] [ 2] /lib/x86_64-linux-gnu/libmpi.so.40(MPI_Comm_size+0x3b)[0x7f6c5e73686b]
[fv-az1121-872:04622] [ 2] /opt/intel/oneapi/mkl/2024.2/lib/libmkl_blacs_openmpi_lp64.so.2(MKLMPI_Comm_size+0x28)[0x7f551e081308]
[fv-az1121-872:04621] [ 3] /opt/intel/oneapi/mkl/2024.2/lib/libmkl_blacs_openmpi_lp64.so.2(MKLMPI_Comm_size+0x28)[0x7f6c608f0308]
[fv-az1121-872:04622] [ 3] /opt/intel/oneapi/mkl/2024.2/lib/libmkl_scalapack_lp64.so.2(PB_CpgemmMPI+0x12f)[0x7f6c5e133aff]
[fv-az1121-872:04622] [ 4] /opt/intel/oneapi/mkl/2024.2/lib/libmkl_scalapack_lp64.so.2(PB_CpgemmMPI+0x12f)[0x7f551b933aff]
[fv-az1121-872:04621] [ 4] /opt/intel/oneapi/mkl/2024.2/lib/libmkl_scalapack_lp64.so.2(pdgemm_+0xf7f)[0x7f551b987f0f]
[fv-az1121-872:04621] [ 5] /opt/intel/oneapi/mkl/2024.2/lib/libmkl_scalapack_lp64.so.2(pdgemm_+0xf7f)[0x7f6c5e187f0f]
[fv-az1121-872:04622] [ 5] /home/runner/work/scalapacke/scalapacke/_build/tests/../libscalapacke.so(SCALAPACKE_pdgemm+0x85)[0x7f551c0a4097]
[fv-az1121-872:04621] [ 6] tests/test_pdgemm(+0x18bc)[0x55e233cbc8bc]
[fv-az1121-872:04621] [ 7] /home/runner/work/scalapacke/scalapacke/_build/tests/../libscalapacke.so(SCALAPACKE_pdgemm+0x85)[0x7f6c5e8a4097]
[fv-az1121-872:04622] [ 6] tests/test_pdgemm(+0x18bc)[0x557e239fc8bc]
[fv-az1121-872:04622] [ 7] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f551bc29d90]
[fv-az1121-872:04621] [ 8] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f6c5e429d90]
[fv-az1121-872:04622] [ 8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f551bc29e40]
[fv-az1121-872:04621] [ 9] tests/test_pdgemm(+0x1285)[0x55e233cbc285]
[fv-az1121-872:04621] *** End of error message ***
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f6c5e429e40]
[fv-az1121-872:04622] [ 9] tests/test_pdgemm(+0x1285)[0x557e239fc285]
[fv-az1121-872:04622] *** End of error message ***
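
The two processes in the last trace write to the same stream, so their frames are interleaved and some lines carry two prefixes. A minimal sketch for regrouping such output by process, assuming every fragment starts with the `[host:pid] ` prefix seen above (the function name `split_by_process` is hypothetical, not part of any tool):

```python
import re
from collections import defaultdict

# Prefix seen in the traces above: "[hostname:pid] ".
# Frame markers such as "[ 0]" or "[0x7f55...]" contain no colon, so they do not match.
PREFIX = re.compile(r"\[([\w.-]+):(\d+)\] ")


def split_by_process(log_text):
    """Group backtrace fragments by PID; text without a prefix goes under None."""
    groups = defaultdict(list)
    for line in log_text.splitlines():
        matches = list(PREFIX.finditer(line))
        if not matches:
            # Continuation of a torn line: the owning process is unknown.
            if line.strip():
                groups[None].append(line.strip())
            continue
        head = line[:matches[0].start()].strip()
        if head:
            groups[None].append(head)
        for i, m in enumerate(matches):
            # A fragment runs until the next prefix on the same line, if any.
            end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
            fragment = line[m.end():end].strip()
            if fragment:
                groups[m.group(2)].append(fragment)
    return dict(groups)
```

This only separates fragments by the prefix that precedes them. A frame torn in the middle (a frame number on one line, its address on another) cannot be reattached with certainty, so the result still needs to be read with care.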
pierre-24 commented 1 month ago

After testing on a local supercomputer, I cannot reproduce the second crash, the MKL one (?!?). Also, the AOCL installation is a mess on every cluster I know of.