starpu-runtime / starpu

This is a mirror of https://gitlab.inria.fr/starpu/starpu where our development happens, but contributions are welcome here too!
https://starpu.gitlabpages.inria.fr/
GNU Lesser General Public License v2.1

How to solve an illegal memory access during LU decomposition using StarPU #30

Closed: WwwwwYyyy closed this issue 7 months ago

WwwwwYyyy commented 9 months ago

How can I solve an illegal memory access during LU decomposition using StarPU? My CUDA version is 12.2, my MPI version is 4.0, and I am using the latest version of StarPU. The error is shown in my screenshots. How can I solve this problem? Thank you!

[Screenshots not preserved]

nfurmento commented 9 months ago

How did you configure StarPU? Which program are you running, and how?

WwwwwYyyy commented 9 months ago

I configured with ./configure --enable-maxcudadev=8. The program I am running is an LU decomposition code written by my team, in a distributed environment. This is the result after configuring:

[Screenshots not preserved]

nfurmento commented 9 months ago

Can you run 'make check' to make sure the StarPU test suite passes?

WwwwwYyyy commented 9 months ago

This is my result after 'make check'. However, I can run programs such as plu_example_double properly. Can you tell me if this is normal?

============================================
StarPU 1.4.99: examples/test-suite.log
============================================

TOTAL: 150
PASS: 135
SKIP: 3
XFAIL: 0
FAIL: 12
XPASS: 0
ERROR: 0

.. contents:: :depth: 2

FAIL: scheduler/schedulers.sh

cholesky.modular-eager
[starpu][node1][starpu_history_based_job_expected_perf] Warning: model chol_model_potrf is not calibrated enough for cpu0_impl0 (Comb8) size 409600 footprint cea37d6d (only 0 measurements), forcing calibration for this run. Use the STARPU_CALIBRATE environment variable to control this. You probably need to run again to continue calibrating the model, until this warning disappears.
[starpu][node1][__starpu_history_based_job_expected_perf] Warning: model chol_model_trsm is not calibrated enough for cpu0_impl0 (Comb8) size 819200 footprint 2c1922b7 (only 0 measurements), forcing calibration for this run. Use the STARPU_CALIBRATE environment variable to control this. You probably need to run again to continue calibrating the model, until this warning disappears.
[starpu][node1][starpu_history_based_job_expected_perf] Warning: model chol_model_syrk is not calibrated enough for cpu0_impl0 (Comb8) size 819200 footprint 2c1922b7 (only 0 measurements), forcing calibration for this run. Use the STARPU_CALIBRATE environment variable to control this. You probably need to run again to continue calibrating the model, until this warning disappears.
[starpu][node1][__starpu_history_based_job_expected_perf] Warning: model chol_model_gemm is not calibrated enough for cpu0_impl0 (Comb8) size 1228800 footprint d46431bb (only 0 measurements), forcing calibration for this run. Use the STARPU_CALIBRATE environment variable to control this. You probably need to run again to continue calibrating the model, until this warning disappears.
[starpu][node1][_starpu_update_perfmodel_history] Too big deviation for model chol_model_trsm on cpu0_impl0 (Comb8): 762.460000us vs average 25266.220000us, 1 such errors against 1 samples (-96.982295%), flushing the performance model. Use the STARPU_HISTORY_MAX_ERROR environment variable to control the threshold (currently 50%)
/root/wy/starpu/examples/cholesky/.libs/cholesky_tag(+0x3f92)[0x55590e067f92]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x133520)[0x7f316e54f520]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(_starpu_cuda_driver_run_once+0x60b)[0x7f316e54feab]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x13455d)[0x7f316e55055d]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f316086cac3]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x126a40)[0x7f31608fea40]
cholesky_tag: cholesky/cholesky_kernels.c:289: chol_common_codelet_update_potrf: Assertion `0 && "sstatus == CUSOLVER_STATUS_SUCCESS"' failed.
[error] `./scheduler/../cholesky/cholesky_tag' killed with signal 6; test marked as failed

Execution_time_in_seconds 2.539470 ./scheduler/../cholesky/cholesky_tag

failure FAIL scheduler/schedulers.sh (exit status: 1)

FAIL: lu/lu.sh

[starpu][node1][starpu_initialize] Warning: FxT trace is requested but StarPU was configured without FxT support
[starpu][node1][initialize_lws_policy] Warning: you are running the default lws scheduler, which is not a very smart scheduler, while the system has GPUs or several memory nodes. Make sure to read the StarPU documentation about adding performance models in order to be able to use the dmda or dmdas scheduler instead.
Synthetic GFlop/s (TOTAL) :

size ms GFlop/s

640 398 0.4 640 0.42

Execution_time_in_seconds 2.685594 ./lu/lu_implicit_example_float

[starpu][node1][starpu_initialize] Warning: FxT trace is requested but StarPU was configured without FxT support
[starpu][node1][initialize_lws_policy] Warning: you are running the default lws scheduler, which is not a very smart scheduler, while the system has GPUs or several memory nodes. Make sure to read the StarPU documentation about adding performance models in order to be able to use the dmda or dmdas scheduler instead.
/root/wy/starpu/examples/lu/.libs/lu_implicit_example_float(+0x7162)[0x5631749df162]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x133520)[0x7f69c05a9520]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(_starpu_cuda_driver_run_once+0x60b)[0x7f69c05a9eab]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x13455d)[0x7f69c05aa55d]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f69b246cac3]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x126a40)[0x7f69b24fea40]
lu_implicit_example_float: lu/xlu_kernels.c:392: starpu_slu_common_getrf: Assertion `0 && "sstatus == CUSOLVER_STATUS_SUCCESS"' failed.
[error] `./lu/lu_implicit_example_float' killed with signal 6; test marked as failed

Execution_time_in_seconds 2.112157 ./lu/lu_implicit_example_float

FAIL lu/lu.sh (exit status: 1)

FAIL: cholesky/cholesky_julia.sh

Authorization required, but no authorization protocol specified
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA0, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA1, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA2, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA3, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA4, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA5, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA6, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA7, do you have the hwloc CUDA plugin installed?
HELLO FROM MY_DM
[starpu][node1][__starpu_history_based_job_expected_perf] Warning: model chol_model_potrf is not calibrated enough for cuda0_impl0 (Comb1) size 3686400 footprint 617e5fe6 (only 0 measurements), forcing calibration for this run. Use the STARPU_CALIBRATE environment variable to control this. You probably need to run again to continue calibrating the model, until this warning disappears.
/root/wy/starpu/examples/cholesky/.libs/cholesky_tag(+0x3f92)[0x55ec3f8aff92]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x133520)[0x7f5d35dba520]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(_starpu_cuda_driver_run_once+0x60b)[0x7f5d35dbaeab]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x13455d)[0x7f5d35dbb55d]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f5d2806cac3]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x126a40)[0x7f5d280fea40]
cholesky_tag: cholesky/cholesky_kernels.c:289: chol_common_codelet_update_potrf: Assertion `0 && "sstatus == CUSOLVER_STATUS_SUCCESS"' failed.
./cholesky/cholesky_julia.sh: line 19: 2118812 Aborted (core dumped) STARPU_SCHED_LIB=$ROOT/.libs/libmy_dmda.so STARPU_SCHED=mydm $ROOT/cholesky_tag

FAIL cholesky/cholesky_julia.sh (exit status: 134)

SKIP: binary/binary

Authorization required, but no authorization protocol specified
[starpu][node1][compare_value_and_recalibrate] Current configuration does not match the bus performance model (CUDA: (stored) 8 != (current) 0), recalibrating...
Authorization required, but no authorization protocol specified
[starpu][node1][benchmark_all_memory_nodes] NUMA 0 -> 1...
[starpu][node1][benchmark_all_memory_nodes] NUMA 1 -> 0...
[starpu][node1][compare_value_and_recalibrate] ... done
This application requires an OpenCL worker.

Execution_time_in_seconds 0.587498 ./binary/binary

SKIP binary/binary (exit status: 77)

SKIP: matvecmult/matvecmult

Authorization required, but no authorization protocol specified
[starpu][node1][compare_value_and_recalibrate] Current configuration does not match the bus performance model (CUDA: (stored) 8 != (current) 0), recalibrating...
Authorization required, but no authorization protocol specified
[starpu][node1][benchmark_all_memory_nodes] NUMA 0 -> 1...
[starpu][node1][benchmark_all_memory_nodes] NUMA 1 -> 0...
[starpu][node1][compare_value_and_recalibrate] ... done
[starpu][node1][_starpu_topology_check_ndevices] Warning: 1 OpenCL devices requested. Only 0 available.
This application requires an OpenCL worker.

Execution_time_in_seconds 0.558648 ./matvecmult/matvecmult

SKIP matvecmult/matvecmult (exit status: 77)

FAIL: lu/lu_example_float

Authorization required, but no authorization protocol specified
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA0, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA1, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA2, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA3, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA4, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA5, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA6, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA7, do you have the hwloc CUDA plugin installed?
[starpu][node1][initialize_lws_policy] Warning: you are running the default lws scheduler, which is not a very smart scheduler, while the system has GPUs or several memory nodes. Make sure to read the StarPU documentation about adding performance models in order to be able to use the dmda or dmdas scheduler instead.
/root/wy/starpu/examples/lu/.libs/lu_example_float(+0x76c2)[0x55bc953ad6c2]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x133520)[0x7f85b792c520]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(_starpu_cuda_driver_run_once+0x60b)[0x7f85b792ceab]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x13455d)[0x7f85b792d55d]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f85a986cac3]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x126a40)[0x7f85a98fea40]
lu_example_float: lu/xlu_kernels.c:392: starpu_slu_common_getrf: Assertion `0 && "sstatus == CUSOLVER_STATUS_SUCCESS"' failed.
[error] `./lu/lu_example_float' killed with signal 6; test marked as failed

Execution_time_in_seconds 12.937583 ./lu/lu_example_float

FAIL lu/lu_example_float (exit status: 1)

FAIL: lu/lu_example_double

Authorization required, but no authorization protocol specified
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA0, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA1, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA2, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA3, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA4, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA5, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA6, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA7, do you have the hwloc CUDA plugin installed?
[starpu][node1][initialize_lws_policy] Warning: you are running the default lws scheduler, which is not a very smart scheduler, while the system has GPUs or several memory nodes. Make sure to read the StarPU documentation about adding performance models in order to be able to use the dmda or dmdas scheduler instead.
/root/wy/starpu/examples/lu/.libs/lu_example_double(+0x76fa)[0x55763a78d6fa]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x133520)[0x7f65a8018520]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(_starpu_cuda_driver_run_once+0x60b)[0x7f65a8018eab]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x13455d)[0x7f65a801955d]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f659a06cac3]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x126a40)[0x7f659a0fea40]
lu_example_double: lu/xlu_kernels.c:392: starpu_dlu_common_getrf: Assertion `0 && "sstatus == CUSOLVER_STATUS_SUCCESS"' failed.
[error] `./lu/lu_example_double' killed with signal 6; test marked as failed

Execution_time_in_seconds 15.796970 ./lu/lu_example_double

FAIL lu/lu_example_double (exit status: 1)

FAIL: lu/lu_implicit_example_float

Authorization required, but no authorization protocol specified
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA0, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA1, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA2, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA3, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA4, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA5, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA6, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA7, do you have the hwloc CUDA plugin installed?
[starpu][node1][initialize_lws_policy] Warning: you are running the default lws scheduler, which is not a very smart scheduler, while the system has GPUs or several memory nodes. Make sure to read the StarPU documentation about adding performance models in order to be able to use the dmda or dmdas scheduler instead.
/root/wy/starpu/examples/lu/.libs/lu_implicit_example_float(+0x7162)[0x55c3dd92d162]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x133520)[0x7f1e7e4f7520]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(_starpu_cuda_driver_run_once+0x60b)[0x7f1e7e4f7eab]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x13455d)[0x7f1e7e4f855d]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f1e7046cac3]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x126a40)[0x7f1e704fea40]
lu_implicit_example_float: lu/xlu_kernels.c:392: starpu_slu_common_getrf: Assertion `0 && "sstatus == CUSOLVER_STATUS_SUCCESS"' failed.
[error] `./lu/lu_implicit_example_float' killed with signal 6; test marked as failed

Execution_time_in_seconds 13.414619 ./lu/lu_implicit_example_float

FAIL lu/lu_implicit_example_float (exit status: 1)

FAIL: lu/lu_implicit_example_double

Authorization required, but no authorization protocol specified
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA0, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA1, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA2, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA3, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA4, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA5, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA6, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA7, do you have the hwloc CUDA plugin installed?
[starpu][node1][initialize_lws_policy] Warning: you are running the default lws scheduler, which is not a very smart scheduler, while the system has GPUs or several memory nodes. Make sure to read the StarPU documentation about adding performance models in order to be able to use the dmda or dmdas scheduler instead.
/root/wy/starpu/examples/lu/.libs/lu_implicit_example_double(+0x719a)[0x560dc24d619a]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x133520)[0x7fb3625e7520]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(_starpu_cuda_driver_run_once+0x60b)[0x7fb3625e7eab]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x13455d)[0x7fb3625e855d]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7fb35466cac3]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x126a40)[0x7fb3546fea40]
lu_implicit_example_double: lu/xlu_kernels.c:392: starpu_dlu_common_getrf: Assertion `0 && "sstatus == CUSOLVER_STATUS_SUCCESS"' failed.
[error] `./lu/lu_implicit_example_double' killed with signal 6; test marked as failed

Execution_time_in_seconds 16.204569 ./lu/lu_implicit_example_double

FAIL lu/lu_implicit_example_double (exit status: 1)

FAIL: cholesky/cholesky_tag

Authorization required, but no authorization protocol specified
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA0, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA1, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA2, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA3, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA4, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA5, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA6, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA7, do you have the hwloc CUDA plugin installed?
[starpu][node1][initialize_lws_policy] Warning: you are running the default lws scheduler, which is not a very smart scheduler, while the system has GPUs or several memory nodes. Make sure to read the StarPU documentation about adding performance models in order to be able to use the dmda or dmdas scheduler instead.
/root/wy/starpu/examples/cholesky/.libs/cholesky_tag(+0x3f92)[0x5583b46b1f92]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x133520)[0x7efc3718c520]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(_starpu_cuda_driver_run_once+0x60b)[0x7efc3718ceab]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x13455d)[0x7efc3718d55d]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7efc2946cac3]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x126a40)[0x7efc294fea40]
cholesky_tag: cholesky/cholesky_kernels.c:289: chol_common_codelet_update_potrf: Assertion `0 && "sstatus == CUSOLVER_STATUS_SUCCESS"' failed.
[error] `./cholesky/cholesky_tag' killed with signal 6; test marked as failed

Execution_time_in_seconds 6.498575 ./cholesky/cholesky_tag

FAIL cholesky/cholesky_tag (exit status: 1)

FAIL: cholesky/cholesky_tile_tag

Authorization required, but no authorization protocol specified
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA0, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA1, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA2, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA3, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA4, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA5, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA6, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA7, do you have the hwloc CUDA plugin installed?
[starpu][node1][initialize_lws_policy] Warning: you are running the default lws scheduler, which is not a very smart scheduler, while the system has GPUs or several memory nodes. Make sure to read the StarPU documentation about adding performance models in order to be able to use the dmda or dmdas scheduler instead.
BLOCK SIZE = 960
/root/wy/starpu/examples/cholesky/.libs/cholesky_tile_tag(+0x3d22)[0x561459dfbd22]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x133520)[0x7fa937d72520]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(_starpu_cuda_driver_run_once+0x60b)[0x7fa937d72eab]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x13455d)[0x7fa937d7355d]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7fa92a06cac3]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x126a40)[0x7fa92a0fea40]
cholesky_tile_tag: cholesky/cholesky_kernels.c:289: chol_common_codelet_update_potrf: Assertion `0 && "sstatus == CUSOLVER_STATUS_SUCCESS"' failed.
[error] `./cholesky/cholesky_tile_tag' killed with signal 6; test marked as failed

Execution_time_in_seconds 5.300046 ./cholesky/cholesky_tile_tag

FAIL cholesky/cholesky_tile_tag (exit status: 1)

FAIL: cholesky/cholesky_implicit

Authorization required, but no authorization protocol specified
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA0, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA1, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA2, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA3, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA4, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA5, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA6, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA7, do you have the hwloc CUDA plugin installed?
[starpu][node1][initialize_lws_policy] Warning: you are running the default lws scheduler, which is not a very smart scheduler, while the system has GPUs or several memory nodes. Make sure to read the StarPU documentation about adding performance models in order to be able to use the dmda or dmdas scheduler instead.
/root/wy/starpu/examples/cholesky/.libs/cholesky_implicit(+0x63f2)[0x56383a3d43f2]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x133520)[0x7f48abdfd520]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(_starpu_cuda_driver_run_once+0x60b)[0x7f48abdfdeab]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x13455d)[0x7f48abdfe55d]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f489e06cac3]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x126a40)[0x7f489e0fea40]
cholesky_implicit: cholesky/cholesky_kernels.c:289: chol_common_codelet_update_potrf: Assertion `0 && "sstatus == CUSOLVER_STATUS_SUCCESS"' failed.
[error] `./cholesky/cholesky_implicit' killed with signal 6; test marked as failed

Execution_time_in_seconds 6.540059 ./cholesky/cholesky_implicit

FAIL cholesky/cholesky_implicit (exit status: 1)

FAIL: cholesky/cholesky_compil

Authorization required, but no authorization protocol specified
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA0, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA1, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA2, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA3, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA4, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA5, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA6, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA7, do you have the hwloc CUDA plugin installed?
[starpu][node1][initialize_lws_policy] Warning: you are running the default lws scheduler, which is not a very smart scheduler, while the system has GPUs or several memory nodes. Make sure to read the StarPU documentation about adding performance models in order to be able to use the dmda or dmdas scheduler instead.
/root/wy/starpu/examples/cholesky/.libs/cholesky_compil(+0x5432)[0x55ec2a218432]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x133520)[0x7f8fa3140520]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(_starpu_cuda_driver_run_once+0x60b)[0x7f8fa3140eab]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x13455d)[0x7f8fa314155d]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f8f9546cac3]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x126a40)[0x7f8f954fea40]
cholesky_compil: cholesky/cholesky_kernels.c:289: chol_common_codelet_update_potrf: Assertion `0 && "sstatus == CUSOLVER_STATUS_SUCCESS"' failed.
[error] `./cholesky/cholesky_compil' killed with signal 6; test marked as failed

Execution_time_in_seconds 6.735512 ./cholesky/cholesky_compil

FAIL cholesky/cholesky_compil (exit status: 1)

FAIL: cholesky/cholesky_grain_tag

Authorization required, but no authorization protocol specified
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA0, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA1, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA2, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA3, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA4, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA5, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA6, do you have the hwloc CUDA plugin installed?
[starpu][node1][_starpu_init_cuda_config] Warning: could not find location of CUDA7, do you have the hwloc CUDA plugin installed?
[starpu][node1][initialize_lws_policy] Warning: you are running the default lws scheduler, which is not a very smart scheduler, while the system has GPUs or several memory nodes. Make sure to read the StarPU documentation about adding performance models in order to be able to use the dmda or dmdas scheduler instead.
/root/wy/starpu/examples/cholesky/.libs/cholesky_grain_tag(+0x4332)[0x558f5d326332]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x133520)[0x7f87301ba520]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(_starpu_cuda_driver_run_once+0x60b)[0x7f87301baeab]
/root/wy/starpu/src/.libs/libstarpu-1.4.so.1(+0x13455d)[0x7f87301bb55d]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3)[0x7f872246cac3]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x126a40)[0x7f87224fea40]
cholesky_grain_tag: cholesky/cholesky_kernels.c:289: chol_common_codelet_update_potrf: Assertion `0 && "sstatus == CUSOLVER_STATUS_SUCCESS"' failed.
[error] `./cholesky/cholesky_grain_tag' killed with signal 6; test marked as failed

Execution_time_in_seconds 6.339290 ./cholesky/cholesky_grain_tag

FAIL cholesky/cholesky_grain_tag (exit status: 1)

SKIP: sched_ctx/gpu_partition

Execution_time_in_seconds 0.092156 ./sched_ctx/gpu_partition

SKIP sched_ctx/gpu_partition (exit status: 77)

nfurmento commented 9 months ago

Are you sure your CUDA devices are working correctly? There are some small CUDA programs in tools/gpus; can you check whether they run fine?

WwwwwYyyy commented 9 months ago

Here are my results; these examples run normally.

[Screenshots not preserved]

nfurmento commented 9 months ago

To be safe, please also run the third program, cuda_list. There is no need to run these with MPI; they are sequential applications.

sthibaul commented 9 months ago


The error shows that it is the CUDA kernel that made the illegal access. Please make sure that your kernel accesses the data properly, e.g. that it uses the nx, ny, and ld parameters correctly to access the matrix tile.
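To illustrate the nx/ny/ld point, here is a minimal plain-C sketch (no StarPU or CUDA needed). It assumes the column-major layout of StarPU's matrix interface, where element (i, j) of a tile, with i < nx and j < ny, sits at ptr[j * ld + i]; in a real codelet these four values would come from STARPU_MATRIX_GET_PTR, STARPU_MATRIX_GET_NX, STARPU_MATRIX_GET_NY and STARPU_MATRIX_GET_LD applied to buffers[k]. The struct tile_view and the helper functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a tile described by (ptr, nx, ny, ld):
 * nx elements are contiguous, ny is the other dimension, and ld is
 * the distance in elements between two consecutive groups of nx. */
struct tile_view {
    double *ptr;  /* first element of the tile */
    size_t nx;    /* contiguous dimension */
    size_t ny;    /* second dimension */
    size_t ld;    /* leading dimension, ld >= nx */
};

/* Correct access: element (i, j) lives at ptr[j * ld + i]. */
double *tile_at(const struct tile_view *t, size_t i, size_t j)
{
    assert(i < t->nx && j < t->ny);  /* stay inside the tile */
    return &t->ptr[j * t->ld + i];
}

/* Common bug: using nx as the stride. This is only correct when
 * ld == nx, i.e. when the tile is not a sub-block of a larger matrix. */
double *tile_at_wrong_stride(const struct tile_view *t, size_t i, size_t j)
{
    return &t->ptr[j * t->nx + i];
}

/* Sum all elements of the tile using the correct indexing. */
double tile_sum(const struct tile_view *t)
{
    double s = 0.0;
    for (size_t j = 0; j < t->ny; j++)
        for (size_t i = 0; i < t->nx; i++)
            s += *tile_at(t, i, j);
    return s;
}
```

For example, for a 2x2 tile starting at row 1, column 1 of a 4x4 column-major matrix (ld = 4), tile_at(&t, 0, 1) reads row 1, column 2 of the full matrix, while tile_at_wrong_stride(&t, 0, 1) silently reads row 3, column 1. Mixing up these parameters either reads the wrong elements, as here, or, when the stride used is larger than the real one, runs past the end of the buffer, which on a GPU can surface as an illegal memory access.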

WwwwwYyyy commented 9 months ago

Here is the output of cuda_list:

[Screenshot not preserved]

nfurmento commented 9 months ago

OK, so nothing is wrong with your CUDA devices. As Samuel said, the error shows that it is the CUDA kernel that made the illegal access, so please make sure that your kernel accesses the data properly, e.g. that it uses the nx, ny, and ld parameters correctly to access the matrix tile.

sthibaul commented 9 months ago

(Concerning the test suite failures, they are all of the same kind: cuSOLVER failed for some reason. You can try tomorrow's 1.4 nightly snapshot; I have added more verbose error reporting.)
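The assertions in the logs above, of the form Assertion `0 && "sstatus == CUSOLVER_STATUS_SUCCESS"', only say that the cuSOLVER call failed, not how. As a hypothetical sketch of what more verbose reporting can look like (this is not the change made in the StarPU nightly snapshot), the helper below records the call text, the location, and the actual status value. report_status and REPORT_STATUS are illustrative names; the only cuSOLVER fact relied on is that CUSOLVER_STATUS_SUCCESS equals 0.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not StarPU code): instead of
 * assert(0 && "sstatus == CUSOLVER_STATUS_SUCCESS"), which only says
 * that the call failed, format a message saying which call failed,
 * where, and with which status value. `success` is the library's
 * success code (CUSOLVER_STATUS_SUCCESS is 0).
 * Returns 0 on success (msg set to ""), 1 on failure (msg filled in). */
int report_status(int status, int success, const char *call,
                  const char *file, int line, char *msg, size_t len)
{
    assert(msg != NULL && len > 0);

    if (status == success) {
        msg[0] = '\0';
        return 0;
    }

    /* Keep only the file's basename to shorten the message. */
    const char *base = strrchr(file, '/');
    base = base ? base + 1 : file;

    snprintf(msg, len, "%s:%d: %s returned status %d (expected %d)",
             base, line, call, status, success);
    return 1;
}

/* Convenience macro: captures the call's source text and location.
 * The caller still decides what to do on failure (abort, retry, ...). */
#define REPORT_STATUS(expr, success, msg, len) \
    report_status((expr), (success), #expr, __FILE__, __LINE__, (msg), (len))
```

In a real codelet, the cuSOLVER call itself would be passed as expr together with the cuSOLVER success constant, and the code would then abort or propagate the error as before; the printed status number can be looked up in the cuSOLVER documentation.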

WwwwwYyyy commented 9 months ago

Thank you for your response. Our program runs successfully on a laptop with 1 GPU, regardless of how many MPI processes are used. However, on a server with multiple GPUs, we get the errors shown above; the server can only run our program with 1 MPI process. In addition, the terminal on the server shows this after executing our code:

drivers/cuda/driver_cuda.c:2240: _starpu_cuda_driver_run_once: Assertion `0 && "cures == cudaErrorNotReady"' failed.

sthibaul commented 9 months ago

It would help a lot if you could share the source code of your codelet function, so that we can check that it actually starts the kernel properly, with the right stream, the right pointers, etc.

WwwwwYyyy commented 9 months ago

My program is large, so here is one of its kernels. The other kernels follow the same pattern, and pangulu_getrf_cuda_kernel itself is written with functions from the CUDA library. Do we need to use StarPU's CUDA-specific code?

#ifndef PANGULU_GETRF_FP64_CUDA
#define PANGULU_GETRF_FP64_CUDA

/* The functions in this file are primarily kernels; they are called from pangulu_kernel_interface.h */

#include "pangulu_common.h"
#include "pangulu_cuda.h"
#include "pangulu_cuda_interface.h"
#include

struct args_getrf {
    pangulu_Smatrix *A;
    pangulu_Smatrix *L;
    pangulu_Smatrix *U;
};

void pangulu_getrf_fp64_cuda_operate(pangulu_Smatrix *A, pangulu_Smatrix *L, pangulu_Smatrix *U) {

pangulu_getrf_cuda_kernel(A->row,
                              A->rowpointer[A->row],
                              U->CUDA_nnzU,
                              A->CUDA_rowpointer,
                              A->CUDA_columnindex,
                              A->CUDA_value,
                              L->CUDA_rowpointer,
                              L->CUDA_columnindex,
                              L->CUDA_value,
                              U->CUDA_rowpointer,
                              U->CUDA_columnindex,
                              U->CUDA_value);

}

void pangulu_getrf_fp64_cuda_task(void *buffers[], void *cl_arg)
{
    struct args_getrf *args = (struct args_getrf *) cl_arg;
    pangulu_getrf_fp64_cuda_operate(args->A, args->L, args->U);
}

struct starpu_codelet pangulu_getrf_fp64_cuda_task_cl = {
    .cuda_funcs = {pangulu_getrf_fp64_cuda_task},
    .cuda_flags = {STARPU_CUDA_ASYNC},
    .nbuffers = 0
};

void pangulu_getrf_fp64_cuda_submit_task(pangulu_Smatrix *A, pangulu_Smatrix *L, pangulu_Smatrix *U)
{
    struct args_getrf args = {A, L, U};
    struct starpu_task *task = starpu_task_create();
    task->synchronous = 1;
    task->cl = &pangulu_getrf_fp64_cuda_task_cl;
    task->cl_arg = &args;
    task->cl_arg_size = sizeof(args);
    starpu_task_submit(task);
}

void pangulu_getrf_fp64_cuda(pangulu_Smatrix *A, pangulu_Smatrix *L, pangulu_Smatrix *U)
{
    pangulu_getrf_fp64_cuda_submit_task(A, L, U);
}

/* void pangulu_getrf_fp64_cuda(pangulu_Smatrix *A, pangulu_Smatrix *L, pangulu_Smatrix *U) {

pangulu_getrf_cuda_kernel(A->row,
                              A->rowpointer[A->row],
                              U->CUDA_nnzU,
                              A->CUDA_rowpointer,
                              A->CUDA_columnindex,
                              A->CUDA_value,
                              L->CUDA_rowpointer,
                              L->CUDA_columnindex,
                              L->CUDA_value,
                              U->CUDA_rowpointer,
                              U->CUDA_columnindex,
                              U->CUDA_value);

}

*/

#endif

sthibaul commented 9 months ago

> .cuda_flags = {STARPU_CUDA_ASYNC},

This tells StarPU that your kernel is submitted asynchronously, but your program does not seem to use the StarPU CUDA stream. That can only lead to various problems; see https://files.inria.fr/starpu/doc/html_web_faq/CheckListWhenPerformanceAreNotThere.html#CUDA-specificOptimizations

Also, your kernel apparently takes its GPU pointers from cl_arg, so you are not using buffers allocated and transferred by StarPU; any pointer issue you are getting therefore cannot come from StarPU itself. Either fix your CUDA allocation/transfer code, or register your data properly with StarPU so that it can handle it for you.

WwwwwYyyy commented 9 months ago

Thanks for your reply! We are trying to perform sparse matrix LU decomposition with StarPU and CUDA. We tried to register the row pointer, column index and value arrays of the sparse matrices with starpu_vector_data_register(), which means registering 12 arrays in each kernel of our program. Will performance degrade if we register so many pieces of data?

nfurmento commented 9 months ago

If you have a matrix, you should use starpu_matrix_data_register(). Look for examples in the StarPU documentation or directly in the StarPU examples directory.

You say you are using starpu_vector_data_register(), but the codelet you sent does not use StarPU data handles.

WwwwwYyyy commented 9 months ago

#ifndef PANGULU_SSSSM_FP64_CUDA_H
#define PANGULU_SSSSM_FP64_CUDA_H

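/* Note: STARPU_NMAXBUFS (maximum number of buffers per task, 8 by default) is fixed when
 * StarPU itself is configured (--enable-maxbuffers); redefining it here does not change
 * the value the library was built with. */
#define STARPU_NMAXBUFS 24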

/* The functions in this file are primarily kernels; they are called from pangulu_kernel_interface.h */

#include "pangulu_common.h"
#include "pangulu_cuda.h"
#include

void pangulu_ssssm_fp64_cuda_task(void *buffers[], void *args)

{

calculate_type *A_CUDA_value = (calculate_type*) STARPU_VECTOR_GET_PTR(buffers[0]);
int_t *A_CUDA_rowpointer = (int_t*) STARPU_VECTOR_GET_PTR(buffers[1]); 
int_t *A_CUDA_columnindex = (int_t*) STARPU_VECTOR_GET_PTR(buffers[2]); 
calculate_type *L_CUDA_value = (calculate_type*) STARPU_VECTOR_GET_PTR(buffers[3]);
int_t *L_CUDA_rowpointer = (int_t*) STARPU_VECTOR_GET_PTR(buffers[4]); 
int_t *L_CUDA_columnindex = (int_t*) STARPU_VECTOR_GET_PTR(buffers[5]); 
calculate_type *U_CUDA_value = (calculate_type*) STARPU_VECTOR_GET_PTR(buffers[6]);
int_t *U_CUDA_rowpointer = (int_t*) STARPU_VECTOR_GET_PTR(buffers[7]); 
int_t *U_CUDA_columnindex = (int_t*) STARPU_VECTOR_GET_PTR(buffers[8]);
int_t *A_bin_rowpointer = (int_t*)STARPU_VECTOR_GET_PTR(buffers[9]);
//int_t *A_CUDA_bin_rowpointer = (int_t*)STARPU_VECTOR_GET_PTR(buffers[10]);

pangulu_ssssm_cuda_kernel((int_t)A_row,
                            A_bin_rowpointer,
                            A_CUDA_bin_rowpointer,
                            A_CUDA_bin_rowindex,
                            (int_t*) U_CUDA_rowpointer,
                            (int_t*) U_CUDA_columnindex,
                            U_CUDA_value,
                            (int_t*) L_CUDA_rowpointer,
                            (int_t*) L_CUDA_columnindex,
                            L_CUDA_value,
                            (int_t *) A_CUDA_rowpointer,
                            (int_t *) A_CUDA_columnindex,
                            A_CUDA_value);

}

struct starpu_codelet pangulu_ssssm_fp64_cuda_task_cl = {
    .where = STARPU_CUDA,
    .cuda_funcs = {pangulu_ssssm_fp64_cuda_task},
    .cuda_flags = {STARPU_CUDA_ASYNC},
    .nbuffers = 10,
    .modes = {STARPU_RW, STARPU_RW, STARPU_RW, STARPU_RW, STARPU_RW,
              STARPU_RW, STARPU_RW, STARPU_RW, STARPU_RW, STARPU_RW},
    .name = "pangulu_ssssm_fp64_cuda_task"
};

void pangulu_ssssm_fp64_cuda_submit_task(pangulu_Smatrix *A, pangulu_Smatrix *L, pangulu_Smatrix *U) {

starpu_data_handle_t A_CUDA_value_handle;
starpu_data_handle_t A_CUDA_rowpointer_handle;
starpu_data_handle_t A_CUDA_columnindex_handle;
starpu_data_handle_t L_CUDA_value_handle;
starpu_data_handle_t L_CUDA_rowpointer_handle;
starpu_data_handle_t L_CUDA_columnindex_handle;
starpu_data_handle_t U_CUDA_value_handle;
starpu_data_handle_t U_CUDA_rowpointer_handle;
starpu_data_handle_t U_CUDA_columnindex_handle;
starpu_data_handle_t A_bin_rowpointer_handle;
//starpu_data_handle_t A_CUDA_bin_rowpointer_handle;
//starpu_data_handle_t A_CUDA_bin_rowindex_handle;

starpu_vector_data_register(&A_bin_rowpointer_handle, STARPU_MAIN_RAM,  
                        (uintptr_t)A->bin_rowpointer, A->row, sizeof(int_t));    
starpu_vector_data_register(&A_CUDA_value_handle, STARPU_MAIN_RAM, 
                            (uintptr_t)A->CUDA_value, A->nnz, sizeof(calculate_type));        
/* CSR row pointer arrays hold row + 1 entries (the last one is nnz) */
starpu_vector_data_register(&A_CUDA_rowpointer_handle, STARPU_MAIN_RAM,
                            (uintptr_t)A->CUDA_rowpointer, A->row + 1, sizeof(int_t));
starpu_vector_data_register(&A_CUDA_columnindex_handle, STARPU_MAIN_RAM,
                            (uintptr_t)A->CUDA_columnindex, A->nnz, sizeof(int_t));
starpu_vector_data_register(&L_CUDA_value_handle, STARPU_MAIN_RAM,
                            (uintptr_t)L->CUDA_value, L->nnz, sizeof(calculate_type));
starpu_vector_data_register(&L_CUDA_rowpointer_handle, STARPU_MAIN_RAM,
                            (uintptr_t)L->CUDA_rowpointer, L->row + 1, sizeof(int_t));
starpu_vector_data_register(&L_CUDA_columnindex_handle, STARPU_MAIN_RAM,
                            (uintptr_t)L->CUDA_columnindex, L->nnz, sizeof(int_t));
starpu_vector_data_register(&U_CUDA_value_handle, STARPU_MAIN_RAM,
                            (uintptr_t)U->CUDA_value, U->nnz, sizeof(calculate_type));
starpu_vector_data_register(&U_CUDA_rowpointer_handle, STARPU_MAIN_RAM,
                            (uintptr_t)U->CUDA_rowpointer, U->row + 1, sizeof(int_t));
starpu_vector_data_register(&U_CUDA_columnindex_handle, STARPU_MAIN_RAM, 
                            (uintptr_t)U->CUDA_columnindex, U->nnz, sizeof(int_t));      

struct starpu_task *task = starpu_task_create();
task->synchronous = 1;
task->cl = &pangulu_ssssm_fp64_cuda_task_cl;
task->handles[0] = A_CUDA_value_handle;
task->handles[1] = A_CUDA_rowpointer_handle;
task->handles[2] = A_CUDA_columnindex_handle;
task->handles[3] = L_CUDA_value_handle;
task->handles[4] = L_CUDA_rowpointer_handle;
task->handles[5] = L_CUDA_columnindex_handle;   
task->handles[6] = U_CUDA_value_handle;
task->handles[7] = U_CUDA_rowpointer_handle;
task->handles[8] = U_CUDA_columnindex_handle;
task->handles[9] = A_bin_rowpointer_handle;

starpu_task_submit(task);

starpu_data_unregister(A_CUDA_value_handle);
starpu_data_unregister(A_CUDA_rowpointer_handle);
starpu_data_unregister(A_CUDA_columnindex_handle);
starpu_data_unregister(L_CUDA_value_handle);
starpu_data_unregister(L_CUDA_rowpointer_handle);
starpu_data_unregister(L_CUDA_columnindex_handle);
starpu_data_unregister(U_CUDA_value_handle);
starpu_data_unregister(U_CUDA_rowpointer_handle);
starpu_data_unregister(U_CUDA_columnindex_handle);
starpu_data_unregister(A_bin_rowpointer_handle);
//starpu_data_unregister(L_sparse_matrix_handle);

}

void pangulu_ssssm_fp64_cuda(pangulu_Smatrix *A, pangulu_Smatrix *L, pangulu_Smatrix *U)
{
    pangulu_ssssm_fp64_cuda_submit_task(A, L, U);
}

/* void pangulu_ssssm_fp64_cuda(pangulu_Smatrix *A, pangulu_Smatrix *L, pangulu_Smatrix *U)
{
    pangulu_ssssm_cuda_kernel(A->row,
                              A->bin_rowpointer,      // row
                              A->CUDA_bin_rowpointer, // row
                              A->CUDA_bin_rowindex,   // Annz
                              U->CUDA_rowpointer,     // row
                              U->CUDA_columnindex,    // nnz
                              U->CUDA_value,          // nnz
                              L->CUDA_rowpointer,     // row
                              L->CUDA_columnindex,    // nnz
                              L->CUDA_value,          // nnz
                              A->CUDA_rowpointer,     // row
                              A->CUDA_columnindex,    // nnz
                              A->CUDA_value);         // nnz
}
*/

#endif

This is the code I have modified. Our sparse matrices are stored in CSR format, so we only know the length of each array and cannot use starpu_matrix_data_register(). Three matrices need to be registered in this kernel, so I use starpu_vector_data_register() to register the rowpointer, columnindex and value arrays of each matrix. This increases the number of registrations; will it affect performance?
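Since only the array lengths are known when registering, it may help to check them explicitly. In standard 0-based CSR, the row pointer array has row + 1 entries (not row), its last entry equals nnz, and the column index and value arrays each have nnz entries; a row pointer vector registered with only row elements would leave its last entry out of any copy StarPU makes. The following is a minimal plain C sketch of such a consistency check, to run before registration; the name csr_check and the use of int indices are assumptions (PanguLU's int_t may be wider).

```c
#include <stddef.h>

/* Hypothetical check for a 0-based CSR matrix with nrow rows, ncol columns
 * and nnz stored entries. rowptr must hold nrow + 1 entries, colind nnz entries.
 * Returns 0 if consistent, or a negative code for the first problem found. */
static int csr_check(size_t nrow, size_t ncol, size_t nnz,
                     const int *rowptr, const int *colind)
{
    if (rowptr[0] != 0)
        return -1;                      /* first row must start at entry 0 */
    if ((size_t)rowptr[nrow] != nnz)
        return -2;                      /* entry nrow (the extra one) must equal nnz */
    for (size_t i = 0; i < nrow; i++)
        if (rowptr[i] > rowptr[i + 1])
            return -3;                  /* row pointers must not decrease */
    for (size_t k = 0; k < nnz; k++)
        if (colind[k] < 0 || (size_t)colind[k] >= ncol)
            return -4;                  /* column index out of range */
    return 0;
}
```

Each failure mode corresponds to an index that a GPU kernel walking the arrays would follow out of bounds, so catching it on the host gives a clearer error than an illegal memory access.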

sthibaul commented 9 months ago

Registering data is not that expensive, but you should rather use starpu_csr_data_register(), which is meant exactly for that; see the examples/spmv/spmv.c example.
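starpu_csr_data_register() describes a whole CSR matrix with a single handle: it takes nnz, the number of rows, the value array, the column index and row pointer arrays, a firstentry offset and the element size. In the codelet, the STARPU_CSR_GET_* macros return those arrays again. Below is a minimal plain C sketch, without StarPU calls, of the kind of loop such a codelet runs: a sparse matrix-vector product y = A x over the three CSR arrays, similar in spirit to the spmv example. The name csr_spmv is hypothetical, and the sketch assumes 0-based indices with firstentry = 0. As far as we know, StarPU's CSR interface stores colind and rowptr as uint32_t, so the sketch uses that type; 64-bit int_t arrays would need converting.

```c
#include <stdint.h>

/* Hypothetical CSR kernel: y = A * x for a matrix with nrow rows (0-based).
 * rowptr has nrow + 1 entries; colind and nzval have rowptr[nrow] entries. */
static void csr_spmv(uint32_t nrow, const uint32_t *rowptr,
                     const uint32_t *colind, const double *nzval,
                     const double *x, double *y)
{
    for (uint32_t row = 0; row < nrow; row++) {
        double sum = 0.0;
        /* The entries of this row live in [rowptr[row], rowptr[row + 1]). */
        for (uint32_t k = rowptr[row]; k < rowptr[row + 1]; k++)
            sum += nzval[k] * x[colind[k]];
        y[row] = sum;
    }
}
```

With one CSR handle per matrix, the ssssm task above would need three buffers instead of nine, which also keeps it well under the per-task buffer limit.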