starpu-runtime / starpu

This is a mirror of https://gitlab.inria.fr/starpu/starpu where our development happens, but contributions are welcome here too!
https://starpu.gitlabpages.inria.fr/
GNU Lesser General Public License v2.1

LU with StarPU and CUDA #22

Closed: TommyUW closed this issue 1 year ago.

TommyUW commented 1 year ago

I am trying to run the LU example with StarPU and CUDA from the "examples" directory. However, it fails with "Segmentation fault (core dumped)". Running starpu_machine_display gives the same result. If I run "STARPU_NCUDA=0 starpu_machine_display", though, I do get information about the CPUs. The sample CUDA programs I wrote myself, such as vector addition, work fine, so CUDA is operational on my laptop.

Steps to reproduce


I installed CUDA 12.1 and NVIDIA driver 530.41.03, and set the environment variables as follows:

export PATH=$PATH:/usr/local/cuda-12.1/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-12.1/lib64:/usr/lib/x86_64-linux-gnu

Then I reconfigured StarPU with:

./configure --disable-opencl

followed by make and sudo make install. I then went into the lu directory under examples and tried to run lu_example_double with:

STARPU_NCUDA=1 ./lu_example_double
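The same steps can be collected into a small shell script. This is a minimal sketch, assuming a POSIX shell, CUDA installed under /usr/local/cuda-12.1 as in the report, and that the LU example is built under examples/lu of the StarPU tree; the CUDA_HOME variable and the build_and_run_lu function are hypothetical names introduced only for illustration.

```shell
#!/bin/sh
# Sketch of the reproduction steps; assumes CUDA 12.1 lives in
# /usr/local/cuda-12.1, as in the report.
CUDA_HOME=/usr/local/cuda-12.1

# An assignment needs '=': 'export PATH:$PATH:...' is rejected by the shell.
export PATH="$PATH:$CUDA_HOME/bin"

# ${VAR:+...} adds the separating ':' only when LD_LIBRARY_PATH is already
# non-empty, so no empty entry (which means the current directory) is created.
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$CUDA_HOME/lib64:/usr/lib/x86_64-linux-gnu"

# Hypothetical helper: configure without OpenCL, build, install, then run
# the double-precision LU example with one CUDA worker. Call it from the
# top of the StarPU source tree; examples/lu is assumed to hold the example.
build_and_run_lu() {
  ./configure --disable-opencl &&
    make &&
    sudo make install &&
    (cd examples/lu && STARPU_NCUDA=1 ./lu_example_double)
}
```

After sourcing the script, calling build_and_run_lu from the top of the StarPU source tree replays the reported steps; each step runs only if the previous one succeeded.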

Obtained behavior

The result is "Segmentation fault (core dumped)". However, if I set STARPU_NCUDA=0, the program runs and I get its running time.

Expected behavior

I expect to get the running time of the program, and it should run faster with CUDA enabled.

Configuration

./configure --disable-opencl

Configuration result

config.log (attached)

Distribution

Ubuntu 20.04

Version of StarPU

StarPU master tarball

Version of GPU drivers

GPU: NVIDIA GeForce RTX 3050 (Laptop). CUDA: 12.1. NVIDIA driver: 530.41.03.

No OpenCL or HIP is installed.

nfurmento commented 1 year ago

Hello, we need the backtrace of your segfault to be able to help.

TommyUW commented 1 year ago

I tried to use gdb to get a backtrace of the segfault. However, as shown in the screenshot, gdb reports "File format not recognized". gdb works fine on the StarPU programs I wrote myself, but every program in the "examples" directory gives the same error.

[Screenshot from 2023-06-06 20-26-56]

nfurmento commented 1 year ago

You need to use libtool for programs within the build directory:

libtool --mode=execute gdb application
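Inside the build tree, programs such as examples/lu/lu_example_double are libtool wrapper scripts rather than ELF binaries (the real executable sits in a hidden .libs subdirectory), which is why gdb reported "File format not recognized" on them. The sketch below wraps the suggested command so gdb runs non-interactively and prints the backtrace as soon as the program stops. It assumes GNU gdb; the function name starpu_backtrace and the LIBTOOL override are hypothetical conveniences, the latter for systems without a system-wide libtool, where the libtool script generated by configure at the top of the build tree can be used instead.

```shell
# Programs in the build tree are libtool wrapper scripts; 'libtool
# --mode=execute' sets up the library search path and substitutes the real
# binary before gdb sees it.
#
# Hypothetical helper: starpu_backtrace PROGRAM [ARGS...]
#   -batch   exit gdb after running the -ex commands
#   -ex run  start the program
#   -ex bt   print the backtrace once it stops (e.g. on SIGSEGV)
#   --args   pass everything after PROGRAM as the program's own arguments
# Set LIBTOOL=/path/to/builddir/libtool if no system-wide libtool exists.
starpu_backtrace() {
  "${LIBTOOL:-libtool}" --mode=execute gdb -batch -ex run -ex bt --args "$@"
}
```

For example, from examples/lu: export STARPU_NCUDA=1, then run starpu_backtrace ./lu_example_double. Since StarPU runs its workers in separate threads, replacing bt with "thread apply all bt" prints the stack of every thread.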

nfurmento commented 1 year ago

In your first message, you said you are using a tarball from the StarPU master branch, but your screenshot shows starpu-1.4.0.

I recommend trying the latest release, https://files.inria.fr/starpu/starpu-1.4.1/starpu-1.4.1.tar.gz, and making sure that your environment contains no variable pointing to another StarPU installation.
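One way to look for leftovers from another installation is to scan the search-path variables for StarPU directories. This is a sketch, assuming the usual variables consulted by the shell, the dynamic linker, the compiler and pkg-config; the helper name starpu_env_check is hypothetical, and the variable list may need extending for a particular setup.

```shell
# Hypothetical helper: print the search-path variables (PATH, LD_LIBRARY_PATH,
# LIBRARY_PATH, CPATH, PKG_CONFIG_PATH) whose value mentions "starpu" in any
# letter case. Each match should point to the installation you intend to use;
# no output (and a non-zero status) means none of them names a StarPU directory.
starpu_env_check() {
  env | grep -E '^(PATH|LD_LIBRARY_PATH|LIBRARY_PATH|CPATH|PKG_CONFIG_PATH)=' |
    grep -i starpu
}
```

A default install under /usr/local does not contain "starpu" in its path, so this mainly catches custom prefixes. Running command -v starpu_machine_display additionally shows which copy of the StarPU tools the shell picks up first.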

TommyUW commented 1 year ago

Thank you for your reply. We installed the newest version and ran the command. However, it now prints: "Warning: could not find location of CUDA0, do you have the hwloc CUDA plugin installed?"

[Screenshot from 2023-06-11 19-43-32]

nfurmento commented 1 year ago

Please send us the backtrace of the segfault.

nfurmento commented 1 year ago

Also, STARPU_CUDA is not a StarPU variable; you want to set STARPU_NCUDA.
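Since setting a variable that StarPU does not read has no effect on the number of CUDA workers, a quick guard before launching can catch the mix-up. A minimal sketch, relying only on the point made in the comment (StarPU reads STARPU_NCUDA, not STARPU_CUDA); the helper name check_starpu_ncuda is hypothetical.

```shell
# Hypothetical helper: check_starpu_ncuda
# Warn (and return 1) if STARPU_CUDA is set, since StarPU takes the number
# of CUDA workers from STARPU_NCUDA. Otherwise print the value that will be
# used, or note that STARPU_NCUDA is unset (StarPU then uses its default).
check_starpu_ncuda() {
  if [ -n "${STARPU_CUDA+set}" ]; then
    echo "warning: STARPU_CUDA is set; did you mean STARPU_NCUDA?" >&2
    return 1
  fi
  if [ -n "${STARPU_NCUDA+set}" ]; then
    echo "STARPU_NCUDA=$STARPU_NCUDA"
  else
    echo "STARPU_NCUDA unset"
  fi
}
```

Typical use: export STARPU_NCUDA=1 && check_starpu_ncuda && ./lu_example_double.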

sthibaul commented 1 year ago

The warning is somewhat unrelated (the question it raises still holds, though).

A backtrace would indeed have been useful. In this case there seems to be only one possible issue with the obj pointer, so I fixed it, and that should fix the issue. But really, backtraces are a must for knowing what is going wrong.

TommyUW commented 1 year ago

Thank you for your reply. I have now entered the correct command and obtained the backtrace.

[Screenshot from 2023-06-12 07-21-03]
[Screenshot from 2023-06-12 07-21-09]

sthibaul commented 1 year ago

OK, that confirms what I thought, thanks. The fix will reach GitHub within a day.