pjreddie / darknet

Convolutional Neural Networks
http://pjreddie.com/darknet/

Error 139 Compiling With CUDA #1480

Open flaviobeck opened 5 years ago

flaviobeck commented 5 years ago

Hello, I have installed darknet and compiled it for CPU. It works fine!

Afterwards, I installed CUDA:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39       Driver Version: 418.39       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce MX110       On   | 00000000:01:00.0 Off |                  N/A |
| N/A   46C    P0    N/A /  N/A |    369MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1934      G   /usr/lib/xorg/Xorg                            24MiB |
|    0      2099      G   /usr/bin/gnome-shell                          54MiB |
|    0      2836      G   /usr/lib/xorg/Xorg                           159MiB |
|    0      3065      G   /usr/bin/gnome-shell                          65MiB |
|    0      4889      G   ...uest-channel-token=14897289977315395416    61MiB |
+-----------------------------------------------------------------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

When I try to compile again (with GPU=1 set in the Makefile), I get the following error:

gcc -Iinclude/ -Isrc/ -DGPU -I/usr/local/cuda-10.1/include/ -Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DGPU -c ./src/gemm.c -o obj/gemm.o
./src/gemm.c: In function ‘time_gpu’:
./src/gemm.c:232:9: warning: ‘cudaThreadSynchronize’ is deprecated [-Wdeprecated-declarations]
         cudaThreadSynchronize();
         ^~~~~~~~~~~~~~~~~~~~~
In file included from /usr/local/cuda-10.1/include/cuda_runtime.h:96,
                 from include/darknet.h:11,
                 from ./src/utils.h:5,
                 from ./src/gemm.c:2:
/usr/local/cuda-10.1/include/cuda_runtime_api.h:955:57: note: declared here
 extern __CUDA_DEPRECATED __host__ cudaError_t CUDARTAPI cudaThreadSynchronize(void);

gcc -Iinclude/ -Isrc/ -DGPU -I/usr/local/cuda-10.1/include/ -Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DGPU -c ./src/utils.c -o obj/utils.o
.
.
.
.
.

nvcc  -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=[sm_50,compute_50] -gencode arch=compute_52,code=[sm_52,compute_52] -Iinclude/ -Isrc/ -DGPU -I/usr/local/cuda-10.1/include/ --compiler-options "-Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DGPU" -c ./src/convolutional_kernels.cu -o obj/convolutional_kernels.o
Segmentation fault (core dumped)
make: *** [Makefile:92: obj/convolutional_kernels.o] Error 139

Still running with CPU: ./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg

Loading weights from yolov3.weights...Done!
data/dog.jpg: Predicted in **19.136767 seconds**.
dog: 100%
truck: 92%
bicycle: 99%

How can I fix this error so that darknet compiles with CUDA?

ghost commented 5 years ago

Looks like nvcc crashed. Have you tried CUDA 10.0 instead of 10.1?
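
One detail in the original post may be worth checking: nvcc reports release 9.1, while the compile commands include headers from /usr/local/cuda-10.1/include/. Nobody in this thread confirms that this mismatch causes the segfault, so treat it as a guess. A minimal shell sketch for comparing the two versions, using hypothetical helper names (nvcc_release, cuda_path_release, releases_match) and the values posted above:

```shell
#!/bin/sh
# Sketch: compare the CUDA release reported by nvcc with the one in the
# Makefile include path. The function names are hypothetical helpers,
# not part of darknet or the CUDA toolkit.

# Read `nvcc --version` output on stdin and print the release, e.g. "9.1".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Print the version embedded in a path such as /usr/local/cuda-10.1/include/.
cuda_path_release() {
    printf '%s\n' "$1" | sed -n 's/.*cuda-\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Return 0 when both releases are non-empty and equal, 1 otherwise.
releases_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Values copied from the output posted in this issue.
compiler=$(printf 'Cuda compilation tools, release 9.1, V9.1.85\n' | nvcc_release)
headers=$(cuda_path_release '/usr/local/cuda-10.1/include/')

if releases_match "$compiler" "$headers"; then
    echo "nvcc $compiler matches headers $headers"
else
    echo "mismatch: nvcc $compiler vs headers $headers"
fi
```

With the values from this issue it prints "mismatch: nvcc 9.1 vs headers 10.1". On a live system, `nvcc --version | nvcc_release` gives the compiler side, and `command -v nvcc` shows which binary make picks up from PATH. If the releases differ, putting the 10.1 toolkit's bin directory first on PATH before running make might be worth a try; its exact location depends on how the toolkit was installed.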

aerobiotic commented 5 years ago

I have darknet working with CUDA 10.1. I had some other issues, but to give you hope: it should work.

I am on the master branch, though I have not updated the project in a while. The hash is 80d9bec20f0a44ab07616215c6eadb2d633492fe, so perhaps try that specific version.

Like you (I can see this in your output), I edited the Makefile and updated the CUDA paths near line 50:

COMMON+= -DGPU -I/usr/local/cuda-10.1/include/
LDFLAGS+= -L/usr/local/cuda-10.1/lib64 -lcuda -lcudart -lcublas -lcurand

Then, near the end of the build, I got linker errors saying it could not find cuda:

-L/usr/local/cuda-10.1/lib64 -lcuda -lcudart -lcublas -lcurand -lstdc++ 
/usr/bin/ld: cannot find -lcuda
collect2: error: ld returned 1 exit status
Makefile:85: recipe for target 'libdarknet.so' failed
make: *** [libdarknet.so] Error 1

I fixed the above by adding a symlink to the CUDA library, which I found in the x86_64-linux-gnu folder:

cd /usr/local/cuda-10.1/lib64
sudo ln -s  /usr/lib/x86_64-linux-gnu/libcuda.so.418.39 libcuda.so
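
The symlink above hard-codes driver version 418.39, which will differ on other machines. A minimal sketch of the same step that looks up whatever versioned libcuda.so is present; link_libcuda is a hypothetical helper name, and both directories are passed in rather than assumed:

```shell
#!/bin/sh
# Sketch: link the driver's libcuda.so.<version> into a toolkit lib64
# directory. link_libcuda is a hypothetical helper, not a CUDA tool.

link_libcuda() {
    driver_dir=$1   # e.g. /usr/lib/x86_64-linux-gnu
    toolkit_dir=$2  # e.g. /usr/local/cuda-10.1/lib64

    # Pick the first versioned driver library, whatever its suffix is.
    found=
    for candidate in "$driver_dir"/libcuda.so.*; do
        if [ -e "$candidate" ]; then
            found=$candidate
            break
        fi
    done

    if [ -z "$found" ]; then
        echo "no libcuda.so.* in $driver_dir" >&2
        return 1
    fi

    # -s creates a symbolic link; -f replaces a stale one if present.
    ln -sf "$found" "$toolkit_dir/libcuda.so"
}
```

Called as `link_libcuda /usr/lib/x86_64-linux-gnu /usr/local/cuda-10.1/lib64`, this reproduces the step above without naming the driver version. Writing under /usr/local usually needs root, so the function would have to run in a root shell.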

The build then worked fine, and I was able to run the example below about 75 times faster than on the system CPU:

./darknet detect cfg/yolo.cfg yolo.weights data/dog.jpg

I am on Ubuntu 18.04. I got most of my tools (compilers and libraries) by following the instructions for the OpenCV 4.0.1 build, linked below. Perhaps it is a compiler version or make issue?

https://github.com/spmallick/learnopencv/blob/master/InstallScripts/installOpenCV-4-on-Ubuntu-18-04.sh

LIMHARRY commented 5 years ago

omg!! It works!!! thanks aerobiotic

OSSome01 commented 4 years ago

I tried the above command. It is still not working! Please help.

eltonfernando commented 4 years ago

My version is cudnn-10.1-linux-x64-v7.6.5.32.tgz. Just replace cudaThreadSynchronize() with cudaDeviceSynchronize() at ./src/gemm.c:232.
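
The replacement suggested here can be scripted. A minimal sketch, with replace_thread_sync as a hypothetical helper name; cudaDeviceSynchronize() is the documented successor to the deprecated cudaThreadSynchronize(), so this should silence the warning from the original post, but the thread does not establish whether it affects the nvcc segfault:

```shell
#!/bin/sh
# Sketch: swap the deprecated call for its replacement in one source file.
# replace_thread_sync is a hypothetical helper name.

replace_thread_sync() {
    file=$1
    # -i.bak edits in place and keeps a .bak copy so the change is easy to revert.
    sed -i.bak 's/cudaThreadSynchronize()/cudaDeviceSynchronize()/g' "$file"
}
```

From the darknet checkout, `replace_thread_sync src/gemm.c` applies the edit and leaves the original in src/gemm.c.bak.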