CUDNN=1 is not working?

kidapu commented 7 years ago

I trained a face detector on the FDDB dataset (as I described in #13) and tried to detect faces, but detection fails with CUDNN=1.

$ vim MakeFile

GPU=1
CUDNN=1
OPENCV=1
DEBUG=1

$ ./darknet-cpp detector test cfg/face.data cfg/tiny-yolo-face.cfg tiny-yolo-face_final.weights FaceData2/JPEGImages/2002-07-19-big-img_254.jpg

[screenshot from 2017-04-11 10-40-16]

On the other hand, I can detect faces successfully with CUDNN=0.

$ vim MakeFile

GPU=1
CUDNN=0
OPENCV=1
DEBUG=1

$ ./darknet-cpp detector test cfg/face.data cfg/tiny-yolo-face.cfg tiny-yolo-face_final.weights FaceData2/JPEGImages/2002-07-19-big-img_254.jpg

[screenshot from 2017-04-11 10-44-49]

My environment is below.

nvidia-docker
NVIDIA Tesla K40c (12 GB GPU)
Ubuntu 16.04
OpenCV 2.4 (installed by libopencv-dev)
CUDA 8.0
cuDNN 5.1

prabindh commented 7 years ago

Which version of CUDA 8.0 is this?

kidapu commented 7 years ago

@prabindh I use nvidia/cuda:8.0-devel-ubuntu16.04 from this Dockerfile. https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/8.0/devel/cudnn5/Dockerfile#L1

prabindh commented 7 years ago

I strongly suspect this is not related to CUDNN. Did you stop training in both cases only after reasonable accuracy had been reached? Can you let the CUDNN version run for more epochs and check?

kidapu commented 7 years ago

I re-trained on my face data once with CuDNN=1. The following graph shows my training log, with (x, y) = (epoch, loss rate). I ran 29000 epochs.

My CuDNN version is 5.1.10. The result is unchanged: CuDNN=1 isn't working, but CuDNN=0 works fine.

However, when I run the following example with CuDNN=1 and with CuDNN=0, it works fine in both cases...

./darknet-cpp detector demo cfg/coco.data cfg/yolo.cfg yolo.weights

bobeo commented 7 years ago

@kidapu have you sorted this? I have the same problem. I trained tiny yolo and it only works when CUDNN = 0. But this problem only happens when I try to link libdarknet-cpp-shared.so to my program. The ./darknet binary still works fine.

My environment: Ubuntu 16, CUDA 8, cuDNN 6, GTX 1050

kidapu commented 7 years ago

@bobeo No, I have not solved it. Exactly the same thing happens to me!

prabindh commented 7 years ago

@bobeo Have you ensured that your wrapper application (the one that uses the .so) is built with the same options that were used to build the darknet shared lib?
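
One way to check this is to compare the build switches in the two Makefiles directly. The following is only a minimal sketch: it assumes both Makefiles set the options as plain NAME=value lines (as in the listings earlier in this thread), and get_opt and same_opts are hypothetical helper names, not part of darknet.

```shell
#!/bin/sh
# Print the value of a "NAME=value" line in a Makefile (empty if not set).
# Hypothetical helper; assumes plain assignments at the start of a line.
get_opt() {
    # $1 = Makefile path, $2 = option name (e.g. GPU)
    sed -n "s/^$2[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p" "$1" | tail -n 1
}

# Return 0 if GPU and CUDNN have the same value in both Makefiles,
# otherwise print each mismatch and return 1.
same_opts() {
    # $1 = library Makefile, $2 = application Makefile
    rc=0
    for name in GPU CUDNN; do
        a=$(get_opt "$1" "$name")
        b=$(get_opt "$2" "$name")
        if [ "$a" != "$b" ]; then
            echo "mismatch: $name is '$a' in $1 but '$b' in $2"
            rc=1
        fi
    done
    return $rc
}
```

After sourcing these functions, something like `same_opts Makefile path/to/app/Makefile && echo "options match"` would compare the two files (the second path is a placeholder for wherever the wrapper application's Makefile lives).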

prabindh commented 7 years ago

@kidapu Does inference work with CUDNN=1 when using the shared lib?

kidapu commented 7 years ago

In summary, the following happens in my case.

(1) CuDNN == 0 && (darknet-cpp || darknet-cpp-shared)

coco works fine
my dataset works fine

(2) CuDNN == 1 && (darknet-cpp || darknet-cpp-shared)

coco works fine
my dataset does not work

prabindh commented 6 years ago

Is this behaviour seen with the latest master as well? Please check the latest master and confirm.

ooobelix commented 6 years ago

I need to confirm but I have this behaviour on v6.5-1-g372b25d with a GPU machine:

(CuDNN == 0 || CuDNN == 1) && GPU == 1 && darknet-cpp-shared && arapaho : no detection
(CuDNN == 0 || CuDNN == 1) && GPU == 0 && darknet-cpp-shared && arapaho : detections

prabindh commented 6 years ago

@ooobelix please confirm that you are building Arapaho and darknet with the same options (GPU, CUDNN) in both Makefiles.

ooobelix commented 6 years ago

I'm working on it!

~/darknet$ grep -i "^GPU=\|^CUDNN" Makefile arapaho/Makefile
Makefile:GPU=1
Makefile:CUDNN=1
arapaho/Makefile:GPU=1
arapaho/Makefile:CUDNN=1

After that, I'm using my own code with Arapaho to do some predictions.

Thanks for your help!

prabindh commented 6 years ago

Could you confirm which cfg is being used?

ooobelix commented 6 years ago

From GIT:

5d442b0e550e6c640068e7e15e498599 yolov3.cfg

With 0.1 threshold

ooobelix commented 6 years ago

I'm:

compiling libdarknet-cpp-shared.so with GPU=1 and CUDNN=1
using your Arapaho code in my application with CFLAGS "-DCUDNN" and linking with "cuda cudart cublas curand cudnn"

Results:

without GPU, it works well
with GPU, Detect always returns 0 detections

prabindh commented 6 years ago

I think you have already tried with GPU=1, but I noticed that in your last comment GPU is not defined:

my application with CFLAGS "-DCUDNN"
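
A missing define like this can be caught by testing the flag string before compiling. This is only a sketch: has_define is a hypothetical helper, and it assumes that darknet's headers are conditioned on the GPU and CUDNN macros, so the application needs the same -D switches as the library.

```shell
#!/bin/sh
# Return 0 if the flag string contains -D<NAME> (optionally -D<NAME>=value)
# as a separate space-delimited word. Hypothetical helper, not part of darknet.
has_define() {
    # $1 = flag string (e.g. "$CFLAGS"), $2 = macro name (e.g. GPU)
    case " $1 " in
        *" -D$2 "*|*" -D$2="*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, with CFLAGS="-DCUDNN", the loop `for m in GPU CUDNN; do has_define "$CFLAGS" "$m" || echo "missing -D$m"; done` reports only the GPU macro as missing.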

ooobelix commented 6 years ago

Sorry, that's a mistake, you are right! I have already tested with GPU=1 and CUDNN=1.

prabindh commented 6 years ago

I tried the Arapaho build (Windows build from darknet-cpp-windows) with the latest code and this config: Yolo-tinyv3 cfg and CUDA 9.1. I am able to see detections with the default yolov3 weights.

ooobelix commented 6 years ago

OK, I made a stupid mistake in my CMakeFile with the GPU and CUDNN options.

Now it works well with GPU=1 and CUDNN=1, but without needing to link "-lcudnn". Is that normal?

prabindh commented 6 years ago

"-lcudnn" should be required. Can we close this as the issue is resolved ?

ooobelix commented 6 years ago

I'm using CMakeList and "CUDNN=1" to "_set(LNKDEP [...] cudnn" and it works well. As far as I'm concerned, you can close this issue.
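
For anyone who wants to verify whether a built binary or shared library really depends on cuDNN, its dynamic dependencies can be inspected. A minimal sketch, assuming a Linux system with ldd available; lists_lib is a hypothetical helper that scans ldd-style output on standard input.

```shell
#!/bin/sh
# Return 0 if ldd-style output on stdin mentions lib<NAME>.so.
# Hypothetical helper, not part of darknet.
lists_lib() {
    # $1 = library short name (e.g. cudnn)
    grep -q "lib$1\.so"
}
```

Usage would look like `ldd ./my_app | lists_lib cudnn && echo "depends on cuDNN"`, where ./my_app is a placeholder for the application binary. Note that ldd also lists indirect dependencies, so cuDNN may show up because the darknet shared library pulls it in, even when the application itself was linked without "-lcudnn".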

prabindh / darknet

CUDNN=1 is not working? #23