Open kidapu opened 7 years ago
Which version of CUDA 8.0 is this?
@prabindh
I use nvidia/cuda:8.0-devel-ubuntu16.04
from this Dockerfile:
https://gitlab.com/nvidia/cuda/blob/ubuntu16.04/8.0/devel/cudnn5/Dockerfile#L1
I strongly feel it may not be related to CUDNN. Did you stop the training in both cases only after reasonable accuracy had been reached? Can you let the CUDNN version run for more epochs and check?
I re-trained on my face data once with CUDNN=1. The following graph shows my training log, (x, y) = (epoch, loss rate). I ran 29000 epochs.
My cuDNN version is 5.1.10. The result is unchanged: CUDNN=1 isn't working, but CUDNN=0 works fine.
However, when I run the following example with either CUDNN=1 or CUDNN=0, it works fine:
./darknet-cpp detector demo cfg/coco.data cfg/yolo.cfg yolo.weights
@kidapu Have you sorted this out? I have the same problem. I trained tiny yolo and it only works when CUDNN=0. But this problem only happens when I link libdarknet-cpp-shared.so to my program. The ./darknet binary still works fine.
My environment: Ubuntu 16, CUDA 8, cuDNN 6, GTX 1050
@bobeo No, I have not solved it. Exactly the same thing happens to me!
@bobeo Have you ensured that your wrapper application (the one that uses the .so) is built with the same options that are used for building the darknet shared lib?
@kidapu Does inference work with CUDNN=1 with the shared lib?
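For context on why matching build options matters: one plausible (unconfirmed here) cause of this kind of symptom is that the library's headers declare extra struct members when `CUDNN` or `GPU` is defined, so an application compiled with different flags reads fields at the wrong offsets. The sketch below illustrates the effect with two hypothetical structs, `LayerWithCudnn` and `LayerWithoutCudnn`. They are stand-ins invented for this example, not darknet's actual definitions.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical view of a layer as seen by code compiled WITH -DCUDNN.
struct LayerWithCudnn {
    int    outputs;
    void*  cudnn_desc;  // stand-in for a handle that exists only in CUDNN builds
    float* weights;
};

// Hypothetical view of the same layer as seen by code compiled WITHOUT -DCUDNN.
struct LayerWithoutCudnn {
    int    outputs;
    float* weights;
};

// True when the two builds disagree about where `weights` is stored.
bool layouts_disagree() {
    return offsetof(LayerWithCudnn, weights) !=
           offsetof(LayerWithoutCudnn, weights);
}

// Prints the offset of `weights` in each view for the current platform.
void print_layouts() {
    std::printf("weights offset with CUDNN:    %zu\n",
                offsetof(LayerWithCudnn, weights));
    std::printf("weights offset without CUDNN: %zu\n",
                offsetof(LayerWithoutCudnn, weights));
}
```

Calling `print_layouts()` from a `main` shows the two offsets. If the library writes `weights` using one layout and the application reads it using the other, the application sees garbage without any compile or link error.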
In summary, the following happens in my case.
(1) CUDNN == 0 && (darknet-cpp || darknet-cpp-shared): COCO and my dataset work fine.
(2) CUDNN == 1 && (darknet-cpp || darknet-cpp-shared)
Is this behaviour seen with the latest master as well? Please check the latest master and confirm.
I need to confirm, but I see this behaviour on v6.5-1-g372b25d on a GPU machine:
@ooobelix Please confirm that you are building Arapaho and darknet with the same options (for GPU, CUDNN) in both Makefiles.
I'm working on it!
~/darknet$ grep -iE "^GPU=|^CUDNN" Makefile arapaho/Makefile
Makefile:GPU=1
Makefile:CUDNN=1
arapaho/Makefile:GPU=1
arapaho/Makefile:CUDNN=1
After that, I'm using my own code with Arapaho to do some predictions.
Thanks for your help!
Could you confirm which cfg is being used?
From GIT:
5d442b0e550e6c640068e7e15e498599 yolov3.cfg
With 0.1 threshold
I'm:
Results:
I think you have already tried with GPU=1, but I noticed that in your last comment GPU is not defined.
my application with CFLAGS "-DCUDNN"
Sorry, that was a mistake; you are right! I have already tested with GPU=1 and CUDNN=1.
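A mismatch like the one discussed here can be caught early. As a rough sketch, the wrapper application could carry a small guard that fails the build when `CUDNN` is defined without `GPU`, plus a helper that reports which flags the application was compiled with. The name `build_flags` is made up for illustration; only the `GPU` and `CUDNN` macro names come from the Makefiles above. The guard also assumes that defining `CUDNN` without `GPU` is never intended.

```cpp
#include <string>

// Assumption: CUDNN without GPU is a configuration mistake, so stop the build.
#if defined(CUDNN) && !defined(GPU)
#error "CUDNN is defined but GPU is not; check the application's compile flags"
#endif

// Reports which of the two flags this translation unit was compiled with,
// so the value can be logged and compared against the library's Makefile.
std::string build_flags() {
    std::string flags;
#ifdef GPU
    flags += "GPU=1 ";
#else
    flags += "GPU=0 ";
#endif
#ifdef CUDNN
    flags += "CUDNN=1";
#else
    flags += "CUDNN=0";
#endif
    return flags;
}
```

Logging `build_flags()` at startup makes it easy to compare the application's flags against the `GPU=` and `CUDNN=` lines in the library's Makefile.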
I tried the Arapaho build (Windows build from darknet-cpp-windows) with the latest code, the Yolo-tinyv3 cfg, and CUDA 9.1. I am able to see detections with the default yolov3 weights.
OK, I made a stupid mistake in the CMake file with the GPU and CUDNN options.
Now it works well with GPU=1 and CUDNN=1, but without needing to link "-lcudnn". Is that normal?
"-lcudnn" should be required. Can we close this, as the issue is resolved?
I'm using CMakeLists with "CUDNN=1" to "_set(LNKDEP [...] cudnn" and it works well. As far as I'm concerned, you can close this issue.
I trained a face detector on the FDDB dataset (as I wrote in #13) and tried to detect faces, but detection fails with CUDNN=1.
On the other hand, I can detect faces successfully with CUDNN=0.
My environment is below.