Does not support the latest CUDA 9.0 and cuDNN 7

encore2020 commented 6 years ago

I installed CUDA 9.0 and cuDNN 7 (the build for CUDA 9.0).

If I set CUDNN=1, the build fails at the link step:

    /examples/go.c:641:13: warning: ignoring return value of ‘scanf’, declared with attribute warn_unused_result [-Wunused-result]
                 scanf("%s", type);
                 ^
    gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/rnn.c -o obj/rnn.o
    ./examples/rnn.c: In function ‘get_seq2seq_data’:
    ./examples/rnn.c:104:13: warning: unused variable ‘dlen’ [-Wunused-variable]
                 int dlen = strlen(dest[index]);
                 ^
    ./examples/rnn.c:103:13: warning: unused variable ‘slen’ [-Wunused-variable]
                 int slen = strlen(source[index]);
                 ^
    gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/segmenter.c -o obj/segmenter.o
    gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/regressor.c -o obj/regressor.o
    gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/classifier.c -o obj/classifier.o
    gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/coco.c -o obj/coco.o
    gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/yolo.c -o obj/yolo.o
    gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/detector.c -o obj/detector.o
    gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/nightmare.c -o obj/nightmare.o
    gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/attention.c -o obj/attention.o
    gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./examples/darknet.c -o obj/darknet.o
    gcc -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv` -DGPU -I/usr/local/cuda/include/ -DCUDNN -Wall -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN obj/captcha.o obj/lsd.o obj/super.o obj/art.o obj/tag.o obj/cifar.o obj/go.o obj/rnn.o obj/segmenter.o obj/regressor.o obj/classifier.o obj/coco.o obj/yolo.o obj/detector.o obj/nightmare.o obj/attention.o obj/darknet.o libdarknet.a -o darknet -lm -pthread `pkg-config --libs opencv` -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -lcurand -lcudnn -lstdc++ libdarknet.a
    libdarknet.a(convolutional_layer.o): In function `cudnn_convolutional_setup':
    convolutional_layer.c:(.text+0xcbc): undefined reference to `cudnnSetConvolutionGroupCount'
    collect2: error: ld returned 1 exit status
    Makefile:76: recipe for target 'darknet' failed
    make: *** [darknet] Error 1
    ubuntu@ubuntu-Z270N-WIFI:~/darknet$
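One plausible reading of the link error above is a header/library mismatch. cudnnSetConvolutionGroupCount first appeared in cuDNN 7. A cuDNN 7 `cudnn.h` combined with an older `libcudnn` on the link path (for example a leftover cuDNN 6 in /usr/local/cuda/lib64) would produce exactly this undefined reference.

The sketch below decodes a version number and compares major versions. It is a minimal illustration under these assumptions:

- It uses the cuDNN 7-era encoding major*1000 + minor*100 + patch.
- The struct and function names are hypothetical, not part of cuDNN or darknet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A cuDNN version split into its parts (hypothetical helper type). */
typedef struct {
    int major;
    int minor;
    int patch;
} cudnn_version;

/* Decode a cuDNN 7-era version number (major*1000 + minor*100 + patch),
   e.g. 7005 -> 7.0.5. Newer cuDNN releases may use a different encoding. */
cudnn_version decode_cudnn_version(size_t v)
{
    cudnn_version out;
    assert(v >= 1000); /* anything smaller cannot encode a major version */
    out.major = (int)(v / 1000);
    out.minor = (int)((v % 1000) / 100);
    out.patch = (int)(v % 100);
    return out;
}

/* True when the header seen at compile time and the library picked up at
   link/run time share a major version. A cuDNN 7 header with a cuDNN 6
   library is the kind of mix that leaves cudnnSetConvolutionGroupCount
   unresolved. */
bool cudnn_versions_match(size_t header_version, size_t library_version)
{
    return decode_cudnn_version(header_version).major ==
           decode_cudnn_version(library_version).major;
}
```

To use this in a real check:

1. Compile a tiny program that includes `cudnn.h` and links with the same `-L/usr/local/cuda/lib64 -lcudnn` flags as darknet.
2. Call `cudnn_versions_match(CUDNN_VERSION, cudnnGetVersion())`.

Because `cudnnGetVersion()` also exists in older cuDNN releases, such a program should still link against an old library. Keep in mind that the dynamic loader may resolve a different `libcudnn` than the linker did.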

My OpenCV is the latest version, 3.3. If I set CUDNN=0 and GPU=1, the build succeeds, but running this command fails:

    sudo ./darknet detector train cfg/voc.data cfg/tiny-yolo.cfg darknet.conv.weights
    tiny-yolo
    layer     filters    size              input                output
        0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16
        1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
        2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32
        3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
        4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64
        5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
        6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128
        7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
        8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256
        9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
       10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512
       11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
       12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024
       13 conv    512  3 x 3 / 1    13 x  13 x1024   ->    13 x  13 x 512
       14 conv    425  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x 425
       15 detection
    mask_scale: Using default '1.000000'
    Loading weights from darknet.conv.weights...Done!
    Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
    Resizing
    384
    Loaded: 0.017535 seconds
    Region Avg IOU: 0.073414, Class: 0.005123, Obj: 0.433099, No Obj: 0.503481, Avg Recall: 0.000000, count: 3
    CUDA Error: mapping of buffer object failed
    darknet: ./src/cuda.c:36: check_error: Assertion `0' failed.
    Aborted (core dumped)

If I set GPU=0 (CPU only), both the build and the training run are fine.

AurusHuang commented 6 years ago

Well... you complained about this in the Caffe issues too. Make sure your CUDA and cuDNN are installed properly. I will also try to reproduce the issue. Stay with me.

ppantalone commented 6 years ago

I ran into the same problem. I had a new network running successfully in darknet with GPU support using CUDA 8.0 and cuDNN 6, but moving the same network and code to a CUDA 9.0 / cuDNN 7 environment did not work. The problem seems to be related to the additional forward and backward convolution algorithms added in CUDA 9.0 / cuDNN 7, some of which report a workspace size of 0. To handle these cases, I check for a zero-size workspace and fall back to darknet's non-cuDNN path (im2col + GEMM), which fixed the problem for me. The changes I made were in convolutional_kernels.cu, as follows:

In the forward_convolutional_layer_gpu function, the original code:

    float one = 1;
    cudnnConvolutionForward(cudnn_handle(),
            &one,
            l.srcTensorDesc,
            net.input_gpu,
            l.weightDesc,
            l.weights_gpu,
            l.convDesc,
            l.fw_algo,
            net.workspace,
            l.workspace_size,
            &one,
            l.dstTensorDesc,
            l.output_gpu);

New code:

    if (l.workspace_size > 0) {
        float one = 1;
        cudnnConvolutionForward(cudnn_handle(),
                &one,
                l.srcTensorDesc,
                net.input_gpu,
                l.weightDesc,
                l.weights_gpu,
                l.convDesc,
                l.fw_algo,
                net.workspace,
                l.workspace_size,
                &one,
                l.dstTensorDesc,
                l.output_gpu);
    } else {
        int i, j;
        int m = l.n/l.groups;
        int k = l.size*l.size*l.c/l.groups;
        int n = l.out_w*l.out_h;
        for(i = 0; i < l.batch; ++i){
            for(j = 0; j < l.groups; ++j){
                float *a = l.weights_gpu + j*l.nweights/l.groups;
                float *b = net.workspace;
                float *c = l.output_gpu + (i*l.groups + j)*n*m;

                im2col_gpu(net.input_gpu + (i*l.groups + j)*l.c/l.groups*l.h*l.w,
                    l.c/l.groups, l.h, l.w, l.size, l.stride, l.pad, b);
                gemm_gpu(0,0,m,n,k,1,a,k,b,n,1,c,n);
            }
        }
    }
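The fallback branch above computes the convolution without cuDNN in two steps:

- im2col unfolds every input patch into a column.
- One GEMM multiplies the weight matrix (m = filters per group, k = size*size*channels per group) by that column matrix (k x n, with n = out_w*out_h).

Below is a minimal CPU sketch of the same idea for one image and one group. The function names are hypothetical and this is not darknet's own code, although the channel-major image layout follows darknet's convention. The workspace must hold k*n floats.

```c
#include <assert.h>
#include <string.h>

/* Fetch one input value, treating positions in the zero padding as 0. */
static float get_pixel(const float *im, int height, int width,
                       int row, int col, int channel, int pad)
{
    row -= pad;
    col -= pad;
    if (row < 0 || col < 0 || row >= height || col >= width) return 0.0f;
    return im[col + width * (row + height * channel)];
}

/* Unfold every ksize x ksize patch into one column. The result has
   channels*ksize*ksize rows and out_h*out_w columns (row-major). */
static void im2col_sketch(const float *im, int channels, int height, int width,
                          int ksize, int stride, int pad, float *col)
{
    int out_h = (height + 2 * pad - ksize) / stride + 1;
    int out_w = (width + 2 * pad - ksize) / stride + 1;
    int rows = channels * ksize * ksize;
    for (int r = 0; r < rows; ++r) {
        int w_off = r % ksize;
        int h_off = (r / ksize) % ksize;
        int c = r / ksize / ksize;
        for (int y = 0; y < out_h; ++y)
            for (int x = 0; x < out_w; ++x)
                col[(r * out_h + y) * out_w + x] =
                    get_pixel(im, height, width, h_off + y * stride,
                              w_off + x * stride, c, pad);
    }
}

/* C[m x n] += A[m x k] * B[k x n]: the gemm(0,0,m,n,k,1,a,k,b,n,1,c,n) case. */
static void gemm_nn_sketch(int m, int n, int k,
                           const float *a, const float *b, float *c)
{
    for (int i = 0; i < m; ++i)
        for (int p = 0; p < k; ++p)
            for (int j = 0; j < n; ++j)
                c[i * n + j] += a[i * k + p] * b[p * n + j];
}

/* One image, one group: output (filters x out_h*out_w) = weights x columns.
   The output is zeroed first because the GEMM accumulates (beta = 1). */
void conv_forward_sketch(const float *input, int channels, int height, int width,
                         const float *weights, int filters, int ksize,
                         int stride, int pad, float *workspace, float *output)
{
    int out_h = (height + 2 * pad - ksize) / stride + 1;
    int out_w = (width + 2 * pad - ksize) / stride + 1;
    int m = filters;
    int k = ksize * ksize * channels;
    int n = out_h * out_w;
    assert(out_h > 0 && out_w > 0);
    memset(output, 0, sizeof(float) * (size_t)m * (size_t)n);
    im2col_sketch(input, channels, height, width, ksize, stride, pad, workspace);
    gemm_nn_sketch(m, n, k, weights, workspace, output);
}
```

Example: take a 3x3 input holding 1..9 and one 2x2 filter of ones, with stride 1 and no padding. The four outputs are the patch sums 12, 16, 24 and 28.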

Also, in the backward_convolutional_layer_gpu function, the original code:

    float one = 1;
    cudnnConvolutionBackwardFilter(cudnn_handle(),
            &one,
            l.srcTensorDesc,
            net.input_gpu,
            l.ddstTensorDesc,
            l.delta_gpu,
            l.convDesc,
            l.bf_algo,
            net.workspace,
            l.workspace_size,
            &one,
            l.dweightDesc,
            l.weight_updates_gpu);

    if(net.delta_gpu){
        if(l.binary || l.xnor) swap_binary(&l);
        cudnnConvolutionBackwardData(cudnn_handle(),
                &one,
                l.weightDesc,
                l.weights_gpu,
                l.ddstTensorDesc,
                l.delta_gpu,
                l.convDesc,
                l.bd_algo,
                net.workspace,
                l.workspace_size,
                &one,
                l.dsrcTensorDesc,
                net.delta_gpu);
        if(l.binary || l.xnor) swap_binary(&l);
        if(l.xnor) gradient_array_gpu(original_input, l.batch*l.c*l.h*l.w, HARDTAN, net.delta_gpu);
    }

New code:

    if (l.workspace_size > 0) {
        float one = 1;
        cudnnConvolutionBackwardFilter(cudnn_handle(),
                &one,
                l.srcTensorDesc,
                net.input_gpu,
                l.ddstTensorDesc,
                l.delta_gpu,
                l.convDesc,
                l.bf_algo,
                net.workspace,
                l.workspace_size,
                &one,
                l.dweightDesc,
                l.weight_updates_gpu);

        if(net.delta_gpu){
            if(l.binary || l.xnor) swap_binary(&l);
            cudnnConvolutionBackwardData(cudnn_handle(),
                    &one,
                    l.weightDesc,
                    l.weights_gpu,
                    l.ddstTensorDesc,
                    l.delta_gpu,
                    l.convDesc,
                    l.bd_algo,
                    net.workspace,
                    l.workspace_size,
                    &one,
                    l.dsrcTensorDesc,
                    net.delta_gpu);
            if(l.binary || l.xnor) swap_binary(&l);
            if(l.xnor) gradient_array_gpu(original_input, l.batch*l.c*l.h*l.w, HARDTAN, net.delta_gpu);
        }
    }
    else
    {
        int m = l.n/l.groups;
        int n = l.size*l.size*l.c/l.groups;
        int k = l.out_w*l.out_h;

        int i, j;
        for(i = 0; i < l.batch; ++i){
            for(j = 0; j < l.groups; ++j){
                float *a = l.delta_gpu + (i*l.groups + j)*m*k;
                float *b = net.workspace;
                float *c = l.weight_updates_gpu + j*l.nweights/l.groups;

                float *im = net.input_gpu+(i*l.groups + j)*l.c/l.groups*l.h*l.w;

                im2col_gpu(im, l.c/l.groups, l.h, l.w,
                        l.size, l.stride, l.pad, b);
                gemm_gpu(0,1,m,n,k,1,a,k,b,k,1,c,n);

                if(net.delta_gpu){
                    if(l.binary || l.xnor) swap_binary(&l);
                    a = l.weights_gpu + j*l.nweights/l.groups;
                    b = l.delta_gpu + (i*l.groups + j)*m*k;
                    c = net.workspace;

                    gemm_gpu(1,0,n,k,m,1,a,n,b,k,0,c,k);

                    col2im_gpu(net.workspace, l.c/l.groups, l.h, l.w, l.size, l.stride,
                        l.pad, net.delta_gpu + (i*l.groups + j)*l.c/l.groups*l.h*l.w);
                    if(l.binary || l.xnor) {
                        swap_binary(&l);
                    }
                }
                if(l.xnor) gradient_array_gpu(original_input + i*l.c*l.h*l.w, l.c*l.h*l.w, HARDTAN, net.delta_gpu + i*l.c*l.h*l.w);
            }
        }
    }
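The else branch of the backward fallback does two GEMMs per image and group:

- `gemm(0,1,...)` accumulates the weight gradient as delta times the transposed im2col columns.
- `gemm(1,0,...)` overwrites the workspace (beta = 0) with transposed weights times delta. col2im then adds it back onto the input gradient.

The sketch below gives CPU equivalents for one group. The function names are hypothetical, and the index conventions are modeled on darknet's CPU gemm/col2im rather than copied from it.

```c
#include <assert.h>

/* Weight gradient: C[m x n] += A[m x k] * B^T, with B stored as n x k.
   Mirrors gemm(0,1,m,n,k,1,a,k,b,k,1,c,n): a = layer delta, b = im2col columns. */
void gemm_nt_sketch(int m, int n, int k,
                    const float *a, const float *b, float *c)
{
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int p = 0; p < k; ++p)
                sum += a[i * k + p] * b[j * k + p];
            c[i * n + j] += sum;
        }
}

/* Column gradient: C[n x k] = A^T * B, with A stored as m x n (the weights)
   and B as m x k (the delta). Mirrors gemm(1,0,n,k,m,1,a,n,b,k,0,c,k);
   beta = 0, so C is overwritten rather than accumulated. */
void gemm_tn_sketch(int n, int k, int m,
                    const float *a, const float *b, float *c)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < k; ++j) {
            float sum = 0.0f;
            for (int p = 0; p < m; ++p)
                sum += a[p * n + i] * b[p * k + j];
            c[i * k + j] = sum;
        }
}

/* Inverse of im2col: add every column entry back onto the input pixel it came
   from, so pixels covered by several patches receive the sum of their gradients. */
void col2im_sketch(const float *col, int channels, int height, int width,
                   int ksize, int stride, int pad, float *im)
{
    int out_h = (height + 2 * pad - ksize) / stride + 1;
    int out_w = (width + 2 * pad - ksize) / stride + 1;
    int rows = channels * ksize * ksize;
    assert(out_h > 0 && out_w > 0);
    for (int r = 0; r < rows; ++r) {
        int w_off = r % ksize;
        int h_off = (r / ksize) % ksize;
        int c = r / ksize / ksize;
        for (int y = 0; y < out_h; ++y)
            for (int x = 0; x < out_w; ++x) {
                int im_row = h_off + y * stride - pad;
                int im_col = w_off + x * stride - pad;
                if (im_row < 0 || im_col < 0 || im_row >= height || im_col >= width)
                    continue; /* this entry came from zero padding */
                im[im_col + width * (im_row + height * c)] +=
                    col[(r * out_h + y) * out_w + x];
            }
    }
}
```

Example: take a 3x3 input holding 1..9, a 2x2 kernel and an all-ones delta.

- The weight gradient is the row sums of the column matrix: 12, 16, 24 and 28.
- With all-ones weights, the input gradient counts how many patches cover each pixel: 1 2 1 / 2 4 2 / 1 2 1.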

I hope this helps. Please let me know if there are any other insights into this issue or other CUDA 9 / cuDNN 7 conversion issues.

Liedermaus commented 6 years ago

This problem is probably caused by multiple versions of CUDA installed on your computer, especially if you use automatic updates for CUDA (which you shouldn't). In my case the problem came from a PATH variable that included CUDA: ...:/usr/local/cuda-8.0/bin/:... When CUDA is upgraded, for example to 9.1, you need to update this to ...:/usr/local/cuda-9.1/bin/:... Otherwise the wrong NVIDIA compiler (nvcc) is used.

But I would recommend removing the old CUDA version and doing a clean install of the new one. Please also remember to update cuDNN: it depends on the CUDA version, so be careful which version you select.
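The shell resolves nvcc by scanning PATH from left to right and taking the first match. A stale /usr/local/cuda-8.0/bin entry therefore shadows a newer install that appears later.

The helper below illustrates this. It is hypothetical (not part of darknet or CUDA) and reports two things:

- the first PATH entry that mentions "cuda";
- how many such entries exist.

It is only a heuristic, since an entry can mention cuda without containing nvcc. Running `which nvcc` answers the question directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the first ':'-separated PATH entry containing "cuda" into out
   (left empty if there is none or it does not fit). Returns the number of
   entries containing "cuda"; a result above 1 flags a mixed setup. */
int first_cuda_path_entry(const char *path, char *out, size_t out_len)
{
    int count = 0;
    assert(out_len > 0);
    out[0] = '\0';
    while (*path) {
        const char *end = strchr(path, ':');
        size_t len = end ? (size_t)(end - path) : strlen(path);
        int has_cuda = 0;
        /* Search for "cuda" only inside the current entry. */
        for (size_t i = 0; i + 4 <= len; ++i)
            if (strncmp(path + i, "cuda", 4) == 0) { has_cuda = 1; break; }
        if (has_cuda) {
            if (count == 0 && len < out_len) {
                memcpy(out, path, len);
                out[len] = '\0';
            }
            ++count;
        }
        if (!end) break;
        path = end + 1;
    }
    return count;
}
```

In a real program the input would come from `getenv("PATH")` in `<stdlib.h>`, after checking that it is not NULL.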

TanFluent commented 6 years ago

@encore2020
1. Check your default CUDA version (nvcc --version) and its install path (which nvcc).
2. If the default "nvcc" is not the one you want, go to lines 49 and 51 of the "Makefile" and change the CUDA path to your own CUDA install path.

Grabber commented 6 years ago

You are welcome to write your own code base and spend years doing so. Don't be stupid.

On Thu, Mar 8, 2018 at 3:54 PM, waschbaer00 notifications@github.com wrote:

I just want to say f***... I just installed CUDA 9.1, cuDNN 7.1 and OpenCV 3.4.1, and it seems like none of them is compatible with darknet. Why not clearly state which versions are compatible with darknet? I wasted two days.


-- Regards,

Luiz Vitor Martinez Cardoso

"The only limits are the ones you place upon yourself"

Yumin-Sun-00 commented 6 years ago

You are right... controlling my anger.

kb1ooo commented 6 years ago

LOL, I don't think it's possible to find a deep learning framework with fewer dependencies. This one has exactly one required dependency. I think the "mapping of buffer object" error is caused by running out of GPU memory. Try increasing subdivisions up to the same value as "batch". If that works, decrease it by powers of 2 to find the lowest value at which it does not crash. To clarify, "subdivisions" is a setting in the [net] section of the cfg file.
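The subdivisions search described above can be written as a small loop. This is a sketch under two assumptions:

- Darknet divides the cfg "batch" by "subdivisions" to get the number of images per GPU pass.
- Whether a pass fits in GPU memory is represented by a hypothetical predicate (`fits_fn`). In practice you answer that question by launching training and seeing whether it crashes.

```c
#include <assert.h>

/* Hypothetical predicate: nonzero if a training pass with this many images
   fits in GPU memory. */
typedef int (*fits_fn)(int images_per_pass);

/* Stand-in for illustration only: pretend at most 8 images fit at once. */
static int example_fits_8(int images_per_pass)
{
    return images_per_pass <= 8;
}

/* Start at subdivisions == batch (one image per pass), then halve
   subdivisions while the larger pass still fits. Halving keeps subdivisions
   a divisor of batch. Returns the lowest working value, or 0 if even a
   single image per pass does not fit. */
int lowest_working_subdivisions(int batch, fits_fn fits)
{
    int subdivisions = batch;
    assert(batch > 0);
    if (!fits(batch / subdivisions))
        return 0;
    while (subdivisions % 2 == 0) {
        int next = subdivisions / 2;
        if (!fits(batch / next))
            break;
        subdivisions = next;
    }
    return subdivisions;
}
```

Example: with batch=64 and the stand-in limit of 8 images, the search accepts 64, 32, 16 and 8. It then rejects 4 (16 images per pass) and settles on subdivisions=8.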

AlexeyAB commented 6 years ago

@waschbaer00 There is a bug in the C API of OpenCV 3.4.1: https://github.com/opencv/opencv/issues/10963 Use OpenCV 3.4.0 or lower.
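The "3.4.0 or lower" rule above can be expressed as a version comparison. The sketch below packs a version triple into one integer, an encoding chosen here for illustration only, and accepts anything up to 3.4.0. In a real build, the three numbers would come from the CV_VERSION_MAJOR, CV_VERSION_MINOR and CV_VERSION_REVISION macros that OpenCV 3.x defines in opencv2/core/version.hpp. The helper names are hypothetical.

```c
#include <assert.h>

/* Pack a version triple so versions compare as plain integers,
   e.g. 3.4.1 -> 30401 (assumes minor and revision stay below 100). */
static int pack_version(int major, int minor, int revision)
{
    assert(minor >= 0 && minor < 100 && revision >= 0 && revision < 100);
    return major * 10000 + minor * 100 + revision;
}

/* Encodes the advice above: OpenCV 3.4.0 or lower is usable, 3.4.1 is not.
   Whether later releases fixed the C API bug is outside this rule. */
int opencv_version_recommended(int major, int minor, int revision)
{
    return pack_version(major, minor, revision) <= pack_version(3, 4, 0);
}
```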

nuannuan1991 commented 6 years ago

Dear @TanFluent, my GPU is a GeForce GT 1030 (compute capability 6.1) on Ubuntu 16.04. My CUDA is release 7.5, V7.5.17, and "which nvcc" gives /usr/local/cuda-7.5/bin//nvcc, so my Makefile is:

    GPU=1
    CUDNN=0
    OPENCV=0
    OPENMP=0
    DEBUG=0
    ........
    ifeq ($(GPU), 1)
    COMMON+= -DGPU -I/usr/local/cuda-7.5/include/
    CFLAGS+= -DGPU
    LDFLAGS+= -L/usr/local/cuda-7.5/lib64 -lcuda -lcudart -lcublas -lcurand
    endif

When I run make, the following error occurs:

    /usr/include/string.h: In function ‘void* __mempcpy_inline(void*, const void*, size_t)’:
    /usr/include/string.h:652:42: error: ‘memcpy’ was not declared in this scope
       return (char *) memcpy (dest, src, n) + n;
                                              ^
    compilation terminated due to -Wfatal-errors.
    Makefile:88: recipe for target 'obj/convolutional_kernels.o' failed
    make: *** [obj/convolutional_kernels.o] Error 1

I have compiled it many times and get the same error every time; I really don't know how to fix it. Can you give me some support? Thanks a lot!

nuannuan1991 commented 6 years ago

error @TanFluent

thanif commented 4 years ago

The following fix worked for me.

    ifeq ($(CUDNN), 1)
    COMMON+= -DCUDNN
    ifeq ($(OS),Darwin) #MAC
    CFLAGS+= -DCUDNN -I/usr/local/cuda/include
    LDFLAGS+= -L/usr/local/cuda/lib -lcudnn
    else
    # (The fix)
    CFLAGS+= -DCUDNN -I/usr/local/include
    LDFLAGS+= -L/usr/local/include -lcudnn
    endif
    endif

pjreddie / darknet, issue #278