vlfeat / matconvnet

MatConvNet: CNNs for MATLAB

GPU works for the MNIST and CIFAR examples, but not for my own data #344

Closed iiwindii closed 8 years ago

iiwindii commented 8 years ago

Hi everyone, I am using matconvnet-1.0-beta16. GPU support works for the MNIST and CIFAR examples, but when I use the GPU on my own data, the following error arises:

Error using .*
Out of memory on device. To view more detail about available memory on the GPU, use 'gpuDevice()'. If the problem persists, reset the GPU by calling 'gpuDevice(1)'.

Error in vl_nnrelu (line 40)
y = dzdy .* (x > single(0)) ;

Error in vl_simplenn (line 365)
res(i).dzdx = vl_nnrelu(res(i).x, res(i+1).dzdx, leak{:}) ;

Even with a very small batch size, the same error arises. The code runs correctly in CPU mode, so the code itself seems to be correct.

Does anyone have any advice?

iiwindii commented 8 years ago

@vedaldi could you give some advice about this problem? Thanks!

jrruijli commented 8 years ago

It seems you don't have enough memory on your GPU.

You can check the network with vl_simplenn_display to get an idea of GPU memory requirements.

Jasper
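As a rough cross-check of what vl_simplenn_display reports, the sketch below adds up the single-precision activations of a short chain of layers. The layer shapes and the batch size are hypothetical. It assumes, as the res(i).x and res(i).dzdx fields in the stack trace above suggest, that training keeps one output and one derivative per layer, and it ignores weights, cuDNN workspace and MATLAB's own overhead, so treat it as a lower bound rather than an exact figure.

```matlab
% Rough activation-memory estimate for a chain of layers.
% Each row is the (hypothetical) output size [height width channels] of one layer.
outSizes = [ 224 224 3   ;   % input image
             224 224 64  ;   % conv
             224 224 64  ;   % relu
             112 112 64  ;   % pool
             112 112 128 ] ; % conv
batchSize = 32 ;             % hypothetical batch size
bytesPerElem = 4 ;           % single precision

elems = prod(outSizes, 2) * batchSize ;   % elements in each layer output
fwdBytes = sum(elems) * bytesPerElem ;    % forward activations, res(i).x
totalBytes = 2 * fwdBytes ;               % plus derivatives, res(i).dzdx

fprintf('forward: %.1f MB, forward + backward: %.1f MB\n', ...
        fwdBytes / 2^20, totalBytes / 2^20) ;
```

Because every term is multiplied by batchSize, halving the batch halves the estimate, while halving the image side divides it by four.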

iiwindii commented 8 years ago

@jrruijli Thanks for your response, but I don't think the problem is insufficient memory. In fact, the same code ran correctly with matconvnet-1.0-beta16 before; the error only appeared after I recompiled a fresh copy of matconvnet-1.0-beta16.

jrruijli commented 8 years ago

Did you follow the Matlab instructions? "Out of memory on device. To view more detail about available memory on the GPU, use 'gpuDevice()'. If the problem persists, reset the GPU by calling 'gpuDevice(1)'."

Maybe you did not close a previous MATLAB session, so its GPU memory has not been freed. Or another user or application is using the GPU.
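A minimal sketch of following MATLAB's own suggestion from the error message: inspect the selected GPU's free memory and reset it. It uses only MATLAB's gpuDevice object (Parallel Computing Toolbox), not MatConvNet, and skips everything when no supported GPU is found. Note that a reset only frees memory held by the current MATLAB session, not by other processes such as a second MATLAB instance.

```matlab
% Inspect, then reset, the currently selected GPU (Parallel Computing Toolbox).
freeGB = NaN ;                            % stays NaN when no GPU is available
if exist('gpuDeviceCount') && gpuDeviceCount() > 0
  g = gpuDevice() ;                       % currently selected device
  freeGB = g.AvailableMemory / 2^30 ;
  fprintf('%s: %.2f GB free of %.2f GB\n', g.Name, freeGB, g.TotalMemory / 2^30) ;
  % reset clears all gpuArray data held by this MATLAB session;
  % existing gpuArray variables become invalid afterwards.
  reset(g) ;
  g = gpuDevice() ;                       % query the device again
  freeGB = g.AvailableMemory / 2^30 ;
  fprintf('after reset: %.2f GB free\n', freeGB) ;
else
  disp('No supported GPU found.') ;
end
```

If the free memory right after a reset is already far below the total, something outside this session is holding the card.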

iiwindii commented 8 years ago

@jrruijli Yes, I tried this, but the problem persists. I have never encountered this problem before.

lenck commented 8 years ago

Hi, are you using the cuDNN or the standard kernels?

One way to debug this is to put a breakpoint after each layer operation and check the free memory with, e.g., nvidia-smi.

iiwindii commented 8 years ago

@lenck Yes, cuDNN is used. This error is confusing, because my code ran correctly with matconvnet-1.0-beta16 before. The error appeared only after I downloaded a fresh matconvnet-1.0-beta16 from the homepage and recompiled it (with GPU and cuDNN support enabled). My setup: MATLAB 2015a, Visual Studio 2012, Windows, CUDA 6.5, cuDNN v2.

lenck commented 8 years ago

We have added a new option, CudnnWorkspaceLimit, to the function vl_nnconv, so changing this value might help (if you are using the ImageNet examples, it is set in examples/imagenet/cnn_imagenet_init.m).

iiwindii commented 8 years ago

@lenck Thanks. I started from the CIFAR example and changed the code for my own data. Can I set the new CudnnWorkspaceLimit option inside the function vl_nnconv?

lenck commented 8 years ago

vl_nnconv is a MEX file, so you cannot edit it (unless you shadow it), and it is called from vl_simplenn. But take a look at how it is done in cnn_imagenet_init.m: there, the layer.opts field is set for each initialized layer.
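A minimal sketch of that approach for a network built by hand, such as one adapted from the CIFAR example. The toy net.layers list and the 512 MB limit are hypothetical; the sketch assumes the limit is given in bytes. The key point is that each convolution layer gets its own opts field holding a flat cell array of name/value pairs.

```matlab
% Toy network (hypothetical) standing in for a net adapted from the CIFAR example.
net.layers = { struct('type', 'conv'), struct('type', 'relu'), ...
               struct('type', 'pool'), struct('type', 'conv') } ;

% Hypothetical cap of 512 MB on the cuDNN workspace of each convolution.
workspaceLimit = 512 * 2^20 ;   % assumed to be in bytes

for i = 1:numel(net.layers)
  if strcmp(net.layers{i}.type, 'conv')
    % opts must be a flat cell array of name/value pairs, because
    % vl_simplenn splices it into the vl_nnconv call as l.opts{:}
    net.layers{i}.opts = {'CudnnWorkspaceLimit', workspaceLimit} ;
  end
end
```

Only the conv layers receive the field; the relu and pool layers are left untouched.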

iiwindii commented 8 years ago

@lenck Hi, I added CudnnWorkspaceLimit to my code and the "Out of memory on device" error is gone. However, there is now another error:

Error using vl_nnconv
The option name is not a string (argument number 8)

Error in vl_simplenn (line 262)
res(i+1).x = vl_nnconv(res(i).x, l.weights{1}, l.weights{2}, ...

Error in cnn_train>process_epoch (line 309)
res = vl_simplenn(net, im, dzdy, res, ...

Error in cnn_train (line 127)
[net,stats.train,prof] = process_epoch(opts, getBatch, epoch, train, learningRate, imdb, net) ;

vl_nnconv is a MEX file, so I cannot open it to check where the error is. I tried setting "opts.cudnn = false" in cnn_train, but the same error arises. This is really a confusing problem!

lenck commented 8 years ago

Hi, it seems the arguments are wrong. As the error message suggests, what does the 8th argument passed to vl_nnconv look like?
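One way to see which value lands in position 8 is to rebuild the argument list the way the quoted vl_simplenn line builds it, with placeholder strings in place of the tensors. The malformed l.opts value and the trailing {'CuDNN'} flag are assumptions for illustration, not taken from the reporter's code.

```matlab
% Rebuild the vl_nnconv argument list as vl_simplenn does, with placeholders.
l.pad = 0 ;
l.stride = 1 ;
l.opts = {512 * 2^20} ;   % hypothetical bug: a value without its option name
cudnn = {'CuDNN'} ;       % flag appended by vl_simplenn (assumed)

args = [{'x', 'w', 'b', 'pad', l.pad, 'stride', l.stride}, l.opts, cudnn] ;

% After the three data arguments, options come as name/value pairs, so a
% name is expected at positions 4, 6, 8, ... (this simple check only holds
% while the value-less cudnn flag stays last).
badArg = 0 ;
for k = 4:2:numel(args)
  if ~ischar(args{k})
    badArg = k ;
    fprintf('argument %d is not an option name\n', k) ;
    break ;
  end
end
```

With the pair written as {'CudnnWorkspaceLimit', 512 * 2^20}, position 8 holds the option name and the check passes. This would also explain why setting opts.cudnn = false did not help: that setting presumably only changes the trailing cudnn{:} flag, while argument 8 comes from l.opts.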

iiwindii commented 8 years ago

@lenck Hi, I traced the code. The error occurs in cnn_train → res = vl_simplenn() → % Forward pass → case 'conv':

res(i+1).x = vl_nnconv(res(i).x, l.weights{1}, l.weights{2}, ...
  'pad', l.pad, ...
  'stride', l.stride, ...
  l.opts{:}, ...
  cudnn{:}) ;

I have not changed anything in cnn_train or vl_simplenn. I also tried matconvnet-1.0-beta15 (with which my code used to run without any problem), but now the same error arises there too. Really a terrible problem!

cheer37 commented 8 years ago

I also encountered a similar problem. CIFAR and MNIST work fine, but in my project CPU memory usage keeps increasing while running the building blocks in vl_simplenn. This happens regardless of how I build MatConvNet. Please help.

iiwindii commented 8 years ago

@cheer37 Have you solved your problem? It still exists in my project. The same data and network work well with versions before beta17, so I wonder whether something is wrong in the newer version.

cheer37 commented 8 years ago

I downsized the input images from 224 to 128, and the problem disappeared. Vedaldi said MATLAB's memory management is tricky and not easy to understand, so I might have hit a corner case. I found nothing else that reveals the cause of the problem.
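The reason downsizing helps so much is that each feature map's memory grows with height × width. The sketch below compares one hypothetical layer output (64 channels, batch of 32, single precision) at both input sizes; the channel count and batch size are assumptions, and only the ratio matters.

```matlab
% Memory of one layer output scales with height * width.
% Hypothetical layer: 64 channels, batch of 32, single precision (4 bytes).
mapBytes = @(side) side * side * 64 * 32 * 4 ;

m224 = mapBytes(224) ;
m128 = mapBytes(128) ;
fprintf('224x224: %.0f MB, 128x128: %.0f MB, ratio %.3f\n', ...
        m224 / 2^20, m128 / 2^20, m128 / m224) ;
```

For this layer it reports 392 MB versus 128 MB, a ratio of about 0.327, i.e. (128/224)^2, roughly a third of the memory. By the same arithmetic, going from 128×128 to 256×256 quadruples the activation memory.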

zgplvyou commented 8 years ago

@cheer37 How did you solve your problem? I don't understand what you mean by 'I down sized the input image size to 128 from 224'. I am using matconvnet-1.0-beta21 (following the web page http://www.vlfeat.org/matconvnet/quick/). When I reach the step res = vl_simplenn(net, im) ; I get another error:

Error using vl_nnconv
The option name is not a string (argument number 5)

Error in vl_simplenn (line 300)
res(i+1).x = vl_nnconv(res(i).x, l.weights{1}, l.weights{2}, ...

Maybe my explanation is not clear, but if you understand the problem, please suggest a solution.

zgplvyou commented 8 years ago

I solved the problem: you need to download the latest version of MatConvNet. There are some bugs in the previous version.

Addhi86 commented 8 years ago

I have downloaded the latest version of MatConvNet. When I run the MNIST example, it gives me this error:

Error in vl_simplenn (line 300)
res(i+1).x = vl_nnconv(res(i).x, l.weights{1}, l.weights{2}, ...

Error in cnn_train>processEpoch (line 316)
res = vl_simplenn(net, im, dzdy, res, ...

Can anyone tell me how to resolve this error? Setup: MatConvNet 23, MATLAB 2015b, 16 GB RAM.

iiwindii commented 8 years ago

@Addhi86 Sorry, I can't figure out what's wrong; I have never encountered such an error. If MatConvNet compiled successfully, this error should not arise.

Addhi86 commented 8 years ago

@fengyunxiaozi Yes, this is another problem. When I try to compile, I get this error:

Error using vl_compilenn>check_clpath (line 599)
Unable to find cl.exe

Error in vl_compilenn (line 417)
cl_path = fileparts(check_clpath()); % check whether cl.exe in path

zgplvyou commented 7 years ago

Choose version matconvnet-1.0-beta22; there are some compiling errors in other versions. If you can't install it, I will send you the installation package.

On 2016-10-20 00:00:33, "fengyunxiaozi" <notifications@github.com> wrote:

Closed #344.


Addhi86 commented 7 years ago

It is solved. Thanks @zgplvyou

HosnaCSE commented 7 years ago

Hello,

I have 12 GB of RAM on the GPU (Titan) and 8 GB of RAM on the CPU. My net hits an "Out of memory on device" error, although it should require only 4 GB. Has anyone had a similar experience? Any suggestions? (I can run it with 128×128 images, which require 2 GB, but I need bigger images such as 256×256 for semantic segmentation.)

Thanks. Hosna