matlab-deep-learning / mask-rcnn

Mask-RCNN training and prediction in MATLAB for Instance Segmentation
https://uk.mathworks.com/help/vision/ug/getting-started-with-mask-r-cnn-for-instance-segmentation.html

Error using nnet.internal.cnn.dlnetwork/forward #4

Closed: Aymanbegh closed this issue 3 years ago

Aymanbegh commented 3 years ago

Dear Anchit,

I get an error while executing the function "dlfeval", which returns the following:

"Error using nnet.internal.cnn.dlnetwork/forward (line 254) Layer 'res5b_branch2c': Invalid input data. Out of memory on device. To view more detail about available memory on the GPU, use 'gpuDevice()'. If the problem persists, reset the GPU by calling 'gpuDevice(1)'.

Error in nnet.internal.cnn.dlnetwork/CodegenOptimizationStrategy/propagateWithFallback (line 103) [varargout{1:nargout}] = fcn(net, X, layerIndices, layerOutputIndices);

Error in nnet.internal.cnn.dlnetwork/CodegenOptimizationStrategy/forward (line 52) [varargout{1:nargout}] = propagateWithFallback(strategy, functionSlot, @forward, net, X, layerIndices, layerOutputIndices);

Error in dlnetwork/forward (line 347) [varargout{1:nargout}] = net.EvaluationStrategy.forward(net.PrivateNetwork, x, layerIndices, layerOutputIndices);

Error in networkGradients (line 21) [YRPNRegDeltas, proposal, YRCNNClass, YRCNNReg, YRPNClass, YMask, state] = forward(...

Error in deep.internal.dlfeval (line 18) [varargout{1:nout}] = fun(x{:});

Error in dlfeval (line 41) [varargout{1:nout}] = deep.internal.dlfeval(fun,varargin{:});

Error in MaskRCNN (line 129) [gradients, loss, state] = dlfeval(@networkGradients, X, gtBox, gtClass, gtMask, dlnet, params);"

I've tried reducing imageSize, and I've also changed minibatchSize and maxEpochs. My computer has an AMD Ryzen 4000-series CPU and an NVIDIA GeForce RTX 2060 (6 GB). I'm using the example's network (ResNet-101 backbone) and the COCO 2014 dataset.
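The error message itself suggests calling gpuDevice() to inspect GPU memory. A minimal sketch of that check, assuming Parallel Computing Toolbox is installed and a recent MATLAB release in which the GPUDevice object exposes the AvailableMemory property, could look like this:

```matlab
% Minimal sketch: report GPU memory before starting training.
% Assumes Parallel Computing Toolbox; canUseGPU returns false without it
% or when no supported GPU is present.
if canUseGPU
    g = gpuDevice;                      % currently selected GPU
    fprintf('%s: %.1f GB total, %.1f GB available\n', ...
        g.Name, g.TotalMemory/1e9, g.AvailableMemory/1e9);
else
    disp('No supported GPU detected; training would run on the CPU.');
end
```

Comparing the reported available memory with the footprint observed during CPU training gives a rough idea of whether a given imageSize and minibatchSize can fit on the card at all.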

Calcu-dev commented 3 years ago

I'm having a similar issue with RAM usage, as described in my own issue. What values are you using for imageSize and minibatchSize? With 512x512 and a minibatchSize of 2 (on my CPU, not a GPU), I consistently see 24+ GB of RAM usage.

I'm asking for these values only to compare them against what I've observed and to give @anchitdharmw more information when looking into this issue.

Best, Adam

Aymanbegh commented 3 years ago

Thanks, good idea, Adam. I used 512x512x3 for imageSize and 1 for minibatchSize. Note, however, that the example does run on my CPU with 16 GB of RAM (with these settings, and also with minibatchSize = 2); it is just slow (40 s per iteration). Watching RAM consumption, it appears to use more than 8 GB, maybe 10 GB. I think that's why the example doesn't run on my GPU, which has only 6 GB.

Best, Ayman

anchitdharmw commented 3 years ago

@Aymanbegh, I have updated the repo with support for a resnet50 backbone, which can be selected during network creation. This should lower the memory footprint of your training.