microsoft / MMdnn

MMdnn is a set of tools to help users inter-operate among different deep learning frameworks, e.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, TensorFlow, CNTK, PyTorch, ONNX and CoreML.
MIT License

When converting TF mobilenet_v2 to Caffe, GPU memory exhausted #337

Open Zhengtq opened 6 years ago

Zhengtq commented 6 years ago

Platform (like ubuntu 16.04/win10): ubuntu 16.04
Python version: 2.7
Source framework with version (like Tensorflow 1.4.1 with GPU): Tensorflow 1.9
Destination framework with version (like CNTK 2.3 with GPU): Caffe
Pre-trained model path (webpath or webdisk path): mobilenet_v2
Running scripts: mmconvert -sf tensorflow -in mobilenet_v2.ckpt.meta -iw mobilenet_v2.ckpt --dstNodeName MobilenetV2/Logits/Zoutput -df caffe -om mobilenet_v2

When converting the tf-slim model mobilenet_v2 (https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet) using the command "mmconvert -sf tensorflow -in mobilenetv2.ckpt.meta -iw mobilenetv2.ckpt --dstNodeName MobilenetV2/Logits/Zoutput -df caffe -om tf_resnet", I got this error: "F0727 13:04:17.189086 23400 cudnn_conv_layer.cpp:53] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR". After checking GPU memory usage, I can confirm that this error is caused by GPU memory exhaustion. Note that I set depth_multiplier to 0.8 and my input size is 320x320x3. My machine has 8 1080Ti GPUs with 11 GB of memory each. Nevertheless, I can only use one GPU. Can anyone tell me why this happens, or how to make full use of all 8 GPUs?

kitstar commented 6 years ago

Hi @Zhengtq, this is a known issue. You can run the tested conversion through pytest, even in a CPU-only environment. We currently do not know the cause of the crash.
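
As a rough sketch of the CPU-only route mentioned above, the GPUs can be hidden from the conversion process by setting the standard CUDA_VISIBLE_DEVICES environment variable to an empty string before launching mmconvert. The helper names (cpu_only_env, build_mmconvert_command, run_cpu_only) are hypothetical and only for illustration; the mmconvert flags are the ones from the original report. Whether a GPU-enabled Caffe build then falls back to CPU, rather than failing in a different way, is an assumption that has not been verified in this thread.

```python
import os
import subprocess


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty value makes CUDA report no visible GPUs to the child process.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def build_mmconvert_command(meta, weights, dst_node, out_name):
    """Build the TensorFlow -> Caffe mmconvert command line from the report."""
    return [
        "mmconvert",
        "-sf", "tensorflow",
        "-in", meta,
        "-iw", weights,
        "--dstNodeName", dst_node,
        "-df", "caffe",
        "-om", out_name,
    ]


def run_cpu_only(cmd):
    """Run the command with GPUs hidden; raises CalledProcessError on failure."""
    subprocess.check_call(cmd, env=cpu_only_env())
```

A call such as run_cpu_only(build_mmconvert_command("mobilenet_v2.ckpt.meta", "mobilenet_v2.ckpt", "MobilenetV2/Logits/Zoutput", "mobilenet_v2")) would then launch the same conversion without any GPU visible. This does not explain the crash; it only sidesteps GPU memory, and the conversion will likely be slower.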

Zhengtq commented 6 years ago

Well, thanks anyway. I hope the problem can be solved in the future.

kitstar commented 6 years ago

Thanks @Zhengtq, we will fix it when we have bandwidth!

ujsyehao commented 5 years ago

@kitstar Hi, can you solve the problem?