microsoft / MMdnn

MMdnn is a set of tools to help users inter-operate among different deep learning frameworks, e.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, TensorFlow, CNTK, PyTorch, ONNX, and CoreML.
MIT License

can't convert model from tensorflow to caffe with tf.layers.batch_normalization #645

Open. 2h4dl opened this issue 5 years ago

2h4dl commented 5 years ago

Platform (like ubuntu 16.04/win10): ubuntu 16.04
Python version: python 2.7
Source framework with version (like Tensorflow 1.4.1 with GPU): Tensorflow 1.12
Destination framework with version (like CNTK 2.3 with GPU): caffe
Pre-trained model path (webpath or webdisk path):

Running scripts:

mmconvert -sf tensorflow -in model-lenet-17000.meta -iw model-lenet-17000 --dstNodeName dense2/outputs -df caffe -om caffe-lenet

Error Info:

TensorflowEmitter has not supported operator [Switch] with name [conv1/batch_normalization/cond/Switch].
TensorflowEmitter has not supported operator [Switch] with name [conv2/batch_normalization/cond/Switch].
TensorflowEmitter has not supported operator [Switch] with name [conv1/batch_normalization/cond/FusedBatchNorm/Switch_1].
TensorflowEmitter has not supported operator [Switch] with name [conv1/batch_normalization/cond/FusedBatchNorm_1/Switch_1].
TensorflowEmitter has not supported operator [Switch] with name [conv1/batch_normalization/cond/FusedBatchNorm/Switch_2].
TensorflowEmitter has not supported operator [Switch] with name [conv1/batch_normalization/cond/FusedBatchNorm_1/Switch_2].
TensorflowEmitter has not supported operator [Switch] with name [conv1/batch_normalization/cond/FusedBatchNorm_1/Switch_3].
TensorflowEmitter has not supported operator [Switch] with name [conv1/batch_normalization/cond/FusedBatchNorm_1/Switch_4].
TensorflowEmitter has not supported operator [Switch] with name [conv2/batch_normalization/cond/FusedBatchNorm/Switch_1].
TensorflowEmitter has not supported operator [Switch] with name [conv2/batch_normalization/cond/FusedBatchNorm_1/Switch_1].
TensorflowEmitter has not supported operator [Switch] with name [conv2/batch_normalization/cond/FusedBatchNorm/Switch_2].
TensorflowEmitter has not supported operator [Switch] with name [conv2/batch_normalization/cond/FusedBatchNorm_1/Switch_2].
TensorflowEmitter has not supported operator [Switch] with name [conv2/batch_normalization/cond/FusedBatchNorm_1/Switch_3].
TensorflowEmitter has not supported operator [Switch] with name [conv2/batch_normalization/cond/FusedBatchNorm_1/Switch_4].
TensorflowEmitter has not supported operator [Switch] with name [conv1/batch_normalization/cond/FusedBatchNorm/Switch].
TensorflowEmitter has not supported operator [Switch] with name [conv1/batch_normalization/cond/FusedBatchNorm_1/Switch].
Traceback (most recent call last):
  File "/usr/local/bin/mmconvert", line 11, in <module>
    sys.exit(_main())
  File "/usr/local/lib/python2.7/dist-packages/mmdnn/conversion/_script/convert.py", line 102, in _main
    ret = convertToIR._convert(ir_args)
  File "/usr/local/lib/python2.7/dist-packages/mmdnn/conversion/_script/convertToIR.py", line 115, in _convert
    parser.run(args.dstPath)
  File "/usr/local/lib/python2.7/dist-packages/mmdnn/conversion/common/DataStructure/parser.py", line 22, in run
    self.gen_IR()
  File "/usr/local/lib/python2.7/dist-packages/mmdnn/conversion/tensorflow/tensorflow_parser.py", line 424, in gen_IR
    func(current_node)
  File "/usr/local/lib/python2.7/dist-packages/mmdnn/conversion/tensorflow/tensorflow_parser.py", line 800, in rename_FusedBatchNorm
    self.set_weight(source_node.name, 'scale', self.ckpt_data[scale.name])
KeyError: u'conv1/batch_normalization/gamma/read'

The conversion succeeds if I restore this model and save it again. But I noticed a difference during conversion: the re-saved model seems to have dropped some parameters.

Source model conversion:

Parse file [model-lenet-17000.meta] with binary format successfully.
Tensorflow model file [model-lenet-17000.meta] loaded successfully.
Tensorflow checkpoint file [model-lenet-17000] loaded successfully. [36] variables loaded.

Re-saved model conversion:

Parse file [model-lenet.meta] with binary format successfully.
Tensorflow model file [model-lenet.meta] loaded successfully.
Tensorflow checkpoint file [model-lenet] loaded successfully. [14] variables loaded.

22 variables are dropped after the re-save (36 vs. 14).
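The thread does not show how the re-save was done; below is a minimal sketch of one way it is commonly done, under the assumption that the graph is rebuilt for inference with a constant `training=False` (which is what removes the `tf.cond` and shrinks the variable set). `build_lenet` and `istrain` are hypothetical names, and the snippet is written against the `tf.compat.v1` API so it also runs under TF 2.x; on TF 1.12, use plain `tf` and drop the `disable_eager_execution` call.

```python
import tensorflow as tf

tf1 = tf.compat.v1              # TF 1.x-style graph API
tf1.disable_eager_execution()   # TF 2.x only; not needed on TF 1.x

def resave_for_inference(build_fn, ckpt_prefix, out_prefix):
    """Rebuild the model graph for inference and write a fresh checkpoint.

    `build_fn` should reconstruct the network exactly as in training but
    with a constant `training=False`, so no tf.cond / Switch nodes end up
    in the graph. The Saver then restores and re-saves only the variables
    that the rebuilt graph actually declares.
    """
    graph = tf1.Graph()
    with graph.as_default():
        build_fn()
        saver = tf1.train.Saver()  # covers only the rebuilt graph's variables
        with tf1.Session() as sess:
            saver.restore(sess, ckpt_prefix)
            saver.save(sess, out_prefix)

# Hypothetical usage for the model in this issue:
# resave_for_inference(lambda: build_lenet(istrain=False),
#                      'model-lenet-17000', 'model-lenet')
```

Restoring works even though the original checkpoint holds more variables than the inference graph, because a `tf.train.Saver` only looks up the variables it was constructed over.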

rainLiuplus commented 5 years ago

Hi @2h4dl, would you mind uploading the model so I can check it to solve the problem?

2h4dl commented 5 years ago

@rainLiuplus https://drive.google.com/file/d/1zizj49H0pdkWmEXshPaXI3qtUd7r3cbG/view?usp=sharing, please check it.

JiahaoYao commented 5 years ago

Hi @2h4dl, I think the first attempt failed because of the tf.cond inside the batch norm. It might be using tf.contrib.layers.batch_norm. [screenshot omitted]

Similar problems have also happened when others tried to import a frozen graph; see here.

So, what is different when you re-save the model? How big is the difference? I think it is probably because of the batch norm.

2h4dl commented 5 years ago

Hi @JiahaoYao. Do you mean tf.layers.batch_normalization is the same as tf.contrib.layers.batch_norm? After re-saving the model, some variables are lost and the model is smaller than before. How should I use batch norm in TensorFlow to avoid this?

JiahaoYao commented 5 years ago

Hi @2h4dl, if you re-save the model and the tf.cond's are eliminated, it is possible for your model to be smaller because of the parameters tied to tf.cond. We have run into this kind of issue before, as mentioned here. I think simply using tf.layers.batch_normalization is safe.

2h4dl commented 5 years ago

Hi @JiahaoYao. As you mentioned, after re-saving the model, the tf.cond is gone. But I still have a question, sorry about that. In this wiki, it says tf.cond exists in slim, not in tf.layers. But I trained this model with tf.layers.batch_normalization, so is there something wrong with my model code? This is my code:

weights = get_weights(w_shape, regularizer)                   # helper defined elsewhere in the model code
conv = conv2d(x_input, weights, padding)                      # helper defined elsewhere
norm = tf.layers.batch_normalization(conv, training=istrain)  # istrain: presumably a tf.bool placeholder
conv_relu = activation(norm)
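The `Switch` errors in the log above are the internals of a `tf.cond`: `tf.layers.batch_normalization` builds one whenever its `training` argument is a tensor (for example a `tf.bool` placeholder like `istrain`) instead of a Python bool. A minimal sketch of that mechanism, hand-building the conditional around `fused_batch_norm` roughly the way the layer does internally; it is written against `tf.compat.v1` so it also runs under TF 2.x (on TF 1.12, use plain `tf` and drop the two `disable_*` calls):

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()    # TF 2.x only; not needed on TF 1.x
tf1.disable_control_flow_v2()    # TF 2.x only; TF 1.x lowers tf.cond to Switch/Merge anyway

def bn_op_types(dynamic_training):
    """Build a small batch-norm graph and return the set of op types in it."""
    g = tf1.Graph()
    with g.as_default():
        x = tf1.placeholder(tf.float32, [None, 4, 4, 3])
        scale = tf1.get_variable('gamma', [3], initializer=tf1.ones_initializer())
        offset = tf1.get_variable('beta', [3], initializer=tf1.zeros_initializer())
        mov_mean = tf1.get_variable('moving_mean', [3], trainable=False,
                                    initializer=tf1.zeros_initializer())
        mov_var = tf1.get_variable('moving_variance', [3], trainable=False,
                                   initializer=tf1.ones_initializer())

        def train_branch():   # normalize with batch statistics
            y, _, _ = tf1.nn.fused_batch_norm(x, scale, offset, is_training=True)
            return y

        def infer_branch():   # normalize with the moving averages
            y, _, _ = tf1.nn.fused_batch_norm(x, scale, offset, mean=mov_mean,
                                              variance=mov_var, is_training=False)
            return y

        if dynamic_training:
            # Tensor-valued flag, as with training=<placeholder>: a tf.cond
            # is built, whose plumbing is the Switch ops MMdnn rejects.
            istrain = tf1.placeholder(tf.bool)
            tf1.cond(istrain, train_branch, infer_branch)
        else:
            # Constant flag, as with training=False: one branch, no cond.
            infer_branch()
    return {op.type for op in g.get_operations()}

print('Switch' in bn_op_types(True))    # tf.cond plumbing present
print('Switch' in bn_op_types(False))   # no Switch ops, just fused batch norm
```

With `training=False` baked into the graph that gets exported, only plain fused-batch-norm nodes remain, which the TensorFlow parser handles; that is consistent with the re-saved model converting successfully.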
xyl3902596 commented 4 years ago

Hi @2h4dl, I have also met this problem. How did you fix it in the end?