Tensorflow frozen pb to Caffe error. KeyError: 'Clip' and AttributeError: min

goncz commented 4 years ago

Platform: Ubuntu 16.04
Python version: 3.6.9
Source framework with version: Tensorflow 1.12.2 with Tensorflow GPU 1.12.2

I'm trying to run the frozen graph conversion example from https://github.com/microsoft/MMdnn/tree/master/mmdnn/conversion/tensorflow. However, when I run the command

mmconvert -sf tensorflow -iw mobilenet_v1_1.0_224/frozen_graph.pb --inNodeName input --inputShape 224,224,3 --dstNodeName MobilenetV1/Predictions/Softmax -df caffe -om tf_mobilenet

I get the following errors:

mmconvert -sf tensorflow -iw mobilenet_v1_1.0_224/frozen_graph.pb --inNodeName input --inputShape 224,224,3 --dstNodeName MobilenetV1/Predictions/Softmax -df caffe -om tf_mobilenet
/home/anaconda3/envs/tf_imagerecog_new/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/anaconda3/envs/tf_imagerecog_new/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/anaconda3/envs/tf_imagerecog_new/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/anaconda3/envs/tf_imagerecog_new/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/anaconda3/envs/tf_imagerecog_new/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/anaconda3/envs/tf_imagerecog_new/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From /home/anaconda3/envs/tf_imagerecog_new/lib/python3.6/site-packages/tensorflow/python/tools/strip_unused_lib.py:86: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
2020-06-17 11:04:43.769739: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-06-17 11:04:43.792569: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2020-06-17 11:04:43.793201: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55a6066fdd30 executing computations on platform Host. Devices:
2020-06-17 11:04:43.793212: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2020-06-17 11:04:43.911828: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-17 11:04:43.912778: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
totalMemory: 7.76GiB freeMemory: 7.02GiB
2020-06-17 11:04:43.912792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-06-17 11:04:43.913896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-17 11:04:43.913904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2020-06-17 11:04:43.913908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2020-06-17 11:04:43.914013: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6833 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-06-17 11:04:43.915077: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55a60aec70c0 executing computations on platform CUDA. Devices:
2020-06-17 11:04:43.915088: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
IR network structure is saved as [601e3728ecf74f1282602df5fc674983.json].
IR network structure is saved as [601e3728ecf74f1282602df5fc674983.pb].
IR weights are saved as [601e3728ecf74f1282602df5fc674983.npy].
Parse file [601e3728ecf74f1282602df5fc674983.pb] with binary format successfully.
Target network code snippet is saved as [601e3728ecf74f1282602df5fc674983.py].
Target weights are saved as [601e3728ecf74f1282602df5fc674983.npy].
Traceback (most recent call last):
  File "/home/anaconda3/envs/tf_imagerecog_new/lib/python3.6/site-packages/caffe/net_spec.py", line 160, in _to_proto
    _param_names[self.type_name] + '_param'), k, v)
KeyError: 'Clip'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/anaconda3/envs/tf_imagerecog_new/bin/mmconvert", line 8, in <module>
    sys.exit(_main())
  File "/home/anaconda3/envs/tf_imagerecog_new/lib/python3.6/site-packages/mmdnn/conversion/_script/convert.py", line 112, in _main
    dump_code(args.dstFramework, network_filename + '.py', temp_filename + '.npy', args.outputModel, args.dump_tag)
  File "/home/anaconda3/envs/tf_imagerecog_new/lib/python3.6/site-packages/mmdnn/conversion/_script/dump_code.py", line 32, in dump_code
    save_model(MainModel, network_filepath, weight_filepath, dump_filepath)
  File "/home/anaconda3/envs/tf_imagerecog_new/lib/python3.6/site-packages/mmdnn/conversion/caffe/saver.py", line 9, in save_model
    MainModel.make_net(dump_net)
  File "601e3728ecf74f1282602df5fc674983.py", line 153, in make_net
    print(n.to_proto(), file=fpb)
  File "/home/anaconda3/envs/tf_imagerecog_new/lib/python3.6/site-packages/caffe/net_spec.py", line 193, in to_proto
    top._to_proto(layers, names, autonames)
  File "/home/anaconda3/envs/tf_imagerecog_new/lib/python3.6/site-packages/caffe/net_spec.py", line 97, in _to_proto
    return self.fn._to_proto(layers, names, autonames)
  File "/home/anaconda3/envs/tf_imagerecog_new/lib/python3.6/site-packages/caffe/net_spec.py", line 162, in _to_proto
    assign_proto(layer, k, v)
  File "/home/anaconda3/envs/tf_imagerecog_new/lib/python3.6/site-packages/caffe/net_spec.py", line 64, in assign_proto
    is_repeated_field = hasattr(getattr(proto, name), 'extend')
AttributeError: min

linmajia commented 4 years ago

@goncz, thank you very much for the feedback. It seems that your Caffe build does not support the Clip operator, which MMdnn uses to implement Relu6. Here is a workaround:

1. Go to the installation directory of MMdnn, e.g. ~/.local/lib/pythonX.Y/site-packages/mmdnn or /usr/lib/python3/dist-packages/mmdnn.
2. Open the file "mmdnn/conversion/caffe/caffe_emitter.py".
3. Locate the function "def emit_Relu6(self, IR_node)" (e.g., line 609).
4. Replace its implementation with the following single line of code: self.emit_Relu(IR_node)

Relu will then be used to simulate Relu6. Note that the final performance of the converted model may not be exactly the same as that of the source model.
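
To illustrate why the converted model may behave differently, here is a minimal sketch in plain Python (not MMdnn or Caffe code). It assumes the usual definitions relu(x) = max(x, 0) and relu6(x) = min(max(x, 0), 6); the function names are chosen here for illustration only.

```python
def relu(x: float) -> float:
    """Standard ReLU: negative inputs become 0, positive inputs pass through."""
    return max(x, 0.0)


def relu6(x: float) -> float:
    """ReLU6: like ReLU, but the output is additionally capped at 6."""
    return min(max(x, 0.0), 6.0)


def differing_inputs(values):
    """Return the inputs for which ReLU and ReLU6 give different outputs."""
    return [v for v in values if relu(v) != relu6(v)]


if __name__ == "__main__":
    samples = [-2.0, 0.0, 3.5, 6.0, 8.0]
    for v in samples:
        print(v, relu(v), relu6(v))
    # Only inputs above 6 are affected by replacing ReLU6 with ReLU.
    print("differs for:", differing_inputs(samples))
```

The two functions agree for every input up to 6, so the workaround only changes results where an activation would have been clipped. How much that matters depends on the model and its inputs.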

linmajia commented 4 years ago

Duplicate of #836

goncz commented 4 years ago

Thank you @linmajia, this solved my problem.

goncz commented 4 years ago

However, I now have this problem:

I0618 10:51:23.611192  5265 layer_factory.hpp:77] Creating layer MobilenetV1_MobilenetV1_Conv2d_6_pointwise_BatchNorm_batchnorm_add_scale
I0618 10:51:23.611218  5265 net.cpp:122] Setting up MobilenetV1_MobilenetV1_Conv2d_6_pointwise_BatchNorm_batchnorm_add_scale
I0618 10:51:23.611222  5265 net.cpp:129] Top shape: 1 512 14 14 (100352)
I0618 10:51:23.611225  5265 net.cpp:137] Memory required for data: 68155264
I0618 10:51:23.611229  5265 layer_factory.hpp:77] Creating layer MobilenetV1_MobilenetV1_Conv2d_6_pointwise_Relu6
I0618 10:51:23.611233  5265 net.cpp:84] Creating Layer MobilenetV1_MobilenetV1_Conv2d_6_pointwise_Relu6
I0618 10:51:23.611249  5265 net.cpp:406] MobilenetV1_MobilenetV1_Conv2d_6_pointwise_Relu6 <- MobilenetV1_MobilenetV1_Conv2d_6_pointwise_BatchNorm_batchnorm_add
I0618 10:51:23.611253  5265 net.cpp:367] MobilenetV1_MobilenetV1_Conv2d_6_pointwise_Relu6 -> MobilenetV1_MobilenetV1_Conv2d_6_pointwise_BatchNorm_batchnorm_add (in-place)
I0618 10:51:23.611537  5265 net.cpp:122] Setting up MobilenetV1_MobilenetV1_Conv2d_6_pointwise_Relu6
I0618 10:51:23.611544  5265 net.cpp:129] Top shape: 1 512 14 14 (100352)
I0618 10:51:23.611563  5265 net.cpp:137] Memory required for data: 68556672
I0618 10:51:23.611567  5265 layer_factory.hpp:77] Creating layer MobilenetV1_MobilenetV1_Conv2d_7_depthwise_depthwise
I0618 10:51:23.611572  5265 net.cpp:84] Creating Layer MobilenetV1_MobilenetV1_Conv2d_7_depthwise_depthwise
I0618 10:51:23.611575  5265 net.cpp:406] MobilenetV1_MobilenetV1_Conv2d_7_depthwise_depthwise <- MobilenetV1_MobilenetV1_Conv2d_6_pointwise_BatchNorm_batchnorm_add
I0618 10:51:23.611579  5265 net.cpp:380] MobilenetV1_MobilenetV1_Conv2d_7_depthwise_depthwise -> MobilenetV1_MobilenetV1_Conv2d_7_depthwise_depthwise
F0618 10:51:24.405036  5265 cudnn_conv_layer.cpp:53] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0)  CUDNN_STATUS_INTERNAL_ERROR
*** Check failure stack trace: ***
Aborted (core dumped)

I have experienced something similar before. Back then I added the following code before creating the tf.Session. In which script should I add it in this case?

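import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of all at once
sess = tf.Session(config=config)  # the config only takes effect when passed to the session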

linmajia commented 4 years ago

@goncz, since MMdnn invokes the underlying deep learning frameworks, I suggest that you try CPU mode to avoid running out of GPU memory. You can do this by temporarily hiding the GPUs: export CUDA_VISIBLE_DEVICES=" "
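
The same idea can be sketched in Python for cases where the conversion is launched from a script rather than a shell. This is an illustrative helper, not part of MMdnn: the name cpu_only_env is hypothetical, and it sets the variable to an empty string (a common convention for hiding all CUDA devices) rather than the single space shown above. Whether the framework then falls back to CPU depends on how it is invoked.

```python
import os


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden.

    base_env defaults to the current process environment. The original
    mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: an empty value hides every GPU from CUDA-based frameworks.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


if __name__ == "__main__":
    env = cpu_only_env()
    print("CUDA_VISIBLE_DEVICES =", repr(env["CUDA_VISIBLE_DEVICES"]))
```

The returned mapping can be passed as the env argument of subprocess.run when starting the mmconvert command, so that the setting applies only to that one process and the rest of the session keeps its GPU access.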

Tensorflow frozen pb to Caffe error. KeyError: 'Clip' and AttributeError: min #856