opencv / opencv_contrib

Repository for OpenCV's extra modules
Apache License 2.0

Unknown layer type error in tf_importer.cpp while running import->populateNet(Net) #1029

Closed: mhaghighat closed this issue 7 years ago

mhaghighat commented 7 years ago
System information (version)
Detailed description

I am trying to run the tf_inception.cpp example. The compilation is successful, but it gives a run-time error. The error happens at line 83 calling: importer->populateNet(net);

populateNet iterates over all layers in a for loop at line 508 of tf_importer.cpp. However, both the name and the type of layer #1 of tensorflow_inception_graph.pb (when li = 1) are DecodeJpeg, which does not match any case in the loop and triggers the error at line 729:

...
else
{
     printLayerAttr(layer);
     CV_Error_(Error::StsError, ("Unknown layer type %s in op %s", type.c_str(), name.c_str()));
}
arrybn commented 7 years ago

Hi, @mhaghighat! The problem is that TensorFlow graphs contain some utility layers (such as the DecodeJpeg layer used to load JPEG images). To run a TensorFlow model in the dnn module, you first need to remove all utility layers from it. You can do this with the freeze_graph.py script (you need Python and TensorFlow installed). The script removes everything that is irrelevant to inference and saves the cleaned model, so you only need to run it once. It takes the following launch parameters, which you should specify:

--input_graph=/hdd/tf_models/alexnet_full/conv_graph.pb (the input .pb file with the graph description)
--input_checkpoint=/hdd/tf_models/alexnet_full/conv_checkpoint.ckpt (the input .ckpt checkpoint file with the weights)
--output_graph=/hdd/tf_models/alexnet_full/conv_graph_frozen.pb (the path to the output cleaned graph)
--output_node_names=prob (the name of the layer whose output you are interested in)
--input_binary (specifies the input file format)

Please let me know if you have any problems with this script. I'll consider adding a feature that removes utility layers when loading TensorFlow models in the dnn module.
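For reference, the same freezing step can also be done programmatically through TensorFlow's graph_util API instead of the script. This is only a minimal sketch, assuming a TF 1.x checkpoint with a matching .meta file next to it; it reuses the paths and the 'prob' output node from the parameters above, so adjust them to your own model:

import tensorflow as tf
from tensorflow.python.framework import graph_util

# Paths copied from the freeze_graph.py parameters above; adjust to your model.
ckpt_path = '/hdd/tf_models/alexnet_full/conv_checkpoint.ckpt'
output_nodes = ['prob']

with tf.Session() as sess:
    # Rebuild the graph and restore the trained weights from the checkpoint.
    saver = tf.train.import_meta_graph(ckpt_path + '.meta')
    saver.restore(sess, ckpt_path)
    # Fold all Variable nodes into Const nodes so the graph is self-contained.
    frozen = graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), output_nodes)
    # Write the frozen graph as a binary .pb that dnn can import.
    tf.train.write_graph(frozen, '/hdd/tf_models/alexnet_full',
                         'conv_graph_frozen.pb', as_text=False)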

arrybn commented 7 years ago

My post above is for the case when you've trained your own model in TensorFlow and have .pb and .ckpt files. Regarding your case, I've checked once more, and the model from https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip works as-is in the dnn module. Please check that you're using this model. It's possible that some specific layers are missing in the current release of the dnn module. The DecodeJpeg layer is a utility layer; it can be replaced by an Identity layer during parsing, and I'll try to do that soon. I'll let you know when I have a suggested solution for this issue.

mhaghighat commented 7 years ago

Hi @arrybn, thank you for your reply. I am using the model from the link you provided (the tensorflow_inception_graph.pb frozen model), and I still have the issue. How do you suggest I replace the utility layers, like DecodeJpeg, with Identity? Won't that affect the result?

arrybn commented 7 years ago

Hi, @mhaghighat. I've checked one more time: I downloaded the archive from https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip, unzipped it, and used the tensorflow_inception_graph.pb model file. The C++ sample tf_inception.cpp works correctly. I've also checked the Python interface: no problems there either, the model loads and the output is computed. Please check the launch parameters for the sample; maybe it uses a previously downloaded model from another source. As far as I can tell, Google distributes two variants of the Inception model: one with DecodeJpeg and other layers such as mean subtraction and normalization (call them preprocessing layers, because they have no learnable parameters and only convert the input into the form the model expects), and another without any preprocessing layers. If you remove the preprocessing layers and perform the removed operations yourself, the result stays correct. In the case of our sample tf_inception.cpp, we decode images from JPEG manually with cv::imread() and can do any preprocessing ourselves. To sum up, please double-check the paths (maybe delete every Inception model you've already downloaded and fetch a fresh copy from the link above), because everything works fine for me.

arrybn commented 7 years ago

Hello, @mhaghighat. Are you still experiencing the described issue?

chvatma2 commented 7 years ago

Hello, I have encountered the same issue as described here. We are using the Inception model, retrained on our own images as shown in this tutorial: https://www.tensorflow.org/tutorials/image_retraining. The suggested freeze_graph.py script did not solve the problem. It would be useful if the dnn module could handle models created this way, as the official TensorFlow tutorials will surely be an entry point for many programmers.

chrisrn commented 7 years ago

@arrybn I have 2 questions:

1) Does the freeze_graph script remove the preprocessing nodes, or do we need to do that manually? I ran the script on my graph and I am still getting an "Unknown layer type" error, which I think comes from the preprocessing.

2) I tried to load the inception5h model from your link, and the setBlob function cannot recognize the input name (Requested blob 'input' not found). I saw that the input node of this model is called 'input' and the output node is called 'output'. Did you feed different values to i_blob and o_blob?

OS: Kubuntu 14.04

arrybn commented 7 years ago

@chrisrn

  1. As far as I know, the freeze_graph script saves all Variable nodes as Const and removes some other utility nodes, but it doesn't touch preprocessing layers such as mean-variance normalization, DecodeJpeg and so on. You have to cut them off manually in TensorFlow (a rough sketch follows after this list).
  2. There is an example of using the inception5h TensorFlow model in dnn. Use ".input" as the name of the input layer and "softmax2" as the output name (see the sample).
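To illustrate item 1, here is a rough sketch of cutting the preprocessing nodes off with TensorFlow's strip_unused helper. This is only one possible approach, not the workflow used by the sample, and the file name and the 'input' / 'softmax2' node names are taken from this thread; treat them as placeholders for your own model:

import tensorflow as tf
from tensorflow.python.framework import dtypes
from tensorflow.python.tools import strip_unused_lib

# Load the graph that still contains DecodeJpeg and other utility nodes.
graph_def = tf.GraphDef()
with tf.gfile.FastGFile('tensorflow_inception_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Replace everything before 'input' with a float placeholder and drop
# nodes that the requested output does not depend on.
stripped = strip_unused_lib.strip_unused(
    input_graph_def=graph_def,
    input_node_names=['input'],
    output_node_names=['softmax2'],
    placeholder_type_enum=dtypes.float32.as_datatype_enum)

# Save the cleaned graph for the dnn module.
tf.train.write_graph(stripped, '', 'inception_stripped.pb', as_text=False)
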
chrisrn commented 7 years ago

Thank you, it worked; I could just use the default values! OK, I will try to remove the preprocessing layers and will let you know if it works for my model.

chrisrn commented 7 years ago

@arrybn I don't think the freeze_graph script removes layers. In my case I read the TensorFlow images from TFRecord format using queues, and the 'FIFOQueueV2' layer is not removed after running freeze_graph. I don't think changing the input pipeline and restarting training is a good solution, because TensorFlow recommends this way of reading when you deal with big data. One solution could be to export a separate graph for prediction that doesn't contain the training variables.

chrisrn commented 7 years ago

It is working now on my model with the above solution.

arrybn commented 7 years ago

@chrisrn I think the output blob has a name that differs from "output". You can get all blob names by calling Net::getLayerNames(); it's highly likely that the last name in the returned vector belongs to the output blob.
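For example, with the Python bindings (a minimal sketch using the readNetFromTensorflow API shown later in this thread; 'deploy.pb' is a placeholder path):

import cv2 as cv

# Load the frozen TensorFlow graph.
net = cv.dnn.readNetFromTensorflow('deploy.pb')

# Print every layer name; the last one is most likely the output blob.
layerNames = net.getLayerNames()
print(layerNames[-1])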

chrisrn commented 7 years ago

@arrybn You can run it if you export the graph from a simplified Python predictor without the preprocessing layers. But I see some differences between the predictions from the tf_inception script (for my model) and from TensorFlow, and I cannot understand why. The steps are the following:

1) Export a .pbtxt file from the Python predictor.
2) Feed in the weights with the freeze_graph script from TensorFlow and export a .pb file.
3) Predict by feeding the .pb file to the tf_inception script.

arrybn commented 7 years ago

@chrisrn could you share all the necessary files with me: the .pbtxt and the .pb? And what is the "python predictor"?

chrisrn commented 7 years ago

By "python predictor" I mean that you don't need to export the graph during training: you can create a TensorFlow script used only for prediction, without the preprocessing layers. TensorFlow deals with more layers during training, which is why I mentioned it. right_graph_gray1.txt (I can only upload the .pbtxt file here.)
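For illustration, a prediction-only export could look roughly like this. It is only a minimal sketch: the single dense layer is just a stand-in for the real network, and the 'input'/'softmax2' names and the 1x32x32x1 grayscale shape are the ones mentioned elsewhere in this thread:

import tensorflow as tf

# A plain placeholder replaces the training-time input queue and
# preprocessing, so no utility layers end up in the exported graph.
images = tf.placeholder(tf.float32, [1, 32, 32, 1], name='input')

# Stand-in for the real model: flatten and apply one dense layer.
flat = tf.reshape(images, [1, 32 * 32])
logits = tf.layers.dense(flat, 2)
probs = tf.nn.softmax(logits, name='softmax2')

# Dump the graph definition as a text .pbtxt for freeze_graph.py.
tf.train.write_graph(tf.get_default_graph().as_graph_def(), '',
                     'right_graph_gray1.pbtxt', as_text=True)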

arrybn commented 7 years ago

To make things clear:

  1. You have right_graph_gray1.txt and right_graph_gray1.ckpt files after training.
  2. You use freeze_graph.py to merge them into a single .pb file (call this file deploy.pb).
  3. The output of TensorFlow differs from the output of dnn when you run deploy.pb on some input.

Is that all correct? If yes, first check that the inputs are identical. If the image is in JPEG format, it's highly recommended to use PNG or another lossless format instead, because different frameworks and libraries use different JPEG-decoding implementations.
chrisrn commented 7 years ago

Everything is right. The only difference is at step 1, where I export the .pbtxt file from another script in which the preprocessing layers do not exist. The inputs are the same, that's not the problem: I am reading the same .avi video file with OpenCV in both the Python and the C++ prediction.

arrybn commented 7 years ago

But TensorFlow accepts NHWC-ordered blobs, while dnn's blobFromImage() returns NCHW.

chrisrn commented 7 years ago

So you propose converting to NCHW?

arrybn commented 7 years ago

I just want to draw your attention to this fact; to my mind it could be the issue. Could you send me the .pb file you're loading into dnn?

arrybn commented 7 years ago

Suppose you have a frame from OpenCV's VideoCapture. It is a 3-channel BGR numpy.ndarray with HWC dimension order. TensorFlow accepts 4-dimensional blobs, so you can add one dimension to the input image (to make it 4-dimensional NHWC) and pass it to TensorFlow. But dnn works with NCHW, so you need to reorder the dimensions. You can do that with the numpy.transpose() function.
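A minimal sketch of that conversion ('input.avi' is a placeholder path):

import numpy as np
import cv2 as cv

cap = cv.VideoCapture('input.avi')   # placeholder video path
ok, frame = cap.read()               # HWC, BGR, uint8
assert ok

# TensorFlow wants NHWC: just add a batch dimension in front.
tf_blob = frame[np.newaxis, ...].astype(np.float32)   # (1, H, W, C)

# dnn wants NCHW: move the channel axis ahead of height and width.
dnn_blob = np.transpose(tf_blob, (0, 3, 1, 2))        # (1, C, H, W)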

chrisrn commented 7 years ago

deploy.txt

The .pb format is not supported for uploads here, which is why it's a .txt. You can convert it back to its normal .pb format.

chrisrn commented 7 years ago

I think that if I reorder the dimensions of the image before it enters the dnn::Blob::fromImages function, it may work.

arrybn commented 7 years ago

It seems that you're using an old version of the dnn module. Consider updating to the latest master: it has become faster, uses less memory, and several critical bugs have been fixed.

chrisrn commented 7 years ago

I use opencv-3.2 so I have the equivalent opencv-contrib-3.2 modules.

arrybn commented 7 years ago

The Blob class has now been removed from dnn, which is why I thought you were using a previous version.

chrisrn commented 7 years ago

So I need to rebuild OpenCV with the modules from the master branch?

arrybn commented 7 years ago

Yes, you need to check out the latest opencv and opencv_contrib master branches and then build.

chrisrn commented 7 years ago

I did that, but right now I am getting a segmentation fault on the net.forward call. Is this because I need to convert the image to NHWC before it enters the network?

arrybn commented 7 years ago

Probably, yes. It would be perfect if you could share your Python script and TensorFlow model so I can reproduce this bug on my workstation. The code can be as compact as possible, and you can post it here. Without the code it will be random guessing.

chrisrn commented 7 years ago

It's difficult to share, because I am processing specific .avi files and exporting the graph into a .pb file. But I uploaded that file yesterday, so you can reproduce the error using only the tf_inception script.

arrybn commented 7 years ago

You can rewrite it to use an image instead of a video. Try to make the simplest sample that reproduces the problem; it will help solve your issue much faster.

chrisrn commented 7 years ago

You cannot run it, because it's a prediction script and you would need my checkpoint file, which is too big. Anyway, after the blobFromImage function I noticed that the inputBlob has shape (1, 1, 32, 32), so I reshaped it back to (1, 32, 32, 1) so that it enters the TensorFlow model correctly, but I am getting a memory corruption error in the copyTo function!

arrybn commented 7 years ago

dnn accepts a (1, 1, 32, 32) blob, but for TensorFlow it should be (1, 32, 32, 1).

chrisrn commented 7 years ago

Yes, sure. But if you check the .pbtxt file, the input has shape (1, 32, 32, 1). I don't think the shape is the problem, because the image is correctly transformed from (32, 32) into (1, 1, 32, 32) by the blobFromImage function. The segfault in net.forward probably comes from something else.

arrybn commented 7 years ago

Could you post the console output? Does it contain any layer names, types or other useful information?

chrisrn commented 7 years ago

Nope. It compiles fine, but when running the executable it produces "Segmentation fault (core dumped)". I know it's difficult to understand, but thank you very much for your help. The initial problem was that I did not get good predictions when running the tf_inception script on my model, and after I switched to the latest version of opencv_contrib I started getting the segfault.

arrybn commented 7 years ago

Could you modify the dnn code a little bit? If so, insert this line: std::cout << ld.name << " " << ld.type << std::endl; at the beginning of the void forwardLayer(LayerData &ld) function (modules/dnn/src/dnn.cpp, line 1016), then save and rebuild. It prints the name and type of each layer as it runs; the last printed entry will be the layer that segfaults.

chrisrn commented 7 years ago

Thank you for this tip. The console output is:

_input __NetInputLayer__
CifarNet/conv1/convolution Convolution

It is complaining about the first layer of the network, after the weights initialization.

arrybn commented 7 years ago

I think the only way to solve the problem is to debug it. You should either do that yourself or send me a file with the model. You can save the model in binary format instead of text, which reduces the file size a lot. You can also initialize the model with random weights if you don't want to share the trained ones, or remove all layers except this convolution.

chrisrn commented 7 years ago

I uploaded the right_graph_gray1.txt file, which contains the graph. I cannot upload the binary file here, I don't know why. But I think you can convert this file into a .pb file with the following snippet:

import tensorflow as tf
from tensorflow.python.platform import gfile
from google.protobuf import text_format

filename = '/path/to/right_graph_gray1.pbtxt'
with gfile.FastGFile(filename, 'r') as f:
    # Parse the text-format graph definition.
    graph_def = tf.GraphDef()
    text_format.Merge(f.read(), graph_def)
    # Import it to verify that it forms a valid graph.
    tf.import_graph_def(graph_def, name='')
    # Re-serialize it as a binary .pb file.
    tf.train.write_graph(graph_def, '', 'deploy.pb', as_text=False)
chrisrn commented 7 years ago

I also tested the tf_inception script on the original TensorFlow model, and the segmentation fault still occurs, with console output:

_input __NetInputLayer__
conv2d0_pre_relu/conv Convolution
chrisrn commented 7 years ago

I just moved to the newest version of OpenCV and I no longer get the segfault. But the predictions are still wrong for my model. I think you should explain more precisely in the documentation how we can load our own models, because a lot of mistakes happen along the way. Thanks for all the help in this issue!

dkurt commented 7 years ago

@chrisrn, I've tested the last of the networks you shared.

output of TensorFlow: [[  9.99834776e-01   1.65201331e-04]]
output of DNN:        [[  9.99834776e-01   1.65201651e-04]]

Code:

import numpy as np
import tensorflow as tf
import cv2 as cv

# Read frozen model.
with tf.gfile.FastGFile('deploy.txt', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    for node in graph_def.node:
        print(node.op, node.name)

with tf.Session() as sess:
    # Restore session
    sess.graph.as_default()
    tf.import_graph_def(graph_def, name='')

    # Generate input
    np.random.seed(2701)
    inp = np.random.standard_normal([1, 32, 32, 1]).astype(np.float32)

    # Receive output
    outTensor = sess.graph.get_tensor_by_name('softmax2:0')
    out = sess.run(outTensor, feed_dict={'input:0': inp})
    print(out)

def NHWCtoNCHW(data):
    return data.transpose(0, 3, 1, 2)

# Load network
net = cv.dnn.readNetFromTensorflow('deploy.txt')
# Set input in appropriate data format
net.setInput(NHWCtoNCHW(inp))
# Receive output
cvOut = net.forward()

print(cvOut)
chrisrn commented 7 years ago

Thank you very much, I already solved that. The problem did not have to do with the preprocessing after all, so the dnn module works fine! Thanks for your interest!

dkurt commented 7 years ago

The problem seems to be resolved. To highlight the steps for using models trained in TensorFlow: freeze the trained graph with freeze_graph.py, optionally clean it up further with optimize_for_inference.py, and the resulting final_network.pb might be successfully imported in DNN.

See the tools at https://github.com/tensorflow/tensorflow/tree/master/tensorflow/python/tools.
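For reference, the same cleanup can also be done through the library API behind optimize_for_inference.py. This is only a minimal sketch, assuming an already frozen TF 1.x graph; the file names and the 'input'/'softmax2' node names are placeholders:

import tensorflow as tf
from tensorflow.python.framework import dtypes
from tensorflow.python.tools import optimize_for_inference_lib

# Load an already frozen graph (placeholder file name).
graph_def = tf.GraphDef()
with tf.gfile.FastGFile('frozen_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Remove training-only and unreachable nodes between the given input and output.
optimized = optimize_for_inference_lib.optimize_for_inference(
    graph_def, ['input'], ['softmax2'], dtypes.float32.as_datatype_enum)

tf.train.write_graph(optimized, '', 'final_network.pb', as_text=False)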

littlesun96 commented 7 years ago

Hello, @dkurt! I'm working with a model containing an LSTM, trained in Keras with the TensorFlow backend, and I also get the error "OpenCV Error: Unspecified error (Unknown layer type StridedSlice in op lstm_1/strided_slice)". I used the freeze_graph.py and optimize_for_inference.py scripts, as written before, but nothing changed. Maybe you can help?

dkurt commented 7 years ago

@littlesun96, hi! We can help you faster with a piece of code. TensorFlow usually has several implementations of layers/nodes (i.e. in tf.nn, tf.layers, tf.contrib.layers, tf.contrib.keras.layers). Please show how you added the LSTM node.

littlesun96 commented 7 years ago

I don't really know what Keras uses inside:

from keras.layers import Input, LSTM, Dense
from keras.models import Model

main_input = Input(shape=(seq_len,3), name='main_input')
lstm_out = LSTM(20, return_sequences=False, input_shape=(seq_len,3))(main_input)
label_out = Dense(1, activation = 'sigmoid', name = 'label_output')(lstm_out)
model = Model(inputs = [main_input], outputs = [label_out])
dkurt commented 7 years ago

@littlesun96, we've started working on it. I think you'd better create a separate issue at https://github.com/opencv/opencv/issues and assign it to me.