Closed: mhaghighat closed this issue 7 years ago.
Hi, @mhaghighat! This problem occurs because TensorFlow graphs contain some utility layers (like the DecodeJpeg layer, which loads JPEG images). To launch a TensorFlow model in the dnn module, you first need to remove all utility layers from the model. To do this, you can use the freeze_graph.py script (you need Python and TensorFlow installed). This script removes everything that is irrelevant to inference and saves the cleaned model, so you only need to run it once. Specify the following launch parameters:
--input_graph=/hdd/tf_models/alexnet_full/conv_graph.pb (input .pb file with the graph description)
--input_checkpoint=/hdd/tf_models/alexnet_full/conv_checkpoint.ckpt (input .ckpt checkpoint file with the weights)
--output_graph=/hdd/tf_models/alexnet_full/conv_graph_frozen.pb (path to the output cleaned graph)
--output_node_names=prob (name of the layer whose output you're interested in)
--input_binary (the input files are in binary format)
Please let me know if you have any problems with this script. I'll consider adding a feature that removes utility layers when loading TF models in the dnn module.
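As a minimal sketch, assuming you drive freeze_graph.py from Python rather than typing the command in a shell, the launch parameters above can be assembled into an argument list and handed to subprocess.run(). The helper name build_freeze_args is hypothetical, and the script is assumed to sit in the current directory.

```python
def build_freeze_args(input_graph, input_checkpoint, output_graph,
                      output_node_names, input_binary=True,
                      script="freeze_graph.py"):
    """Assemble the freeze_graph.py command line (hypothetical helper)."""
    args = [
        "python", script,
        "--input_graph=" + input_graph,
        "--input_checkpoint=" + input_checkpoint,
        "--output_graph=" + output_graph,
        "--output_node_names=" + output_node_names,
    ]
    if input_binary:
        # Tells freeze_graph that the input graph is a binary .pb file.
        args.append("--input_binary")
    return args


args = build_freeze_args(
    "/hdd/tf_models/alexnet_full/conv_graph.pb",
    "/hdd/tf_models/alexnet_full/conv_checkpoint.ckpt",
    "/hdd/tf_models/alexnet_full/conv_graph_frozen.pb",
    "prob",
)
# To actually run it: subprocess.run(args, check=True)
print(" ".join(args))
```

Building a list (instead of one string) avoids shell-quoting problems with paths that contain spaces.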
My post above is for the case when you've trained your own TensorFlow model and have .pb and .ckpt files. Regarding your case, I've checked once more, and the model from https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip works as-is in the dnn module. Please check that you're using this model. It's possible that some specific layers are missing in the current release of the dnn module. DecodeJpeg is a utility layer; it can be replaced by an Identity layer during parsing, and I'll try to do that soon. I'll let you know if I find a solution for this issue.
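The idea of treating DecodeJpeg as an Identity layer during parsing can be illustrated on a simplified graph. The sketch below assumes a hypothetical representation where each node is a dict with "name", "op" and "input" keys (loosely mirroring TensorFlow's NodeDef); it is not the dnn importer's actual code. The rewritten node simply forwards its input, so the caller must then supply already-decoded pixel data, for example from cv::imread().

```python
UTIL_OPS = {"DecodeJpeg"}  # hypothetical set of utility ops to neutralize


def replace_util_ops(nodes, util_ops=UTIL_OPS):
    """Return a copy of the graph where utility ops become Identity ops."""
    result = []
    for node in nodes:
        node = dict(node)  # shallow copy so the input graph is left untouched
        if node["op"] in util_ops:
            node["op"] = "Identity"  # pass the input through unchanged
        result.append(node)
    return result


# Hypothetical three-node graph for illustration.
graph = [
    {"name": "DecodeJpeg/contents", "op": "Placeholder", "input": []},
    {"name": "DecodeJpeg", "op": "DecodeJpeg", "input": ["DecodeJpeg/contents"]},
    {"name": "Cast", "op": "Cast", "input": ["DecodeJpeg"]},
]
for node in replace_util_ops(graph):
    print(node["op"], node["name"])
```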
Hi @arrybn, thank you for your reply. I am using the model from the link you provided (the 'tensorflow_inception_graph.pb' frozen model), and I still have that issue. How do you suggest I replace the utility layers, like DecodeJpeg, with Identity? Won't it affect the result?
Hi, @mhaghighat. I've checked once more: I downloaded the archive from https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip, unzipped it and used the tensorflow_inception_graph.pb model file. The C++ sample tf_inception.cpp works correctly. I've also checked the Python interface: no problems there either, the model loads and the output is computed. Please check the launch parameters for the sample; maybe it uses a model previously downloaded from another source. As I've found, Google distributes two types of Inception model: one with DecodeJpeg and other layers such as mean subtraction, normalization and so on (call them preprocessing layers, because they have no learnable parameters and only convert some output into the form the model expects), and another without any preprocessing layers. If you remove the preprocessing layers and perform all the removed operations yourself, the results stay correct. In our tf_inception.cpp sample, we decode JPEG images manually with cv::imread() and can apply any preprocessing we need. To sum up, please double-check the paths (maybe remove all the Inception models you've already downloaded and download a fresh one from the link above), because everything works fine for me.
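To illustrate performing the removed preprocessing yourself, here is a minimal NumPy sketch. The image is assumed to be already decoded (for example by cv2.imread(), which yields an HWC BGR uint8 array) and already resized to the network's input size. The mean value 117.0 and the BGR-to-RGB swap are assumptions for illustration only; check which operations the removed layers of your particular graph actually performed.

```python
import numpy as np

MEAN = 117.0  # assumed mean value; use whatever the removed layers subtracted


def preprocess(bgr_image, mean=MEAN):
    """Replicate simple preprocessing layers outside the graph (sketch).

    bgr_image: HWC uint8 array as returned by cv2.imread().
    Returns a float32 NHWC batch containing one image.
    """
    rgb = bgr_image[:, :, ::-1]            # BGR -> RGB channel order
    blob = rgb.astype(np.float32) - mean   # mean subtraction
    return blob[np.newaxis, ...]           # add batch dimension: HWC -> NHWC


img = np.zeros((224, 224, 3), dtype=np.uint8)
img[..., 0] = 200  # blue channel in BGR order
batch = preprocess(img)
print(batch.shape, batch[0, 0, 0])
```

The result is in TensorFlow's NHWC layout; dnn expects NCHW, so a transpose is still needed before calling setInput().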
Hello, @mhaghighat. Are you still experiencing the described issue?
Hello, I have encountered the same issue as described here. We are using the Inception model, retrained on our images as shown in this tutorial: https://www.tensorflow.org/tutorials/image_retraining. The suggested freeze_graph.py script did not solve the problem. It would be useful if the dnn module could handle models created this way, as the official TensorFlow tutorials will surely be an entry point for many programmers.
@arrybn I have 2 questions:
1) Does the freeze_graph script remove the preprocessing nodes, or do we need to do it manually? I ran this script on my graph and I get an "Unknown layer type" error, which I think comes from the preprocessing.
2) I tried to load the inception5h model from your link, and the setBlob function cannot recognize the input name (Requested blob 'input' not found). I saw that this model's input node is called 'input' and its output node is called 'output'. Did you feed different values to i_blob and o_blob?
OS: Kubuntu 14.04
@chrisrn
Thank you, it worked; I could just use the default values! OK, so I will try to remove the preprocessing layers and will let you know if it works for my model.
@arrybn I don't think the freeze_graph script removes layers. In my case I used a specific way of reading images in TensorFlow from the TFRecord format using queues. The 'FIFOQueueV2' layer is not removed after running freeze_graph. And I don't think it's a good solution to change the way of reading and restart training, because when you have to deal with big data, TensorFlow recommends this way of reading. One solution could be to export a separate graph for prediction, without the training variables.
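Before exporting a separate prediction graph, it can help to check which queue- or reader-based input ops a graph still contains. The sketch below only needs objects with .op and .name attributes (as TensorFlow's NodeDef has), so the same function can be applied to graph_def.node. The op-name set is an illustrative, non-exhaustive assumption, and the example node names are hypothetical.

```python
from collections import namedtuple

# Illustrative, non-exhaustive set of TF input-pipeline op types.
PIPELINE_OPS = {
    "FIFOQueueV2", "RandomShuffleQueueV2", "QueueDequeueManyV2",
    "QueueDequeueUpToV2", "TFRecordReaderV2", "ReaderReadV2",
}


def find_pipeline_nodes(nodes, pipeline_ops=PIPELINE_OPS):
    """Return names of nodes whose op belongs to the input pipeline."""
    return [n.name for n in nodes if n.op in pipeline_ops]


# Stand-in for graph_def.node so the sketch runs without TensorFlow.
Node = namedtuple("Node", ["op", "name"])
nodes = [
    Node("FIFOQueueV2", "input_producer"),
    Node("ReaderReadV2", "ReaderReadV2"),
    Node("Conv2D", "CifarNet/conv1/convolution"),
]
print(find_pipeline_nodes(nodes))
```

An empty result is a quick sanity check that the exported prediction graph no longer depends on the training input pipeline.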
It is now working on my model with the above solution.
@chrisrn I think the output blob has a name that differs from "output". You can get all blob names by calling Net::getLayerNames(), and it's highly likely that the last name in the returned vector is the output blob.
@arrybn You can run it if you export the graph from a simplified Python predictor without the preprocessing layers. But I see some differences when I predict with the tf_inception script (for my model), and I cannot understand why. The steps are the following:
1) Export a .pbtxt from the Python predictor.
2) Feed in the weights using TensorFlow's freeze_graph script and export a .pb file.
3) Predict by feeding the .pb file to the tf_inception script.
@chrisrn could you share all the necessary files with me: .pbtxt, .pb? And what is a Python predictor?
By "Python predictor" I mean that you don't need to export the graph during training. You can write a TensorFlow script just for prediction, without the preprocessing layers; during training TensorFlow has to deal with more layers, which is why I mentioned it. right_graph_gray1.txt (I can only upload the .pbtxt file).
To make things clear:
Everything is right. The only difference is at step 1, where I export the .pbtxt file from another script in which the preprocessing layers do not exist. The inputs are the same; that's not the problem. I read an .avi video file with OpenCV in both the Python and the C++ prediction.
But TensorFlow accepts NHWC-ordered blobs, while dnn's blobFromImage() returns NCHW.
So you propose a conversion to NHWC?
I just want to draw your attention to this fact; in my opinion it may be the issue. Could you send me the .pb file you're loading into dnn?
Suppose you have a frame from OpenCV's VideoCapture. This is a 3-channel BGR numpy.ndarray with HWC dimension order. TensorFlow accepts 4-dimensional blobs, so you can add one dimension to the input image (making it a 4-dimensional NHWC blob) and pass it to TensorFlow. But dnn works with NCHW, so you need to reorder the dimensions. You can do this with the numpy.transpose() function.
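The conversion described above can be sketched with NumPy alone. A synthetic array stands in for a VideoCapture frame (HWC, 3 channels), so no video file or OpenCV call is needed; the function names are hypothetical.

```python
import numpy as np


def frame_to_nhwc(frame):
    """HWC frame -> NHWC batch of one, as TensorFlow expects."""
    return frame[np.newaxis, ...]


def nhwc_to_nchw(blob):
    """Reorder a 4-D NHWC blob into NCHW, as dnn expects."""
    # transpose() only returns a strided view; a contiguous copy is a safe
    # choice when handing the array over to another library.
    return np.ascontiguousarray(blob.transpose(0, 3, 1, 2))


frame = np.arange(4 * 5 * 3, dtype=np.float32).reshape(4, 5, 3)  # H=4, W=5, C=3
nhwc = frame_to_nhwc(frame)
nchw = nhwc_to_nchw(nhwc)
print(nhwc.shape, nchw.shape)  # (1, 4, 5, 3) (1, 3, 4, 5)
# Pixel (y=1, x=2), channel 0 holds the same value in both layouts.
print(nhwc[0, 1, 2, 0] == nchw[0, 0, 1, 2])
```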
The .pb format is not supported for upload here; that's why it's .txt. You can convert it back to its normal .pb format.
I think that if I reorder the dimensions of the image that goes into the dnn::Blob::fromImages function, it may work.
It seems that you're using an old version of dnn. Consider updating to the latest master: it is faster, uses less memory and has some critical bugs fixed.
I use OpenCV 3.2, so I have the matching opencv_contrib 3.2 modules.
We have now removed the Blob class from dnn; that's why I thought you were using a previous version.
So I need to build OpenCV again with the modules from the master branch?
Yes, you need to check out the latest opencv and opencv_contrib master branches and then build.
I did that, but now I get a segmentation fault on the net.forward call. Is this because I need to convert the image to NHWC before feeding it into the network?
Probably yes. It would be perfect if you shared your Python script and TensorFlow model so I can reproduce this bug on my workstation. The code can be as compact as possible, and you can post it here. Without the code it's just random guessing.
It's difficult to share because I am handling specific .avi files and exporting the graph into a .pb file. But I uploaded this file yesterday, so you can reproduce the error with just the tf_inception script.
You can rewrite it to use an image instead of video. Try to make the simplest sample that reproduces the problem; it will help solve your problem much faster.
You cannot run it because it's a prediction script and you would need my checkpoint file, which is too big. Anyway, after the blobFromImage function I noticed that inputBlob has shape (1, 1, 32, 32), so I reshaped it to (1, 32, 32, 1) to match the TensorFlow model's input, but I get a memory corruption error in the copyTo function!
dnn accepts a (1, 1, 32, 32) blob, but for TensorFlow it should be (1, 32, 32, 1).
Yes, sure. But if you check the pbtxt file, the input has shape (1, 32, 32, 1). I don't think the shape is the mistake, because blobFromImage correctly transforms the image from (32, 32) into (1, 1, 32, 32). The segfault in net.forward probably comes from something else.
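Why a reshape from (1, 1, 32, 32) to (1, 32, 32, 1) can appear to work here: with a single channel, reshaping and transposing produce the same element order, but with more channels a reshape silently scrambles the data. A small NumPy check, with shapes chosen for illustration:

```python
import numpy as np


def reshape_equals_transpose(n, c, h, w):
    """Compare reshape vs. transpose for turning NCHW into NHWC."""
    nchw = np.arange(n * c * h * w).reshape(n, c, h, w)
    by_reshape = nchw.reshape(n, h, w, c)      # reinterprets memory only
    by_transpose = nchw.transpose(0, 2, 3, 1)  # actually moves the channel axis
    return np.array_equal(by_reshape, by_transpose)


print(reshape_equals_transpose(1, 1, 32, 32))  # single channel (grayscale): True
print(reshape_equals_transpose(1, 3, 32, 32))  # three channels (BGR): False
```

So for grayscale input the reshape is harmless, while for color input a transpose is required.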
Could you post the console output? Does it contain any layer names, types or other useful information?
Nope. It compiles fine, but running the executable produces "Segmentation fault (core dumped)". I know it's difficult to diagnose, but thank you very much for your help. The initial problem was that I did not get good predictions when running the tf_inception script on my model, and when I switched to the latest version of opencv_contrib I started getting the segfault.
Could you modify dnn's code a little? If so, insert this line:
```cpp
std::cout << ld.name << " " << ld.type << std::endl;
```
at the beginning of the void forwardLayer(LayerData &ld) function (modules/dnn/src/dnn.cpp, line 1016).
Save and rebuild.
It prints the name and type of each layer before it runs; the last printed entry is the layer that segfaults.
Thank you for this tip. The console output is:
_input __NetInputLayer__
CifarNet/conv1/convolution Convolution
It fails at the first layer of the network, after the weights are initialized.
I think the only way to solve the problem is to debug it. You should either do it yourself or send me a file with the model. You can also save the model in binary format instead of text; that reduces the file size a lot. If you don't want to share the trained weights, you can initialize the model with random weights, or remove all layers except this convolution.
I uploaded the right_graph_gray1.txt file, which contains the graph. I cannot upload the binary file here, and I don't know why. But I think you can convert this file into a .pb file with the following snippet:
```python
import tensorflow as tf
from tensorflow.python.platform import gfile
from google.protobuf import text_format

filename = '/path/to/right_graph_gray1.pbtxt'
with gfile.FastGFile(filename, 'r') as f:
    graph_def = tf.GraphDef()
    text_format.Merge(f.read(), graph_def)
    tf.import_graph_def(graph_def, name='')
    tf.train.write_graph(graph_def, '', 'deploy.pb', as_text=False)
```
I also tested the tf_inception script on the original TensorFlow model, and the segmentation fault still occurs, with this console output:
_input __NetInputLayer__
conv2d0_pre_relu/conv Convolution
I just moved to the newest version of OpenCV and the segfault is gone. But the predictions are still wrong for my model. I think the documentation should explain more precisely how to load our own models, because a lot of mistakes happen. Thanks for all the help with this issue!
@chrisrn, I've tested the latest of your networks.
output of TensorFlow: [[ 9.99834776e-01 1.65201331e-04]]
output of DNN: [[ 9.99834776e-01 1.65201651e-04]]
Code:
```python
import numpy as np
import tensorflow as tf
import cv2 as cv

# Read frozen model (a binary GraphDef, despite the .txt extension).
with tf.gfile.FastGFile('deploy.txt', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    print(node.op, node.name)

with tf.Session() as sess:
    # Import the frozen graph into the session's (default) graph.
    tf.import_graph_def(graph_def, name='')

    # Generate input
    np.random.seed(2701)
    inp = np.random.standard_normal([1, 32, 32, 1]).astype(np.float32)

    # Receive output
    outTensor = sess.graph.get_tensor_by_name('softmax2:0')
    out = sess.run(outTensor, feed_dict={'input:0': inp})
    print(out)

def NHWCtoNCHW(data):
    return data.transpose(0, 3, 1, 2)

# Load network
net = cv.dnn.readNetFromTensorflow('deploy.txt')
# Set input in appropriate data format
net.setInput(NHWCtoNCHW(inp))
# Receive output
cvOut = net.forward()
print(cvOut)
```
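To put a number on the agreement reported above, the two outputs can be compared element-wise with a tolerance. This sketch hard-codes the values printed in this comment; in practice you would pass out.ravel() and cvOut.ravel() instead (with NumPy arrays, np.abs(out - cvOut).max() gives the same result). The tolerance 1e-5 is an arbitrary choice for illustration.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(a) != len(b):
        raise ValueError("outputs have different sizes")
    return max(abs(x - y) for x, y in zip(a, b))


tf_out = [9.99834776e-01, 1.65201331e-04]   # TensorFlow output from above
dnn_out = [9.99834776e-01, 1.65201651e-04]  # DNN output from above

diff = max_abs_diff(tf_out, dnn_out)
print("max abs diff: %.3e" % diff)
print("match within 1e-5:", diff < 1e-5)
```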
Thank you very much, I already solved it. The problem had nothing to do with the preprocessing, so in the end the dnn module works fine! Thanks for your interest!
The problem seems resolved to me. To highlight the steps for using models trained in TensorFlow:
1) Make sure the graph has at least one variable (trainable parameter).
2) Save the session.
```python
saver = tf.train.Saver()
saver.save(sess, 'model.ckpt')
```
You'll get something like:
```
checkpoint
model.ckpt.data-00000-of-00001
model.ckpt.index
model.ckpt.meta
```
3) Save the graph in text format.
```python
tf.train.write_graph(sess.graph.as_graph_def(), "", "graph.pbtxt")
```
4) Freeze the graph. Because the text representation doesn't contain weights, we need to merge "graph.pbtxt" with the checkpoint.
```shell
python freeze_graph.py \
  --input_graph graph.pbtxt \
  --input_checkpoint model.ckpt \
  --output_graph frozen_graph.pb \
  --output_node_names "name_of_output_op_here"
```
5) Optimize for inference. This tool removes all training nodes from the graph.
```shell
python optimize_for_inference.py \
  --input frozen_graph.pb \
  --output final_network.pb \
  --frozen_graph True \
  --input_names "name_of_input_op_here" \
  --output_names "name_of_output_op_here"
```
The resulting final_network.pb should import successfully into DNN.
See the tools at https://github.com/tensorflow/tensorflow/tree/master/tensorflow/python/tools.
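A note on the freeze step: --input_checkpoint takes the checkpoint prefix (model.ckpt), not one of the files listed after saving the session. The standard-library sketch below checks that the variable files for a prefix exist before freezing. It assumes the naming shown in that listing and a single data shard by default; only the .index and .data files are checked, since those hold the variable values. The helper names are hypothetical.

```python
import os


def checkpoint_files(prefix, num_shards=1):
    """Variable files expected for a TF checkpoint prefix (naming as listed above)."""
    names = [prefix + ".index"]
    for i in range(num_shards):
        names.append("%s.data-%05d-of-%05d" % (prefix, i, num_shards))
    return names


def missing_checkpoint_files(prefix, num_shards=1):
    """Return the expected checkpoint files that do not exist on disk."""
    return [p for p in checkpoint_files(prefix, num_shards)
            if not os.path.exists(p)]


missing = missing_checkpoint_files("model.ckpt")
if missing:
    print("Cannot freeze yet, missing:", missing)
else:
    print("Checkpoint complete; pass --input_checkpoint model.ckpt")
```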
Hello, @dkurt! I'm working with an LSTM model trained in Keras with the TensorFlow backend, and I also get the error "OpenCV Error: Unspecified error (Unknown layer type StridedSlice in op lstm_1/strided_slice)". I used the freeze_graph.py and optimize_for_inference.py scripts as described above, but nothing changed. Maybe you can help?
@littlesun96, hi! We can help you faster if you share some code. TensorFlow usually has several implementations of layers/nodes (e.g. in tf.nn, tf.layers, tf.contrib.layers, tf.contrib.keras.layers). Please show how you've added the LSTM node.
I don't really know what Keras uses internally:
```python
from keras.layers import Input, LSTM, Dense
from keras.models import Model

# seq_len (the input sequence length) is defined elsewhere in the original script.
main_input = Input(shape=(seq_len, 3), name='main_input')
lstm_out = LSTM(20, return_sequences=False, input_shape=(seq_len, 3))(main_input)
label_out = Dense(1, activation='sigmoid', name='label_output')(lstm_out)
model = Model(inputs=[main_input], outputs=[label_out])
```
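For errors like "Unknown layer type StridedSlice", a quick way to see every op type a frozen graph needs, and which of them an importer may not know, is to count op types. The SUPPORTED set below is a small hypothetical placeholder, not the dnn module's real list; the node tuples stand in for graph_def.node, and the LSTM node names are illustrative.

```python
from collections import Counter, namedtuple

# Hypothetical subset of op types an importer handles; NOT dnn's real list.
SUPPORTED = {"Placeholder", "Const", "Identity", "MatMul", "Add",
             "Sigmoid", "Tanh"}


def unsupported_op_report(nodes, supported=SUPPORTED):
    """Count op types in a graph and return those outside `supported`."""
    counts = Counter(n.op for n in nodes)
    return {op: c for op, c in counts.items() if op not in supported}


Node = namedtuple("Node", ["op", "name"])
nodes = [
    Node("Placeholder", "main_input"),
    Node("StridedSlice", "lstm_1/strided_slice"),
    Node("StridedSlice", "lstm_1/strided_slice_1"),
    Node("MatMul", "lstm_1/MatMul"),
    Node("Sigmoid", "label_output/Sigmoid"),
]
print(unsupported_op_report(nodes))
```

Sharing such a report in the new issue makes it clear which layers need importer support.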
@littlesun96, we've started working on it. I think you'd better create a separate issue at https://github.com/opencv/opencv/issues and assign it to me.
System information (version)
Detailed description
I am trying to run the tf_inception.cpp example. Compilation succeeds, but it gives a runtime error. The error happens at line 83, when calling:
importer->populateNet(net);
populateNet goes through all layers in a for loop at line 508 of tf_importer.cpp. However, both the name and the type of layer #1 of tensorflow_inception_graph.pb (when li = 1) are DecodeJpeg, which does not match any case in the loop and results in the error at line 729: