Checkpoint file - Githubissues

chrisrn commented 6 years ago

Can you provide the checkpoint folder (including meta file)? It is common now in tensorflow to import meta graphs.

kli017 commented 6 years ago

Can you share the checkpoint folder? Thank you.

kli017 commented 6 years ago

@chrisrn Do you have the checkpoint folder already?

chrisrn commented 6 years ago

Yes, but it contains a more complex graph. I can give you the code for converting a protobuf file into a checkpoint, though. Inside a protobuf file all variables are converted to constants, so you can import the graph from the protobuf, convert all constants to variables and export a checkpoint like this:

import os

import tensorflow as tf  # TensorFlow 1.x API

def protobuf_to_checkpoint_conversion(pb_model, ckpt_dir):

    graph = tf.Graph()
    with graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_model, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def, name='')

    with graph.as_default():
        config = tf.ConfigProto()

        with tf.Session(graph=graph, config=config) as sess:

            constant_ops = [op for op in sess.graph.get_operations() if op.type == "Const"]
            params = []
            for constant_op in constant_ops:
                shape = constant_op.outputs[0].get_shape()
                var = tf.get_variable(constant_op.name, shape=shape)
                params.append(var)

            init = tf.global_variables_initializer()
            sess.run(init)

            saver = tf.train.Saver(var_list=params)
            ckpt_path = os.path.join(ckpt_dir, 'model.ckpt')
            saver.save(sess, ckpt_path)

kli017 commented 6 years ago

@chrisrn Thanks a lot. The code works well!

Dongshengjiang commented 5 years ago

Thanks for the conversion function, but when I fine-tuned from the ckpt with the pipeline.config of ssd_mobilenet_v1_coco, tensorflow reported that the weights of many tensors are missing from the fine-tuned ckpt. So can you attach your pipeline.config?

yoyomolinas commented 5 years ago

@Dongshengjiang Have you got the pipeline.config file?

Dongshengjiang commented 5 years ago

Not yet


yoyomolinas commented 5 years ago

@chrisrn Thanks for the conversion function. I realized that the conversion uses a single graph for all loading and saving. Because the new variables clash with the names of the imported constants in that graph, they get a '_1' suffix appended to their names. This causes several issues when attempting to load the model from the checkpoint files. I modified the function as follows, so that variables are stored under the same names they had in the protobuf file.

def protobuf_to_checkpoint_conversion(pb_model, ckpt_dir):
    graph = tf.Graph()
    with graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_model, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def,name='')

    graph2 = tf.Graph()
    with graph2.as_default():
        config = tf.ConfigProto()
        with tf.Session(graph=graph2, config=config) as sess:
            constant_ops = [op for op in graph.get_operations() if op.type == "Const"]
            params = []
            for constant_op in constant_ops:
                name = constant_op.name
                shape = constant_op.outputs[0].get_shape()
                var = tf.get_variable(name, shape=shape)
                params.append(var)

            init = tf.global_variables_initializer()
            sess.run(init)
            saver = tf.train.Saver(var_list=params)
            ckpt_path = os.path.join(ckpt_dir, 'model.ckpt')
            saver.save(sess, ckpt_path, global_step=1)

I am currently working on optimizing this face detector with TensorRT. I face some issues when exporting the model with object_detection.exporter.export_inference_graph from the object detection API. The error I specifically get when trying to export the frozen inference graph is this:

InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [6] rhs shape= [9] [[Node: save/Assign_2 = Assign[T=DT_FLOAT, _class=["loc:@BoxPredictor_0/ClassPredictor/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](BoxPredictor_0/ClassPredictor/biases, save/RestoreV2:2)]]

Inspection showed that the error occurs because tensors restored from the checkpoint have different shapes from the corresponding variables in the model built from pipeline.config. I visualized both graphs in TensorBoard and found that the BoxPredictor_x/ClassPredictor output tensors have different shapes in the checkpoint and in the config-generated model. I suppose some special config parameters were used.
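The same comparison can be done without TensorBoard by checking the shapes stored in the checkpoint against the shapes of the variables in the model graph. The sketch below is only an illustration: `read_checkpoint_shapes` and `shape_mismatches` are hypothetical helper names, and it assumes `tf.train.list_variables`, which returns `(name, shape)` pairs for a checkpoint prefix.

```python
def read_checkpoint_shapes(ckpt_prefix):
    """Return {variable_name: shape} for a checkpoint (needs TensorFlow)."""
    # Imported lazily so the comparison below has no TensorFlow dependency.
    import tensorflow as tf
    return {name: list(shape)
            for name, shape in tf.train.list_variables(ckpt_prefix)}


def shape_mismatches(graph_shapes, ckpt_shapes):
    """Return {name: (graph_shape, ckpt_shape)} for every variable that is
    present in both mappings but has a different shape in each."""
    mismatches = {}
    for name, graph_shape in graph_shapes.items():
        ckpt_shape = ckpt_shapes.get(name)
        if ckpt_shape is not None and list(graph_shape) != list(ckpt_shape):
            mismatches[name] = (list(graph_shape), list(ckpt_shape))
    return mismatches


# Shapes taken from the error message above: the graph built from
# pipeline.config expects [6], while the checkpoint holds [9].
graph_shapes = {"BoxPredictor_0/ClassPredictor/biases": [6]}
ckpt_shapes = {"BoxPredictor_0/ClassPredictor/biases": [9]}
print(shape_mismatches(graph_shapes, ckpt_shapes))
```

With a real checkpoint, `ckpt_shapes` would come from `read_checkpoint_shapes('pathto/model.ckpt')` and `graph_shapes` from the variables of the graph that the exporter builds.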

I would appreciate it if anyone could share their insights on the issue, or the config file.

Thanks and best,

yoyomolinas commented 5 years ago

Solution

First of all, the conversion function posted above is incomplete; variables are not loaded with trained parameters. Here is the updated version of the conversion function to load trained params into variables.

import os

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API
from tqdm import tqdm

def protobuf_to_checkpoint_conversion(pb_model, ckpt_dir):

    graph = tf.Graph()
    with graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_model, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def,name='')

    image_tensor = graph.get_tensor_by_name('image_tensor:0')
    dummy = np.random.random((1, 512, 512, 3))

    with graph.as_default():
        config = tf.ConfigProto()
        with tf.Session(graph=graph, config=config) as sess:
            constant_ops = [op for op in graph.get_operations() if op.type == "Const"]
            vars_dict = {}
            for constant_op in constant_ops:
                name = constant_op.name
                const = constant_op.outputs[0]
                shape = const.shape
                var = tf.get_variable(name, shape, dtype=const.dtype, initializer=tf.zeros_initializer())
                vars_dict[name] = var

            print('INFO:Initializing variables')
            init = tf.global_variables_initializer()
            sess.run(init)

            print('INFO: Loading vars')
            for constant_op in tqdm(constant_ops):
                name = constant_op.name
                if 'FeatureExtractor' in name or 'BoxPredictor' in name:
                    const = constant_op.outputs[0]
                    shape = const.shape
                    var = vars_dict[name]
                    var.load(sess.run(const, feed_dict={image_tensor:dummy}), sess)

            saver = tf.train.Saver(var_list=vars_dict)
            ckpt_path = os.path.join(ckpt_dir, 'model.ckpt')
            saver.save(sess, ckpt_path)
    return graph, vars_dict

If variables are not loaded, randomly initialized variables will be restored.

Moreover, I solved the above issue by setting num_classes = 2 in the pipeline.config file. For the object detection API this means that, apart from the background class, there are two more classes. This confuses me, because a binary object detector should need only two classes: the object and the background. Could anyone shed some light on why num_classes was chosen to be 2 instead of 1?
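This does not explain why the model was trained with two classes, but it does account for the [6] versus [9] shapes in the error. In the object detection API, an SSD ClassPredictor emits, per feature-map location, one score per anchor for every class plus background. A small arithmetic sketch, assuming 3 anchors per location on the first feature map (a common SSD anchor setting, not confirmed from this model's config):

```python
def class_predictor_channels(num_anchors_per_location, num_classes):
    """Output channels (and bias length) of an SSD ClassPredictor layer:
    one score per anchor for every class plus the background class."""
    return num_anchors_per_location * (num_classes + 1)


ANCHORS_FIRST_LAYER = 3  # assumption, see the note above

# num_classes = 1 gives the [6] the config-built graph expected,
# num_classes = 2 gives the [9] stored in the checkpoint.
print(class_predictor_channels(ANCHORS_FIRST_LAYER, 1))
print(class_predictor_channels(ANCHORS_FIRST_LAYER, 2))
```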

I have the ckpt and config file now, reach out if you need it.

hsulin0806 commented 5 years ago

I have the ckpt and config file now, reach out if you need it.

I need it very much, thank you!

deimsdeutsch commented 5 years ago

@yoyomolinas Can you share the checkpoint and pipeline config file on Google Drive or Dropbox?

Thanks

yoyomolinas commented 5 years ago

Here is the config file for everyone who requested it, @hsulin0806 and @deimsdeutsch: pipeline.config.zip

fariagu commented 5 years ago

EDIT:

I'll leave this here in case anyone encounters the same problem.

It was complaining about there not being a key named "global_step", so I manually inserted one:

import os
import tensorflow as tf
import numpy as np
from tqdm import tqdm

def protobuf_to_checkpoint_conversion(pb_model, ckpt_dir):

    graph = tf.Graph()
    with graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(pb_model, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def,name='')

    image_tensor = graph.get_tensor_by_name('image_tensor:0')
    dummy = np.random.random((1, 512, 512, 3))

    with graph.as_default():
        config = tf.ConfigProto()
        with tf.Session(graph=graph, config=config) as sess:
            constant_ops = [op for op in graph.get_operations() if op.type == "Const"]
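            vars_dict = {}

            for constant_op in constant_ops:
                name = constant_op.name
                const = constant_op.outputs[0]
                shape = const.shape
                var = tf.get_variable(name, shape, dtype=const.dtype, initializer=tf.zeros_initializer())
                vars_dict[name] = var

            # The export script looks for a "global_step" key in the checkpoint,
            # so add one: a scalar int64 variable initialized to zero. Use an
            # explicit scalar shape rather than the shape left over from the
            # last constant in the loop above.
            vars_dict["global_step"] = tf.get_variable(
                "global_step",
                shape=[],
                dtype=tf.int64,
                initializer=tf.zeros_initializer()
            )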

            print('INFO:Initializing variables')
            init = tf.global_variables_initializer()
            sess.run(init)
            # load_step = 0
            # global_step = tf.Variable(load_step, name="global_step", dtype=tf.int64)
            # sess.run(global_step.initializer)

            print('INFO: Loading vars')
            for constant_op in tqdm(constant_ops):
                name = constant_op.name
                if 'FeatureExtractor' in name or 'BoxPredictor' in name:
                    const = constant_op.outputs[0]
                    shape = const.shape
                    var = vars_dict[name]
                    var.load(sess.run(const, feed_dict={image_tensor:dummy}), sess)

            saver = tf.train.Saver(var_list=vars_dict)
            ckpt_path = os.path.join(ckpt_dir, 'model.ckpt')
            saver.save(sess, ckpt_path)
    return graph, vars_dict

This is just @yoyomolinas' code where I also insert a new item in the dictionary vars_dict

@yoyomolinas I can successfully generate the model.ckpt files using your code, however when using that checkpoint to run

export_tflite_ssd_graph.py --pipeline_config_path=pathto/pipeline.config --trained_checkpoint_prefix=pathto/model.ckpt --output_directory=pathto/outdir --add_postprocessing_op=true

it fails claiming

Key global_step not found in checkpoint

Is it something to do with how the .ckpt files are generated?

The purpose of this would be to use the generated .pb file to convert into a tflite model

Here is the complete error log:

2019-04-29 15:25:52.163393: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key global_step not found in checkpoint
Traceback (most recent call last):
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key global_step not found in checkpoint
         [[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1546, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key global_step not found in checkpoint
         [[node save/RestoreV2 (defined at /home/gustavoduartefaria/models/research/object_detection/export_tflite_ssd_graph_lib.py:285)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "export_tflite_ssd_graph.py", line 143, in <module>
    tf.app.run(main)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "export_tflite_ssd_graph.py", line 139, in main
    FLAGS.max_classes_per_detection, FLAGS.use_regular_nms)
  File "/home/gustavoduartefaria/models/research/object_detection/export_tflite_ssd_graph_lib.py", line 285, in export_tflite_graph
    saver = tf.train.Saver(**saver_kwargs)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1102, in __init__
    self.build()
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1114, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1151, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 795, in _build_internal
    restore_sequentially, reshape)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
    restore_sequentially)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Key global_step not found in checkpoint
         [[node save/RestoreV2 (defined at /home/gustavoduartefaria/models/research/object_detection/export_tflite_ssd_graph_lib.py:285)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1556, in restore
    names_to_keys = object_graph_key_mapping(save_path)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1830, in object_graph_key_mapping
    checkpointable.OBJECT_GRAPH_PROTO_KEY)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 371, in get_tensor
    status)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "export_tflite_ssd_graph.py", line 143, in <module>
    tf.app.run(main)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "export_tflite_ssd_graph.py", line 139, in main
    FLAGS.max_classes_per_detection, FLAGS.use_regular_nms)
  File "/home/gustavoduartefaria/models/research/object_detection/export_tflite_ssd_graph_lib.py", line 299, in export_tflite_graph
    initializer_nodes='')
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/tools/freeze_graph.py", line 151, in freeze_graph_with_def_protos
    saver.restore(sess, input_checkpoint)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1562, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key global_step not found in checkpoint
         [[node save/RestoreV2 (defined at /home/gustavoduartefaria/models/research/object_detection/export_tflite_ssd_graph_lib.py:285)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "export_tflite_ssd_graph.py", line 143, in <module>
    tf.app.run(main)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "export_tflite_ssd_graph.py", line 139, in main
    FLAGS.max_classes_per_detection, FLAGS.use_regular_nms)
  File "/home/gustavoduartefaria/models/research/object_detection/export_tflite_ssd_graph_lib.py", line 285, in export_tflite_graph
    saver = tf.train.Saver(**saver_kwargs)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1102, in __init__
    self.build()
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1114, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1151, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 795, in _build_internal
    restore_sequentially, reshape)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
    restore_sequentially)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/gustavoduartefaria/gFaceRec/keras/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key global_step not found in checkpoint
         [[node save/RestoreV2 (defined at /home/gustavoduartefaria/models/research/object_detection/export_tflite_ssd_graph_lib.py:285)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Any help would be appreciated, thank you

yoyomolinas commented 5 years ago

@fariagu I had the same issue too. What I did was that I went into the export.py and found the line that generates the error and commented out that line. Apparently tensorflow is trying to find and restore the global_step variable which does not exist in the checkpoint file generated.

Of course, this is a temporary solution. If you find some better way to do this, let us know. Also, do you know what the global step variable does in a checkpoint file?
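Before editing the exporter, it can help to check which keys the generated checkpoint actually contains. The sketch below is only an illustration: `list_checkpoint_keys` and `missing_keys` are hypothetical helper names, and it assumes `tf.train.list_variables`, which returns `(name, shape)` pairs for a checkpoint prefix.

```python
def list_checkpoint_keys(ckpt_prefix):
    """Return the variable names stored in a checkpoint (needs TensorFlow)."""
    # Imported lazily so the check below has no TensorFlow dependency.
    import tensorflow as tf
    return [name for name, _shape in tf.train.list_variables(ckpt_prefix)]


def missing_keys(expected_keys, checkpoint_keys):
    """Return the expected keys that are absent from the checkpoint, sorted."""
    available = set(checkpoint_keys)
    return sorted(key for key in set(expected_keys) if key not in available)


# Toy data standing in for list_checkpoint_keys('pathto/model.ckpt').
expected = ["global_step", "BoxPredictor_0/ClassPredictor/biases"]
in_checkpoint = ["BoxPredictor_0/ClassPredictor/biases"]
print(missing_keys(expected, in_checkpoint))
```

Any name reported here is one that the restore step will fail on with "Key ... not found in checkpoint".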

fariagu commented 5 years ago

@yoyomolinas From what I could gather, the global_step variable is a counter of training steps, which is also used to number the checkpoint files as they are generated.

If you were to call

saver.save(sess, 'model.ckpt', global_step=0)

it would append '-0' to the file name, which becomes model.ckpt-0
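The naming rule can be sketched in plain Python. `checkpoint_prefix` is a hypothetical helper that only mimics how the file prefix is formed; it is not part of TensorFlow.

```python
def checkpoint_prefix(save_path, global_step=None):
    """Mimic the prefix tf.train.Saver.save writes to:
    '<save_path>-<step>' when global_step is given, else save_path."""
    if global_step is None:
        return save_path
    return "{}-{}".format(save_path, global_step)


print(checkpoint_prefix("model.ckpt"))
print(checkpoint_prefix("model.ckpt", 0))
```

Note that the global_step argument of saver.save() only affects the file names. It does not add a variable named global_step to the checkpoint, so the "Key global_step not found" error is unaffected by it.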

I can't say my solution is better, but the code I pasted above when I edited my comment instantiates that same global_step variable and inserts it into the vars_dict dictionary. It does not pass it to saver.save(), so the generated file names remain the same.

Thanks for replying 😄

sorny92 commented 5 years ago

@yoyomolinas From your comments I understand you were trying to load this model in TensorRT. I'm trying the same thing right now. I've been able to generate a .uff file, but when I build the engine I get an error referring to the FILL operation, which is not implemented in the TensorRT engine.

[TRT] UffParser: Validator error: FeatureExtractor/MobilenetV1/zeros_6: Unsupported operation _Fill

I'm considering two possibilities: removing those operations, because I don't really see why they are there, or implementing the FILL operation as a custom plugin in the TensorRT engine.
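To see how many such operations there are and where they sit, the nodes of the frozen graph can be filtered by op type. A minimal sketch: `nodes_of_type` and `op_histogram` are hypothetical helpers, and the toy graph below is a stand-in for a real `tf.GraphDef`, of which only the `.node` list and each node's `.name` and `.op` fields are used.

```python
from collections import Counter
from types import SimpleNamespace


def nodes_of_type(graph_def, op_type):
    """Names of all nodes in a GraphDef-like object whose op is op_type."""
    return [node.name for node in graph_def.node if node.op == op_type]


def op_histogram(graph_def):
    """Count how often each op type occurs in the graph."""
    return Counter(node.op for node in graph_def.node)


# Toy stand-in for a parsed frozen graph (node names taken from this thread).
toy_graph = SimpleNamespace(node=[
    SimpleNamespace(name="image_tensor", op="Placeholder"),
    SimpleNamespace(name="FeatureExtractor/MobilenetV1/zeros_6", op="Fill"),
    SimpleNamespace(name="BoxPredictor_0/ClassPredictor/biases", op="Const"),
])
print(nodes_of_type(toy_graph, "Fill"))
print(op_histogram(toy_graph))
```

The same functions should work on the `od_graph_def` object parsed in the conversion code earlier in this thread.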

Do you have any insight related to this?

yoyomolinas commented 5 years ago

@sorny92 First of all, before converting the graph to uff, note that the tensorflow object detection api has an exporter tool that prepares detection graphs for deployment. This process involves removing some unnecessary ops, such as ASSERT ops and possibly the FILL op you described above. Check the link I provide below for an example.

Converting models to uff has strict rules. For example, if one of the tf layers is not supported by the UffParser, you have to create a custom plugin for TensorRT. Creating a custom layer is an arduous process. Instead I used Tensorflow's TensorRT package to optimize a tf graph in TensorRT. This package skips the TF layers not implemented in TensorRT during optimization. Although this solution is less optimal than using the converted uff model in TensorRT, I still achieved better performance than pure TF.
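As a rough sketch of that approach, assuming TensorFlow 1.x built with TensorRT support and the `tensorflow.contrib.tensorrt` module: the helper names, the parameter values and the output tensor names (the usual ones for graphs exported by the object detection API) are assumptions and not taken from my actual script.

```python
# Usual output nodes of an exported detection graph (assumption).
DETECTION_OUTPUTS = [
    "detection_boxes", "detection_scores",
    "detection_classes", "num_detections",
]


def trt_conversion_kwargs(frozen_graph_def, output_names, precision_mode="FP16"):
    """Collect the arguments for trt.create_inference_graph.
    The numeric values are illustrative, not tuned."""
    return {
        "input_graph_def": frozen_graph_def,
        "outputs": list(output_names),
        "max_batch_size": 1,
        "max_workspace_size_bytes": 1 << 25,
        "precision_mode": precision_mode,
        # Subgraphs with fewer nodes than this stay in plain TensorFlow.
        "minimum_segment_size": 50,
    }


def optimize_with_tftrt(frozen_graph_def, output_names=DETECTION_OUTPUTS):
    """Return a GraphDef in which supported subgraphs are replaced by
    TensorRT engines; unsupported ops keep running in TensorFlow."""
    import tensorflow.contrib.tensorrt as trt  # TF 1.x only
    return trt.create_inference_graph(
        **trt_conversion_kwargs(frozen_graph_def, output_names))
```

Because unsupported ops stay in TensorFlow, no custom plugin is needed with this route.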

Click for examples on optimizing different models using tensorrt package of tensorflow. These examples also show how to properly export a detection model.

If you are going about implementing custom plugins in TensorRT let me know, we can collaborate.

sorny92 commented 5 years ago

@yoyomolinas Oh yes, I tried that, but it seems I compiled my Tensorflow build from source against a different version of TensorRT. I will give it a try soon! If it doesn't work I might go for implementing the custom layer. This one doesn't seem that hard to implement: as far as I can see in the documentation, it just fills a tensor with a value. I just don't see what the point of this is at inference time, so I might have to debug my graph too.

Thanks for your help, I will keep you informed if I get to implement it.

Varat7v2 commented 4 years ago

@yoyomolinas I used your code for checkpoint conversion. It's working pretty well, but I am not able to convert the exported frozen graph to a tensorrt uff model that is runnable on jetson-inference.

What might be the reason?

yeephycho / tensorflow-face-detection

Checkpoint file #42