tryolabs / luminoth

Deep Learning toolkit for Computer Vision.
https://tryolabs.com
BSD 3-Clause "New" or "Revised" License

Loading two checkpoints gives NotFoundError #268

Open munikarmanish opened 5 years ago

munikarmanish commented 5 years ago

Loading two checkpoints in the same Python session, as shown below, raises a NotFoundError on the second one. Loading either checkpoint on its own works fine.

In [1]:  from luminoth import Detector

In [2]:  model1 = Detector('checkpoint1')
2019-02-15 05:11:39.359988: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
INFO:tensorflow:Restoring parameters from /root/.luminoth/checkpoints/bab8dccb2202/model.ckpt-140716
INFO:tensorflow:Loaded checkpoint.

In [3]:  model2 = Detector('checkpoint2')
INFO:tensorflow:Restoring parameters from /root/.luminoth/checkpoints/7207a39c0441/model.ckpt-17296
2019-02-15 05:12:02.990421: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key fasterrcnn_1/rcnn/fc_bbox/b not found in checkpoint
Traceback (most recent call last):
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key fasterrcnn_1/rcnn/fc_bbox/b not found in checkpoint
     [[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1546, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key fasterrcnn_1/rcnn/fc_bbox/b not found in checkpoint
     [[node save/RestoreV2 (defined at /root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/utils/predicting.py:61)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "<stdin>", line 1, in <module>
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/tasks.py", line 71, in __init__
    self._network = PredictorNetwork(config)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/utils/predicting.py", line 61, in __init__
    saver = tf.train.Saver(sharded=True, allow_empty=True)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1102, in __init__
    self.build()
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1114, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1151, in _build
    build_save=build_save, build_restore=build_restore)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 789, in _build_internal
    restore_sequentially, reshape)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 459, in _AddShardedRestoreOps
    name="restore_shard"))
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
    restore_sequentially)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Key fasterrcnn_1/rcnn/fc_bbox/b not found in checkpoint
     [[node save/RestoreV2 (defined at /root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/utils/predicting.py:61)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1556, in restore
    names_to_keys = object_graph_key_mapping(save_path)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1830, in object_graph_key_mapping
    checkpointable.OBJECT_GRAPH_PROTO_KEY)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 371, in get_tensor
    status)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/tasks.py", line 71, in __init__
    self._network = PredictorNetwork(config)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/utils/predicting.py", line 62, in __init__
    saver.restore(self.session, ckpt)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1562, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key fasterrcnn_1/rcnn/fc_bbox/b not found in checkpoint
     [[node save/RestoreV2 (defined at /root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/utils/predicting.py:61)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
  File "<stdin>", line 1, in <module>
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/tasks.py", line 71, in __init__
    self._network = PredictorNetwork(config)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/utils/predicting.py", line 61, in __init__
    saver = tf.train.Saver(sharded=True, allow_empty=True)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1102, in __init__
    self.build()
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1114, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1151, in _build
    build_save=build_save, build_restore=build_restore)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 789, in _build_internal
    restore_sequentially, reshape)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 459, in _AddShardedRestoreOps
    name="restore_shard"))
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
    restore_sequentially)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/root/.virtualenvs/test/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key fasterrcnn_1/rcnn/fc_bbox/b not found in checkpoint
     [[node save/RestoreV2 (defined at /root/.virtualenvs/test/lib/python3.5/site-packages/luminoth/utils/predicting.py:61)  = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

System:

My use case: I need to keep multiple checkpoints loaded in memory and use one of them based on a parameter provided by the user.

dshea89 commented 5 years ago

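You need to use a separate TensorFlow graph when loading and using each model. The key in the error, fasterrcnn_1/rcnn/fc_bbox/b, suggests why: the second Detector appears to have been built in the same default graph as the first, so its variables were given a _1 suffix that does not exist in the checkpoint. If you are using Keras, you also need to use separate sessions. See: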

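import tensorflow as tf
from luminoth import Detector

# Build each model in its own graph and session so that variable names
# from the first model cannot collide with those of the second.
graph1 = tf.Graph()
with graph1.as_default():
    session1 = tf.Session()
    with session1.as_default():
        model1 = Detector('checkpoint1')

graph2 = tf.Graph()
with graph2.as_default():
    session2 = tf.Session()
    with session2.as_default():
        model2 = Detector('checkpoint2')

# Re-enter the matching graph and session whenever a model is used.
# img is the input image, assumed to be loaded beforehand.
with graph1.as_default():
    with session1.as_default():
        model1.predict(img)

with graph2.as_default():
    with session2.as_default():
        model2.predict(img)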

Reference: https://stackoverflow.com/a/51290092
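
For the use case in the question (several checkpoints kept in memory, one picked by a user-supplied parameter), the graph-per-model pattern above could be wrapped in a small registry. This is only a sketch under assumptions: load_detector and DetectorRegistry are hypothetical helper names, not part of Luminoth, and the code assumes the TensorFlow 1.x API used in this thread and that Detector behaves as in the snippet above.

```python
def load_detector(checkpoint):
    """Build one Detector in its own graph and session.

    Hypothetical helper: returns a function that runs predict() inside
    the graph and session the model was created in.
    """
    # Imported lazily so the registry below can be used (and tested)
    # without TensorFlow or Luminoth installed.
    import tensorflow as tf  # TensorFlow 1.x API, as in this thread
    from luminoth import Detector

    graph = tf.Graph()
    with graph.as_default():
        session = tf.Session()
        with session.as_default():
            detector = Detector(checkpoint)

    def predict(image):
        # Re-enter the same graph and session for every prediction.
        with graph.as_default(), session.as_default():
            return detector.predict(image)

    return predict


class DetectorRegistry:
    """Keeps several loaded models and dispatches to one by name."""

    def __init__(self, loader=load_detector):
        self._loader = loader
        self._predictors = {}

    def load(self, name, checkpoint):
        # Each call to the loader is expected to create an isolated model.
        self._predictors[name] = self._loader(checkpoint)

    def predict(self, name, image):
        if name not in self._predictors:
            raise KeyError("unknown model: %r" % name)
        return self._predictors[name](image)
```

Usage would look like registry = DetectorRegistry(), then registry.load('first', 'checkpoint1') and registry.load('second', 'checkpoint2') once at startup, and registry.predict(user_choice, img) per request. Passing the loader in as a parameter keeps the dispatch logic independent of TensorFlow.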