tensorflow / tpu

Reference models and tools for Cloud TPUs.
https://cloud.google.com/tpu/
Apache License 2.0

Restoring the model raises an error: "KeyError: 'ShardDataset'" #418

Open billtiger opened 5 years ago

billtiger commented 5 years ago

I want to fine-tune EfficientNet, but when I restore the model with tf.train.import_meta_graph("model.ckpt.meta"), the following error occurs:

Traceback (most recent call last):
  File "E:/模型/efficientnet/efficientnet-b0/efficientnet-b0/restore.py", line 41, in
    = tf.train.import_meta_graph("model.ckpt.meta")
  File "C:\Users\billtiger\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\saver.py", line 1435, in import_meta_graph
    meta_graph_or_file, clear_devices, import_scope, **kwargs)[0]
  File "C:\Users\billtiger\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\training\saver.py", line 1457, in _import_meta_graph_with_return_elements
    **kwargs))
  File "C:\Users\billtiger\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\meta_graph.py", line 806, in import_scoped_meta_graph_with_return_elements
    return_elements=return_elements)
  File "C:\Users\billtiger\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\Users\billtiger\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\importer.py", line 399, in import_graph_def
    _RemoveDefaultAttrs(op_dict, producer_op_list, graph_def)
  File "C:\Users\billtiger\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\importer.py", line 159, in _RemoveDefaultAttrs
    op_def = op_dict[node.op]
KeyError: 'ShardDataset'

Does anyone know how to restore the model, or how to get the model structure?

tilmto commented 5 years ago

I have encountered the same problem.

mingxingtan commented 5 years ago

Can you paste your complete code for restoring the model?

tiancheng2000 commented 4 years ago

I ran into the same problem. My environment uses a stable release (TF 1.12), and the code follows the usual pattern:

import tensorflow as tf  # TF 1.12

with tf.Session() as sess:
    ckpt = tf.train.latest_checkpoint("/tmp/efficientnet-b5")
    saver = tf.train.import_meta_graph(ckpt + ".meta")  # error occurred here
    # saver.restore(sess, ckpt)

I wonder whether this is a problem with the integrity of the .meta file, because I can run eval_ckpt_main.py successfully. The checkpoint directory contains:

2019/06/17  10:16                77 checkpoint
2019/06/17  10:14       487,618,440 model.ckpt.data-00000-of-00001
2019/06/17  10:14           113,527 model.ckpt.index
2019/06/17  10:14       183,785,802 model.ckpt.meta
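
One way to check whether the .meta file itself is at fault is to compare the op types it uses against the ops registered in the installed TensorFlow. The traceback above fails at op_dict[node.op], which suggests the graph contains an op ('ShardDataset') that this TensorFlow build does not know, possibly because the checkpoint was exported with a newer version. The sketch below is only an illustration and not part of this repo: the helper names graph_op_types and unregistered_ops are hypothetical, and the commented usage assumes TF 1.x, where op_def_registry.get_registered_ops() is available.

```python
def graph_op_types(meta_path):
    """Return the set of op type names used by top-level nodes in a .meta file."""
    # Imported lazily so that unregistered_ops() below works without TensorFlow.
    from tensorflow.core.protobuf import meta_graph_pb2

    meta_graph = meta_graph_pb2.MetaGraphDef()
    with open(meta_path, "rb") as f:
        meta_graph.ParseFromString(f.read())
    return {node.op for node in meta_graph.graph_def.node}


def unregistered_ops(graph_ops, registered_ops):
    """Return, sorted, the op types in graph_ops that are absent from registered_ops."""
    return sorted(set(graph_ops) - set(registered_ops))


# Hypothetical usage under TF 1.x (the path is a placeholder):
#   from tensorflow.python.framework import op_def_registry
#   known = op_def_registry.get_registered_ops()  # dict keyed by op name
#   print(unregistered_ops(graph_op_types("model.ckpt.meta"), known))
```

If the printed list is non-empty, the file is probably intact and simply needs a TensorFlow version that registers those ops.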

tiancheng2000 commented 4 years ago

Update: I have successfully loaded the checkpoints (for both b7 and b5) by using TF 2.0's tensorflow.compat.v1. The code is similar to the above, except that you need to initialize the variables after import_meta_graph():

import tensorflow.compat.v1 as tf  # TF 2.0, using the v1 compatibility API

with tf.Session() as sess:
    ckpt = tf.train.latest_checkpoint("/tmp/efficientnet-b5")
    saver = tf.train.import_meta_graph(ckpt + ".meta")  # error disappeared
    sess.run(tf.global_variables_initializer())
    saver.restore(sess, ckpt)

Note that if you restore the variables without first running the initializer, an InvalidArgumentError is thrown with the message "No OpKernel was registered to support Op InfeedEnqueueTuple".
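
For the original question about getting the model structure, the variable names and shapes can be read from the checkpoint without importing the meta graph at all. This is a minimal sketch, assuming tf.train.list_variables is enough for your purpose (it reports variables only, not the ops that connect them). The helper names format_variables and print_checkpoint_variables are hypothetical, and the directory in the usage comment is the one used earlier in this thread.

```python
def format_variables(variables):
    """Format (name, shape) pairs as aligned text lines, sorted by name."""
    rows = sorted((name, tuple(shape)) for name, shape in variables)
    width = max((len(name) for name, _ in rows), default=0)
    return [name.ljust(width) + "  " + str(shape) for name, shape in rows]


def print_checkpoint_variables(ckpt_dir):
    """Print every variable stored in the latest checkpoint under ckpt_dir."""
    # Imported lazily so that format_variables() works without TensorFlow.
    import tensorflow as tf

    ckpt = tf.train.latest_checkpoint(ckpt_dir)
    # list_variables returns a list of (name, shape) tuples.
    for line in format_variables(tf.train.list_variables(ckpt)):
        print(line)


# Hypothetical usage:
#   print_checkpoint_variables("/tmp/efficientnet-b5")
```

Because this reads only the index and data files, it should not depend on the ops recorded in model.ckpt.meta.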