Closed wpq3142 closed 7 years ago
The file format is inconsistent. Look at this post: http://votec.top/2016/12/24/tensorflow-r12-tf-train-Saver/
Change slim.get_or_create_global_step() to tf.train.get_or_create_global_step().
@wpq3142 this exception is raised here:
ckpt_reader = tf.train.NewCheckpointReader(checkpoint_path)
I haven't dug into the implementation of this API, but I suppose it is meant for the new format.
I'm assuming the model code here would need to be updated to maybe determine which format the checkpoint is written in, and if so, use the correct API? If so, that sounds like a straightforward change and we'd welcome contributions helping to clean up the model.
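For anyone picking this up, a rough sketch of how the format could be guessed from the files on disk before choosing an API. The function name guess_checkpoint_format is made up for illustration, and the check is only a heuristic: it assumes a V2 checkpoint shows up as <prefix>.index plus <prefix>.data-* shards, and that an unsharded V1 checkpoint is a single file named exactly like the prefix. It does not read the file contents.

```python
import glob
import os


def guess_checkpoint_format(prefix):
    """Guess "v2", "v1", or "unknown" for a checkpoint prefix from file layout only."""
    # Assumption: V2 checkpoints are stored as <prefix>.index plus <prefix>.data-* shards.
    has_index = os.path.isfile(prefix + ".index")
    has_data = bool(glob.glob(glob.escape(prefix) + ".data-*"))
    if has_index and has_data:
        return "v2"
    # Assumption: an unsharded V1 checkpoint is one file whose name is the prefix itself.
    if os.path.isfile(prefix):
        return "v1"
    return "unknown"
```

The model code could then branch on the returned string, or fail early with a clearer message than the DataLossError.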
@wpq3142 Can you tell us how you are configuring this particular entry in the config?
fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
It should look like
fine_tune_checkpoint: "/home/wpq/data/potato/data/model.ckpt"
Moreover, it looks like you are using rfcn_resnet101_coco.config with a faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017 checkpoint. These two are not compatible. You need to use rfcn_resnet101_coco_11_06_2017.tar.gz with rfcn_resnet101_coco.config.
@tombstone
I downloaded the latest model and it's working now. The configuration is as follows: --clone_on_cpu true --logtostderr --pipeline_config_path /home/wpq/data/potato/model/faster_rcnn_nas_coco.config --train_dir /home/wpq/data/potato/model/train
One reason seems to be that I was missing a space between keys and values.
You just need to restore the .ckpt path, not the .ckpt.meta file, something like this:
sess = tf.Session()
saver.restore(sess, 'mymodel/model100-500-0.998.ckpt')
Apparently in V2 checkpoints, you should only include the filename up to ".ckpt". For instance, if the checkpoint filename is model.ckpt.data-00000-of-00001 then you should only use model.ckpt. Using the full filename leads to getting a DataLossError.
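If you want to guard against passing a full filename by accident, something like the following could normalize the path first. This is only a sketch: to_checkpoint_prefix is a hypothetical helper name, and the list of suffixes is an assumption based on the filenames mentioned in this thread.

```python
import re

# Suffixes seen next to a V2 checkpoint prefix in this thread (assumed list).
_CHECKPOINT_SUFFIX = re.compile(r"\.(?:data-\d+-of-\d+|index|meta)$")


def to_checkpoint_prefix(path):
    """Return path without a trailing .data-N-of-M, .index, or .meta suffix."""
    return _CHECKPOINT_SUFFIX.sub("", path)


print(to_checkpoint_prefix("model.ckpt.data-00000-of-00001"))  # model.ckpt
```

A path that is already a prefix is returned unchanged, so it is safe to call on whatever the config contains.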
@pbashivan thank you so much
I fixed the issue by replacing model.ckpt with model.ckpt-200000, where 200000 is your checkpoint number.
Solved on #7696
Hello all, just follow the video below and export your own model within 10 seconds.
Apparently in V2 checkpoints, you should only include the filename up to ".ckpt". For instance, if the checkpoint filename is model.ckpt.data-00000-of-00001 then you should only use model.ckpt. Using the full filename leads to getting a DataLossError.
This works. In my case, I used the longest common prefix among my checkpoint-related files, which was model.ckpt-1000000, and it worked for me. I had the following three files in my folder:
model.ckpt-1000000.data-00000-of-00001
model.ckpt-1000000.index
model.ckpt-1000000.meta
I just thought this might be the case for some folks.
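The "longest common prefix" idea can be sketched with the standard library. This is just an illustration using the three filenames above; note that os.path.commonprefix compares character by character, so the shared "." before the extensions is included and has to be stripped.

```python
import os

files = [
    "model.ckpt-1000000.data-00000-of-00001",
    "model.ckpt-1000000.index",
    "model.ckpt-1000000.meta",
]

# commonprefix returns "model.ckpt-1000000." (with the trailing dot),
# so strip the dot to get the prefix the restore APIs expect.
prefix = os.path.commonprefix(files).rstrip(".")
print(prefix)  # model.ckpt-1000000
```

This only works when the list holds the files of a single checkpoint. If the folder contains several checkpoints (for example steps 1000 and 2000), the common prefix collapses to something like "model.ckpt-", which is not a valid prefix.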
I was running into this and this worked for me. All I had to do was run the following on my Windows 10 x64 machine:
python export_inference_graph.py --input_type image_tensor --pipeline_config_path ssd_mobilenet_v1_coco.config --trained_checkpoint_prefix models\model.ckpt-1000 --output_directory tuned_model
Instead of:
python export_inference_graph.py --input_type image_tensor --pipeline_config_path ssd_mobilenet_v1_coco.config --trained_checkpoint_prefix models\model.ckpt-1000.data-###-### --output_directory tuned_model
tl;dr Don't reference single files in the --trained_checkpoint_prefix flag. Just reference the shared prefix of those three files.
Hope it helps.
@phosseini is correct. The model itself is made up of three different files with three different extensions showing what kind of model data each file stores.
For me too, using the longest shared file name prefix solved the issue.
model.ckpt-1000000.data-00000-of-00001
model.ckpt-1000000.index
model.ckpt-1000000.meta
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file ./model_dir/model.ckpt-1000000.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
I am trying to get an open-source project to run properly. The code saves files as model-10.data-0000-of-0001, .index, and .meta. The part of the code that saves the files is shown below:
saver = tf.train.Saver(max_to_keep=50)
if self.pretrained_model is not None:
    print("Start training with pretrained Model..")
    saver.restore(sess, self.pretrained_model)
if (e + 1) % self.save_every == 0:
    saver.save(sess, self.model_path + 'model', global_step=e + 1)
    print("model-%s saved." % (e + 1))
One of the solutions in this issue is to change the file name:
model.ckpt-1000000.data-00000-of-00001 model.ckpt-1000000.index model.ckpt-1000000.meta
How should I change the code in my situation? How do I change the file name? It looks like the save method determines the file name automatically. Or should I change the file name manually?
/////////////////////////////////////////////////////////////////////////////////////////////
It can be
if (e + 1) % self.save_every == 0:
    saver.save(sess, self.model_path + 'model.ckpt', global_step=e + 1)
    print("model-%s saved." % (e + 1))
but that alone is not enough. When restoring, pass only the checkpoint prefix, not one of the individual files.
cur_model is one of 'model.ckpt-50.data-0000-of-0001', .index, .meta.
cur_model2 = cur_model[0:cur_model.find('-') + cur_model[cur_model.find('-'):].find('.')]
saver.restore(sess, self.model_path + cur_model2)
Just include that file name prefix in restore.
cur_model2 is 'model.ckpt-50'
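If you don't want to hard-code the step, the newest prefix can be picked from the folder. TensorFlow has tf.train.latest_checkpoint(checkpoint_dir) for this, which reads the "checkpoint" state file written by the saver. The sketch below is a TensorFlow-free alternative that only looks at filenames; latest_prefix is a made-up name, and it assumes files follow the <name>-<step>.index pattern that saver.save(..., global_step=...) produces.

```python
import os
import re

# Matches e.g. "model.ckpt-50.index" and captures the prefix and the step.
_INDEX_FILE = re.compile(r"^(?P<prefix>.+-(?P<step>\d+))\.index$")


def latest_prefix(model_dir):
    """Return the checkpoint prefix with the highest step in model_dir, or None."""
    best = None  # (step, prefix) of the newest checkpoint seen so far
    for name in os.listdir(model_dir):
        match = _INDEX_FILE.match(name)
        if match is None:
            continue
        step = int(match.group("step"))
        if best is None or step > best[0]:
            best = (step, match.group("prefix"))
    if best is None:
        return None
    return os.path.join(model_dir, best[1])
```

The returned value can be passed straight to saver.restore(sess, ...), since it is already the prefix and not one of the individual files.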
None of the above worked. model.ckpt-1000000, model.ckpt-1000000.index, model.ckpt-1000000.meta solved this problem for me.
Apparently in V2 checkpoints, you should only include the filename up to ".ckpt". For instance, if the checkpoint filename is model.ckpt.data-00000-of-00001 then you should only use model.ckpt. Using the full filename leads to getting a DataLossError.
you are a legend
In some models, it could also be caused by a missing .meta file and/or a missing .index file.
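A quick way to check for missing pieces before restoring might look like this. It is a sketch with a hypothetical helper name (missing_checkpoint_parts). As far as I know, restoring variables needs the .index and .data-* files, while .meta is only needed if you rebuild the graph with tf.train.import_meta_graph, so treat a missing .meta as a warning rather than an error.

```python
import glob
import os


def missing_checkpoint_parts(prefix):
    """Return the file kinds expected next to prefix that are not on disk."""
    missing = []
    if not os.path.isfile(prefix + ".index"):
        missing.append(".index")
    # Data may be split across several shards, so match any .data-* file.
    if not glob.glob(glob.escape(prefix) + ".data-*"):
        missing.append(".data-*")
    if not os.path.isfile(prefix + ".meta"):
        missing.append(".meta")
    return missing
```

An empty list means all three kinds of file are present for that prefix.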
Hi all, after training in my TensorFlow session, my files are not named like model.ckpt-1000000.data-00000-of-00001 model.ckpt-1000000.index model.ckpt-1000000.meta but instead Pretrained.data-00000-of-00001 Pretrained.index Pretrained.meta. What should I do to solve the above data loss problem with these saved files?
None of the above worked. model.ckpt-1000000, model.ckpt-1000000.index, model.ckpt-1000000.meta solved this problem for me.
@Rajput245 I have the same problem. Were you able to fix it?
Hi guys, I don't know if this is still a problem for you, but I had the following files: model.ckpt-100000.data-00000-of-00001 model.ckpt-100000.index model.ckpt-100000.meta
I used the following code:
import tensorflow.compat.v1 as tf
import tf_slim as slim
# Pass the shared prefix, not one of the three files.
checkpoint_path = "absolute_path_to/model.ckpt-100000"
init_fn = slim.assign_from_checkpoint_fn(
    checkpoint_path, slim.get_model_variables(model_variables))
sess = tf.Session()
init_fn(sess)
I hope this helps you!
In my situation I don't have "ckpt" at all.
I just have the following 2 files:
What do I do?
I would maybe try to just add the ckpt after 'variables'.
I just resolved this issue. I saved the model as a .h5 file and that worked.
import tensorflow as tf
from tensorflow.python.training import checkpoint_utils as cp
print(cp.list_variables('path/model_name.ckpt'))
System information
Describe the problem
Downloaded the new faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017.tar.gz
rfcn_resnet101_coco.config:
model {
  faster_rcnn {
    num_classes: 37
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_resnet_v2'
      first_stage_features_stride: 8
    }
Source code / logs
2017-11-01 15:11:40.186072: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open /home/wpq/data/potato/data/model.ckpt.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
Traceback (most recent call last):
  File "/home/wpq/workspace/models-master/research/object_detection/train.py", line 163, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/wpq/workspace/models-master/research/object_detection/train.py", line 159, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "/home/wpq/workspace/models-master/research/object_detection/trainer.py", line 254, in train
    var_map, train_config.fine_tune_checkpoint))
  File "/home/wpq/workspace/models-master/research/object_detection/utils/variables_helper.py", line 122, in get_variables_available_in_checkpoint
    ckpt_reader = tf.train.NewCheckpointReader(checkpoint_path)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 150, in NewCheckpointReader
    return CheckpointReader(compat.as_bytes(filepattern), status)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file /home/wpq/data/potato/data/model.ckpt.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
Process finished with exit code 1