Open floflif opened 3 years ago
I'm not sure, but you can try setting fine_tune_checkpoint_version: V2 (the default is V1) in the train_config, since you are using v2_keras. When you have trouble with the pipeline.config file, I recommend looking at the protos.
Hello, thanks for your answer, but I checked and there is no fine_tune_checkpoint_version attribute anywhere, neither in pipeline.config nor in ssd_mobilenet_v2_quantized_300x300_coco.config.
You have to add it. It is part of the proto's definition. https://github.com/tensorflow/models/blob/master/research/object_detection/protos/train.proto#L68
Hello, so I now have the following part in the config file:
fine_tune_checkpoint: "C:/tensorflow1/models/research/object_detection/ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03/model.ckpt"
fine_tune_checkpoint_type: "detection"
fine_tune_checkpoint_version: V2
I added the line that you mentioned, but training still fails with the same error:
Traceback (most recent call last):
  File "model_main_tf2.py", line 115, in <module>
    tf.compat.v1.app.run()
  File "C:\Users\Flo\mob1\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "C:\Users\Flo\mob1\lib\site-packages\absl\app.py", line 312, in run
    _run_main(main, args)
  File "C:\Users\Flo\mob1\lib\site-packages\absl\app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "model_main_tf2.py", line 112, in main
    record_summaries=FLAGS.record_summaries)
  File "C:\tensorflow1\models\research\object_detection\model_lib_v2.py", line 603, in train_loop
    train_input, unpad_groundtruth_tensors)
  File "C:\tensorflow1\models\research\object_detection\model_lib_v2.py", line 389, in load_fine_tune_checkpoint
    raise IOError('Checkpoint is expected to be an object-based checkpoint.')
OSError: Checkpoint is expected to be an object-based checkpoint.
Hi @Drisnor, if you use a model that is compatible with TF v2, the error will be resolved. load_fine_tune_checkpoint calls is_object_based_checkpoint to check the contents of the checkpoint, and that check returns False for a v1 model:
In [1]: import tensorflow.compat.v1 as tf
In [2]: var_names = [var[0] for var in tf.train.list_variables('ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03/model.ckpt')]
In [3]: '_CHECKPOINTABLE_OBJECT_GRAPH' in var_names
Out[3]: False
In the case of a v2 model:
In [4]: var_names = [var[0] for var in tf.train.list_variables('ssd_mobilenet_v2_320x320_coco17_tpu-8/checkpoint/ckpt-0')]
In [5]: '_CHECKPOINTABLE_OBJECT_GRAPH' in var_names
Out[5]: True
I hope this is helpful.
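The check from the session above can be wrapped in a small helper and run on a checkpoint before starting training. This is only a sketch: the function names has_object_graph and is_object_based are made up for illustration, and the logic simply mirrors the membership test shown in the session rather than reproducing the library's own code. TensorFlow is imported lazily so that the pure-Python part can be used without it.

```python
# Name of the entry that object-based (TF2-style) checkpoints contain,
# as tested in the interactive session above.
OBJECT_GRAPH_KEY = "_CHECKPOINTABLE_OBJECT_GRAPH"


def has_object_graph(variables):
    """Return True if the (name, shape) pairs include the object-graph entry.

    `variables` is expected to look like the output of
    tf.train.list_variables: an iterable of (name, shape) pairs.
    """
    return any(name == OBJECT_GRAPH_KEY for name, _shape in variables)


def is_object_based(checkpoint_prefix):
    """Hypothetical helper: inspect a checkpoint on disk.

    `checkpoint_prefix` is the path without the .index/.data suffix,
    e.g. ".../model.ckpt" or ".../checkpoint/ckpt-0".
    """
    import tensorflow as tf  # lazy import; only needed for real checkpoints

    return has_object_graph(tf.train.list_variables(checkpoint_prefix))
```

If is_object_based returns False for the path given in fine_tune_checkpoint, the checkpoint is most likely a TF1 export, and model_main_tf2.py would be expected to raise the error above.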
@satojkovic I tried your recommendations but I am still facing the same error as reported by @Drisnor. I have been struggling with this code for 2 days; could someone please help with the error below?
Traceback (most recent call last):
  File "model_main_tf2.py", line 115, in <module>
    tf.compat.v1.app.run()
  File "/Users/deepali/Documents/CV_Projects/Decarb_ObjectDetection/models/venv/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/Users/deepali/Documents/CV_Projects/Decarb_ObjectDetection/models/venv/lib/python3.7/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/Users/deepali/Documents/CV_Projects/Decarb_ObjectDetection/models/venv/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "model_main_tf2.py", line 112, in main
    record_summaries=FLAGS.record_summaries)
  File "/Users/deepali/Documents/CV_Projects/Decarb_ObjectDetection/models/venv/lib/python3.7/site-packages/object_detection/model_lib_v2.py", line 685, in train_loop
    losses_dict = _dist_train_step(train_input_iter)
  File "/Users/deepali/Documents/CV_Projects/Decarb_ObjectDetection/models/venv/lib/python3.7/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/Users/deepali/Documents/CV_Projects/Decarb_ObjectDetection/models/venv/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Unknown image file format. One of JPEG, PNG, GIF, BMP required.
    [[{{node case/cond/cond_jpeg/decode_image/DecodeImage}}]]
    [[MultiDeviceIteratorGetNextFromShard]]
    [[RemoteCall]]
    [[while/body/_1/IteratorGetNext]] [Op:__inference__dist_train_step_89441]
Function call stack:
_dist_train_step -> _dist_train_step
@deepali0162 How did you create the tfrecord file? Looking at the traceback, it looks like an image stored in the tfrecord is not in a valid format. I'd recommend checking the sanity of your data.
@satojkovic I just figured out that one of the images was causing the issue. Thank you so much for your reply.
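One way to locate such an image is to scan the record file and look at the first bytes of every encoded image. The sketch below rests on some assumptions: the feature keys image/encoded and image/filename follow the layout commonly used for Object Detection API records and may differ in your data, the function names are hypothetical, and the header check is only a heuristic that does not prove the rest of the file decodes.

```python
def sniff_image_format(data):
    """Guess the image format from leading magic bytes, or return None."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if data.startswith(b"BM"):
        return "bmp"
    return None


def find_bad_images(tfrecord_path,
                    image_key="image/encoded",    # assumed feature key
                    name_key="image/filename"):   # assumed feature key
    """Return (record index, filename) for images with an unknown header."""
    import tensorflow as tf  # lazy import; only needed to read the record

    bad = []
    for index, raw in enumerate(tf.data.TFRecordDataset(tfrecord_path)):
        example = tf.train.Example()
        example.ParseFromString(raw.numpy())
        features = example.features.feature
        encoded = features[image_key].bytes_list.value
        data = encoded[0] if encoded else b""
        if sniff_image_format(data) is None:
            names = features[name_key].bytes_list.value
            name = names[0].decode("utf-8", "replace") if names else "<unknown>"
            bad.append((index, name))
    return bad
```

Calling find_bad_images on train.record and test.record should list the entries whose bytes do not start like a JPEG, PNG, GIF or BMP file, for example a WebP image saved with a .jpg extension.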
Hello, sorry for the inconvenience, but I currently have the same issue. I'm using Tensorflow 2.5.0 with the right CUDA version. My model is ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03, so I modified the associated config file, ssd_mobilenet_v2_quantized_300x300_coco.config (from https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_mobilenet_v2_quantized_300x300_coco.config).
I set the correct path for my setup:
fine_tune_checkpoint: "C:/tensorflow1/models/research/object_detection/ssd_mobilenet_v2_quantized_300x300_coco_2019_01_03/model.ckpt"
I also have my own labelmap.pbtxt, train.record and test.record. The model folder contains the following files:
model.ckpt.data-00000-of-00001
model.ckpt.index
model.ckpt.meta
pipeline.config
tflite_graph.pb
tflite_graph.pbtxt
So I also modified the required path in the "pipeline.config" file. I have been investigating since yesterday and of course I googled it, but unfortunately I did not find anything useful online to solve my error.
I also changed line 83 from type: 'ssd_mobilenet_v2' to type: 'ssd_mobilenet_v2_keras', because the default config file also gave me an error on that line.
When I launch the following command:
But this gives me the same error as at the top of this topic:
My entire config file is here:
Originally posted by @Drisnor in https://github.com/tensorflow/models/issues/9278#issuecomment-871459469