tensorflow / models

Models and examples built with TensorFlow

Error in loading OD checkpoints #8996

Open DrewClacksila opened 4 years ago

DrewClacksila commented 4 years ago

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md
https://github.com/tensorflow/models/blob/master/research/object_detection/model_main_tf2.py

2. Describe the bug

model_main_tf2.py fails in assert_existing_objects_matched when loading a fine_tune_checkpoint from the models provided in the TF2 OD model zoo. If the assertion is suppressed, the model trains, but the checkpoint is not loaded correctly.

The error persists in all TF2 models tested:

3. Steps to reproduce

Run model_main_tf2.py with a freshly downloaded model from the model zoo.

4. Expected behavior

The checkpoint loads without error.

5. Additional context

fine_tune_checkpoint: "/code/ssd_mobilenet_v2_320x320_coco17_tpu-8/checkpoint/ckpt-0"

Traceback (most recent call last):
  File "/home/ubuntu/models/research/object_detection/model_main_tf2.py", line 113, in <module>
    tf.compat.v1.app.run()
  File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/ubuntu/models/research/object_detection/model_main_tf2.py", line 110, in main
    record_summaries=FLAGS.record_summaries)
  File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/object_detection/model_lib_v2.py", line 561, in train_loop
    unpad_groundtruth_tensors)
  File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/object_detection/model_lib_v2.py", line 375, in load_fine_tune_checkpoint
    ckpt.restore(checkpoint_path).assert_existing_objects_matched()
  File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow/python/training/tracking/util.py", line 783, in assert_existing_objects_matched
    (list(unused_python_objects),))
AssertionError: Some Python objects were not bound to checkpointed values, likely due to changes in the Python program:

Possibly irrelevant, but model_lib_v2.py still uses CheckpointV1 instead of the TF2 version defined in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/training/tracking/util.py.

6. System information

DrewClacksila commented 4 years ago

Possibly the same problem mentioned here: https://github.com/tensorflow/tensorflow/issues/33150#issuecomment-665355501

DrewClacksila commented 4 years ago

@ravikyram What input are you waiting for?

PetreanuAndi commented 4 years ago

Same error. Bump.

Others have mentioned that changing fine_tune_checkpoint_type to "detection" instead of "classification" can solve the problem for models that support the "detection" checkpoint type. Search for get_sub_model() in the feature extractor.

However, CenterNet does not support "detection" for the pretrained models offered in the zoo. Funny that CenterNet DOES support "detection" with the MobileNet backbone, but that backbone has no pre-trained weights in the zoo.
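
For reference, a sketch of what that change looks like in the train_config block of pipeline.config. The checkpoint path is the one from the original report; every other field is omitted, and whether "detection" is accepted depends on the model, so treat the values as illustrative rather than a confirmed fix:

```
train_config {
  # Path taken from the report above; adjust to your own download location.
  fine_tune_checkpoint: "/code/ssd_mobilenet_v2_320x320_coco17_tpu-8/checkpoint/ckpt-0"
  # Assumption: the chosen model supports the "detection" checkpoint type.
  fine_tune_checkpoint_type: "detection"  # instead of "classification"
  fine_tune_checkpoint_version: V2
}
```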

This also mutes the warnings (line 375 of model_lib_v2.py): ckpt.restore(checkpoint_path).expect_partial()
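
To show what the two calls do differently, here is a minimal standalone sketch, not the OD API code itself. It assumes TF2 in eager mode; the names `backbone` and `head` are hypothetical stand-ins for "weights that are in the checkpoint" and "weights that are not". As far as I can tell, the strict check complains about any existing Python variable that has no counterpart in the checkpoint, while expect_partial() only silences the complaint and leaves unmatched variables at their initial values.

```python
import os
import tempfile

import tensorflow as tf

# Save a checkpoint that tracks a single variable (hypothetical "backbone").
saved = tf.train.Checkpoint(backbone=tf.Variable([1.0, 2.0]))
path = saved.save(os.path.join(tempfile.mkdtemp(), "ckpt"))

# Restore into an object graph with an extra variable ("head") that the
# checkpoint knows nothing about.
backbone = tf.Variable([0.0, 0.0])
head = tf.Variable([0.0])
target = tf.train.Checkpoint(backbone=backbone, head=head)
status = target.restore(path)

# Strict check: expected to raise because "head" has no checkpointed value.
try:
    status.assert_existing_objects_matched()
    strict_ok = True
except AssertionError as err:
    strict_ok = False
    print("strict check failed:", str(err).splitlines()[0])

# Lenient mode: no assertion, only suppresses warnings about partial loads.
status.expect_partial()

print("strict check passed:", strict_ok)
print("backbone after restore:", backbone.numpy())
print("head after restore:", head.numpy())
```

So expect_partial() is fine when the mismatch is intentional (for example a new detection head), but it will not tell you if the backbone weights themselves failed to load. Comparing a few variable values before and after restore, as above, is one way to check.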

Google / OD API people, what is going on? V1 and V2 are also amazing: some models are here, but the config needs checkpoint_Version V2 no matter what.

chad-green commented 4 years ago

Same issue. Can't restore TF2 models from the model zoo. Any updates?

chad-green commented 3 years ago

For future reference, it looks like this was resolved: https://github.com/tensorflow/tensorflow/issues/34544