thekat70 opened 3 years ago
I have the same issue. Did you find a way to load them?
What's more, in my case, rebuilding the graph with is_training set to False then asks me to provide a groundtruth tensor, so I am unable to run prediction (in another script).
Thanks!
Prerequisites
Please answer the following questions for yourself before submitting an issue.
1. The entire URL of the file you are using
https://github.com/tensorflow/models/blob/master/research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config (i.e. models/research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config)
Model from http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz
2. Describe the bug
A clear and concise description of what the bug is.
Poor inference performance when restoring model variables from a checkpoint for inference.
After successfully fine-tuning the duck detection model, the in-memory model has excellent accuracy and correctly predicts the rubber duck locations, as expected.
However, rebuilding the graph with is_training=False and then loading the latest checkpoint from training causes the model to have very poor inference accuracy: it predicts multiple boxes for each image with high confidence.
Building the graph with is_training=True and loading the latest checkpoint file from training gives good inference performance.
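For reference, the failing inference path looks roughly like the sketch below, following the usual TF2 Object Detection API fine-tuning pattern. The config path, checkpoint directory, and the num_classes override are illustrative placeholders, not the exact values from the attached notebook:

```python
import tensorflow as tf
from object_detection.utils import config_util
from object_detection.builders import model_builder

# Rebuild the detection model for inference; flipping this flag to True is the
# only change that makes the restored model behave well.
configs = config_util.get_configs_from_pipeline_file(
    'ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config')  # assumed local path
model_config = configs['model']
model_config.ssd.num_classes = 1  # single rubber-duck class, as in the tutorial
detection_model = model_builder.build(model_config=model_config,
                                      is_training=False)

# Restore the latest checkpoint written during fine-tuning (restoration is
# deferred until the variables are created by the first forward pass).
ckpt = tf.train.Checkpoint(model=detection_model)
ckpt.restore(tf.train.latest_checkpoint('/tmp/fine_tuned')).expect_partial()

# Run a dummy image through the model to create the variables and trigger the
# restore, then inspect the postprocessed detections.
image = tf.zeros([1, 640, 640, 3], dtype=tf.float32)
preprocessed, shapes = detection_model.preprocess(image)
prediction_dict = detection_model.predict(preprocessed, shapes)
detections = detection_model.postprocess(prediction_dict, shapes)
print(detections['detection_scores'][0][:5])
```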
3. Steps to reproduce
Steps to reproduce the behavior.
I have created a minimal example Google Colab notebook (slightly adapted from the fine-tuning example) that reproduces the incorrect behaviour.
https://colab.research.google.com/drive/1yzRXF7-ymHDhJrXHNutaY8QW3C1imXTZ?usp=sharing
I have also attached a Jupyter notebook and added the code extracted from it at the bottom of this report: bugreport.zip
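For completeness, the checkpoints referenced throughout this report are written at the end of the fine-tuning loop using the standard tf.train.Checkpoint pattern, roughly as sketched here. Directory names are illustrative; the attached notebook is the authoritative version:

```python
import tensorflow as tf
from object_detection.utils import config_util
from object_detection.builders import model_builder

# Build the model in training mode for fine-tuning.
configs = config_util.get_configs_from_pipeline_file(
    'ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config')  # assumed local path
detection_model = model_builder.build(model_config=configs['model'],
                                      is_training=True)

# ... restore the pretrained COCO weights and run the fine-tuning steps here,
# updating detection_model's variables ...

# Save the fine-tuned variables so the inference script can reload them.
ckpt = tf.train.Checkpoint(model=detection_model)
manager = tf.train.CheckpointManager(ckpt, directory='/tmp/fine_tuned',
                                     max_to_keep=3)
save_path = manager.save()  # writes e.g. /tmp/fine_tuned/ckpt-1
print('Saved checkpoint to', save_path)
```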
4. Expected behavior
A clear and concise description of what you expected to happen.
Nearly identical inference performance when reloading from saved checkpoints for inference.
5. Additional context
Include any logs that would be helpful to diagnose the problem.
This behaviour doesn't always occur: very infrequently, the model loaded from a checkpoint has identical inference performance. This is rare, however, and there is no obvious cause.
The same behaviour occurs when using the SavedModel format: good inference with the in-memory model after training, and very poor inference after saving and loading a model using SavedModel format.
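The SavedModel test followed roughly the sketch below: export with the TF2 exporter, then reload and run inference. The command-line flags are the standard exporter_main_v2 flags; the paths are illustrative placeholders:

```python
# Export the fine-tuned checkpoint to a SavedModel (run from models/research):
#   python object_detection/exporter_main_v2.py \
#     --input_type image_tensor \
#     --pipeline_config_path ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config \
#     --trained_checkpoint_dir /tmp/fine_tuned \
#     --output_directory /tmp/exported

import tensorflow as tf

# Load the exported model and run the serving function on a dummy image.
detect_fn = tf.saved_model.load('/tmp/exported/saved_model')
image = tf.zeros([1, 640, 640, 3], dtype=tf.uint8)  # image_tensor input is uint8
detections = detect_fn(image)
print(detections['detection_scores'][0][:5])
```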
I have also used this TensorFlow installation to train a model that does not use the TF Object Detection API, and there saving and restoring checkpoints works as expected.
This behaviour seems to happen with TF 2.3.1 as well.
We have compared all the variables in the loaded model, and they all seem to have loaded correctly: no checkpoint values are unused and the values look correct.
It seems potentially related to the BatchNormalization layers?
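The variable comparison mentioned above was done along the lines of the helper sketched below, where trained_model is the in-memory fine-tuned model and restored_model is the model rebuilt with is_training=False and restored from the latest checkpoint (both built as in the sketches earlier in this issue); the moving-statistics check at the end is what points at batch normalisation:

```python
import numpy as np

def compare_detection_models(trained_model, restored_model, atol=1e-6):
    """Report variables that differ between the in-memory fine-tuned model
    and the model rebuilt from the latest checkpoint."""
    trained = {v.name: v.numpy() for v in trained_model.variables}
    restored = {v.name: v.numpy() for v in restored_model.variables}

    for name in sorted(set(trained) | set(restored)):
        if name not in restored:
            print('missing after restore:', name)
        elif name not in trained:
            print('only present after restore:', name)
        elif not np.allclose(trained[name], restored[name], atol=atol):
            print('value mismatch:', name)

    # Batch-norm moving statistics are the variables whose use changes with
    # is_training, so list them explicitly for manual inspection.
    for v in restored_model.variables:
        if 'moving_mean' in v.name or 'moving_variance' in v.name:
            print(v.name, 'mean of values:', float(np.mean(v.numpy())))
```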
6. System information
Two platforms: Windows and Google Colab
Windows 10 Enterprise:
Version 20H2 (OS Build 19042.928) - 64 GB RAM
NVIDIA RTX 2080 Ti - 11 GB VRAM - latest NVIDIA drivers
Python version 3.7.8 - 64 bit
TensorFlow version: git_version=v2.4.0-49-g85c8b2a817f, version=2.4.1, installed from PyPI (following the instructions here: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2.md)
CUDA version: 11.0
cuDNN version 8.0.4
Google Colab:
7. Code