tensorflow / models

Models and examples built with TensorFlow

Unable to create/update the checkpoints during training #9168

Open BandaruMeghana opened 4 years ago

BandaruMeghana commented 4 years ago

Hi team,

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/object_detection/configs/tf2/faster_rcnn_resnet101_v1_640x640_coco17_tpu-8.config

2. Describe the bug

I'm using the Faster R-CNN 640x640 model for object detection on a custom dataset. During training, I see that the events file, events.out.tfevents.1598792593.Meghana.23775.1504.v2, inside the train directory is being updated, but the corresponding checkpoints are not created.

Refer to the screenshots for the timestamps

[Screenshot from 2020-08-30 20-14-40]

The checkpoints in the above screenshots are the restored checkpoints, not the ones created during the training process. Also, I see no log messages stating that checkpoints are being saved, as in TF1 (see image).
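For anyone trying to reproduce this without screenshots, here is a minimal sketch of how the timestamps could be compared. It uses only the Python standard library. The function names and the file patterns (checkpoints written as ckpt-N.index in the model directory, events file under its train subdirectory) are assumptions for illustration, not something confirmed by the training script.

```python
from pathlib import Path


def newest_mtime(paths):
    """Return the most recent modification time among paths, or None if empty."""
    times = [p.stat().st_mtime for p in paths]
    return max(times) if times else None


def compare_checkpoints_to_events(model_dir):
    """Compare the newest checkpoint index file with the newest events file.

    Assumption: checkpoints appear as ckpt-N.index directly in model_dir,
    and the events file lives in model_dir/train. Adjust both patterns if
    your layout differs.
    """
    model_dir = Path(model_dir)
    ckpt_time = newest_mtime(list(model_dir.glob("ckpt-*.index")))
    events_time = newest_mtime(list(model_dir.glob("train/events.out.tfevents.*")))
    return {
        "newest_checkpoint_mtime": ckpt_time,
        "newest_events_mtime": events_time,
        # True when the events file was written after the last checkpoint.
        "checkpoints_stale": (
            ckpt_time is not None
            and events_time is not None
            and events_time > ckpt_time
        ),
    }
```

Calling compare_checkpoints_to_events("path/to/model_dir") (a placeholder path) during a run returns the two modification times. If checkpoints_stale is still True after training has run for a while, that matches the behaviour described above: the events file keeps updating while no new checkpoint appears.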

Versions:
OS: Ubuntu
TF: 2.3.0
Python: 3.8.3

Thank you, Meghana

AtosheIslamSumaya commented 1 year ago

I am facing the same problem.