tensorflow / models

Models and examples built with TensorFlow

RuntimeError: There was no new checkpoint after the training. Eval status: missing checkpoint #8414

Open yongzhe2160 opened 4 years ago

yongzhe2160 commented 4 years ago

System information

Please provide the entire URL of the model you are using?

https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_pets.md

Describe the current behavior

Failed after ~20 min. Retried a few times, same error.

Describe the expected behavior

Training succeeds.

Code to reproduce the issue

https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_pets.md

Other info / logs

{
  insertId: "gng8v7fh35gc1"
  jsonPayload: {
    created: 1587446734.441076
    levelname: "ERROR"
    lineno: 328
    message: "RuntimeError: There was no new checkpoint after the training. Eval status: missing checkpoint"
    pathname: "/runcloudml.py"
  }
  labels: {
    compute.googleapis.com/resource_id: "3602514279639793991"
    compute.googleapis.com/resource_name: "gke-cml-0421-050704--n1-standard-8-30-48740b2f-bhw1"
    compute.googleapis.com/zone: "us-central1-c"
    ml.googleapis.com/job_id/log_area: "root"
    ml.googleapis.com/trial_id: ""
  }
  logName: "projects/yongzhe-test/logs/master-replica-0"
  receiveTimestamp: "2020-04-21T05:25:37.672555504Z"
  resource: {
    labels: {
      job_id: "yongzhe_object_detection_pets_04_20_2020_22_07_01"
      project_id: "yongzhe-test"
      task_name: "master-replica-0"
    }
    type: "ml_job"
  }
  severity: "ERROR"
  timestamp: "2020-04-21T05:25:34.441076039Z"
}
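
The message in the log above appears to come from the Estimator train-and-evaluate loop, which reports "missing checkpoint" when the evaluator finds no checkpoint in the model directory after training. A first diagnostic step is to check whether any checkpoint files were written at all. The sketch below is not part of the tutorial: the helper name `latest_checkpoint_step` is hypothetical, it assumes TF1-style checkpoint files named `model.ckpt-<step>.index`, and it only inspects a local directory, so a Cloud Storage model directory would have to be copied locally first.

```python
import os
import re

# Assumption: TF1-style Estimator checkpoints, where each saved step
# produces an index file named "model.ckpt-<step>.index".
_CKPT_INDEX = re.compile(r"^model\.ckpt-(\d+)\.index$")


def latest_checkpoint_step(model_dir):
    """Return the highest checkpoint step found in model_dir, or None.

    Hypothetical helper for illustration; it only looks at file names
    and does not validate the checkpoint contents.
    """
    if not os.path.isdir(model_dir):
        return None
    steps = []
    for name in os.listdir(model_dir):
        match = _CKPT_INDEX.match(name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None
```

If this returns `None` for the job's model directory, training never saved a checkpoint, which points at the training step (or the configured output path) rather than at evaluation.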

MihailMihaylov97 commented 4 years ago

Any solutions to this issue?