tensorflow / models

Models and examples built with TensorFlow

Save checkpoint while training: checkpoint_every_n #9650

Open DevLob-zz opened 3 years ago

DevLob-zz commented 3 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/research/object_detection/model_lib_v2.py

2. Describe the bug

There is a parameter, checkpoint_every_n, that controls how often checkpoints are saved during training. Suppose checkpoint_every_n = 1000 and I set the number of training steps to 800 or 1200. With 800 steps, no checkpoint is saved at all, so the exporter exports an empty model that cannot predict anything. With 1200 steps, the last checkpoint is written at step 1000 and the remaining 200 steps are not saved; if I later resume training with a higher step count, it starts again from step 1000, not 1200.
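The behavior reported above can be reproduced without TensorFlow. The following is a minimal pure-Python sketch, assuming a checkpoint is written only once checkpoint_every_n steps have passed since the last save, as the report describes. The function name `saved_steps_interval_only` and the `start_step` parameter are hypothetical and are not part of model_lib_v2.py.

```python
def saved_steps_interval_only(train_steps, checkpoint_every_n, start_step=0):
    """Return the steps at which a checkpoint would be written.

    Assumption: a save happens only when at least `checkpoint_every_n`
    steps have passed since the last saved step. This models the report,
    not the actual training loop.
    """
    saved = []
    checkpointed_step = start_step
    for global_step in range(start_step + 1, train_steps + 1):
        if global_step - checkpointed_step >= checkpoint_every_n:
            saved.append(global_step)
            checkpointed_step = global_step
    return saved


print(saved_steps_interval_only(800, 1000))   # [] : nothing is ever saved
print(saved_steps_interval_only(1200, 1000))  # [1000] : steps 1001-1200 are lost
```

Under this assumption, any run whose step count is not a multiple of checkpoint_every_n ends with unsaved progress.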

3. Steps to reproduce

As described above.

4. Expected behavior

A clear and concise description of what you expected to happen.

5. Additional context

Suggested change to the checkpoint condition:

    if ((int(global_step.value()) - checkpointed_step) >= checkpoint_every_n
            or global_step.value() == train_steps):
        # Save on the regular interval or when the final step is reached.
        if global_step.value() == train_steps:
            warnings.warn("Last Step")
        else:
            warnings.warn("Regular Check Point")
        manager.save()
        checkpointed_step = int(global_step.value())

I am just recommending that a checkpoint be saved when either the final step or the checkpoint_every_n interval is reached.
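To check what the proposed condition would do, here is a small pure-Python sketch of the same rule: save when the interval has elapsed or when the final step is reached. It is only an illustration under that assumption; `saved_steps_with_final` and `start_step` are hypothetical names, and no TensorFlow objects (`global_step`, `manager`) are involved.

```python
def saved_steps_with_final(train_steps, checkpoint_every_n, start_step=0):
    """Return the steps at which a checkpoint would be written when the
    save condition also fires on the last training step."""
    saved = []
    checkpointed_step = start_step
    for global_step in range(start_step + 1, train_steps + 1):
        interval_reached = global_step - checkpointed_step >= checkpoint_every_n
        is_last_step = global_step == train_steps
        if interval_reached or is_last_step:
            saved.append(global_step)
            checkpointed_step = global_step
    return saved


print(saved_steps_with_final(800, 1000))    # [800]
print(saved_steps_with_final(1200, 1000))   # [1000, 1200]
# Resuming from step 1000 and training to 1200 saves the final step as well.
print(saved_steps_with_final(1200, 1000, start_step=1000))  # [1200]
```

With this rule the 800-step run ends with a usable checkpoint, and the 1200-step run can later be resumed from step 1200.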

6. System information

ekesdf commented 3 years ago

Are you using model_main.py?

If so, there is a default value in the flags right at the top of the file. If you change this value, it will work as you expect.

DevLob-zz commented 3 years ago

I used model_main_v2.py. Should I use model_main.py instead of model_main_v2.py?

ekesdf commented 3 years ago

It does not matter which version you use. With model_main.py you can only use TF1 model zoo models, and with model_main_v2.py you can only use TF2 model zoo models.
So you have to choose between the two. It is just a compatibility matter: a fits with b, but a does not fit with h.

DevLob-zz commented 3 years ago

I used model_main_v2 with a TF2 model zoo model, and for now there is no way to run evaluation at the same time as training.