tensorflow / models

Models and examples built with TensorFlow
Other
77.18k stars 45.76k forks source link

training exit after the first evaluation(only one evaluation) in Tensorflow model(object_detection) without error and waring #7448

Closed CasvalRay closed 3 years ago

CasvalRay commented 5 years ago

Please go to Stack Overflow for help and support:

http://stackoverflow.com/questions/tagged/tensorflow

Also, please understand that many of the models included in this repository are experimental and research-style code. If you open a GitHub issue, here is our policy:

  1. It must be a bug, a feature request, or a significant problem with documentation (for small docs fixes please send a PR instead).
  2. The form below must be filled out.

Here's why we have that policy: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.


System information

You can collect some of this information using our environment capture script:

https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh

You can obtain the TensorFlow version with

python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

Describe the problem

I was training the model, but ten minutes later, after an evaluation, the program exit, with no errors or warning.

Source code / logs

my command for training : python object_detection/model_main.py --pipeline_config_path=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/pipeline.config --model_dir=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/saved_model --num_train_steps=50000 --alsologtostderr This is my config : `train_config { batch_size: 24 data_augmentation_options { random_horizontal_flip { } } data_augmentation_options { ssd_random_crop { } } optimizer { rms_prop_optimizer { learning_rate { exponential_decay_learning_rate { initial_learning_rate: 0.00400000018999 decay_steps: 800720 decay_factor: 0.949999988079 } } momentum_optimizer_value: 0.899999976158 decay: 0.899999976158 epsilon: 1.0 } } fine_tune_checkpoint: "D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/model.ckpt" from_detection_checkpoint: true num_steps: 200000

train_input_reader { label_map_path: "D:/gitcode/models/research/object_detection/idol/tf_label_map.pbtxt" tf_record_input_reader { input_path: "D:/gitcode/models/research/objectdetection/idol/train/Iframe??????.tfrecord" } } eval_config { num_examples: 8000 max_evals: 10 use_moving_averages: false } eval_input_reader { label_map_path: "D:/gitcode/models/research/object_detection/idol/tf_label_map.pbtxt" shuffle: false num_readers: 1 tf_record_input_reader { input_path: "D:/gitcode/models/research/objectdetection/idol/eval/Iframe??????.tfrecord" } ` This is my out log: ~ (default) D:\gitcode\models\research>python object_detection/model_main.py --pipeline_config_path=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/pipeline.config --model_dir=D:/gitcode/models/research/object_detection/ssd_mobilenet_v1_coco_2018_01_28/saved_model --num_train_steps=50000 --alsologtostderr

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see:

WARNING:tensorflow:Forced number of epochs for all eval validations to be 1. WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_config.num_epochs = 0. Overwriting num_epochs to 1. WARNING:tensorflow:Estimator's model_fn (<function create_model_fn..model_fn at 0x0000027CBAB7BB70>) includes params argument, but params are not passed to Estimator. WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version. Instructions for updating: Colocations handled automatically by placer. WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\builders\dataset_builder.py:86: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.data.experimental.parallel_interleave(...). WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\core\preprocessor.py:196: sample_distorted_bounding_box (from tensorflow.python.ops.image_ops_impl) is deprecated and will be removed in a future version. Instructions for updating: seed2 arg is deprecated.Use sample_distorted_bounding_box_v2 instead. WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\builders\dataset_builder.py:158: batch_and_drop_remainder (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version. Instructions for updating: Use tf.data.Dataset.batch(..., drop_remainder=True). WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\tensorflow\python\ops\losses\losses_impl.py:448: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\tensorflow\python\ops\array_grad.py:425: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. 2019-08-14 16:29:31.607841: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7845 pciBusID: 0000:04:00.0 totalMemory: 6.00GiB freeMemory: 4.97GiB 2019-08-14 16:29:31.621836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2019-08-14 16:29:32.275712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-08-14 16:29:32.283072: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 2019-08-14 16:29:32.288675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N 2019-08-14 16:29:32.293514: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4714 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:04:00.0, compute capability: 6.1) WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\eval_util.py:796: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version. Instructions for updating: Use tf.cast instead. WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\object_detection-0.1-py3.7.egg\object_detection\utils\visualization_utils.py:498: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version. Instructions for updating: tf.py_func is deprecated in TF V2. Instead, use tf.py_function, which takes a python function which manipulates tf eager tensors instead of numpy arrays. It's easy to convert a tf eager tensor to an ndarray (just call tensor.numpy()) but having access to eager tensors means tf.py_functions can use accelerators such as GPUs as well as being differentiable using a gradient tape.

2019-08-14 16:41:44.736212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0 2019-08-14 16:41:44.741242: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix: 2019-08-14 16:41:44.747522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 2019-08-14 16:41:44.751256: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N 2019-08-14 16:41:44.755548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4714 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:04:00.0, compute capability: 6.1) WARNING:tensorflow:From C:\Users\qian\Anaconda3\envs\default\lib\site-packages\tensorflow\python\training\saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version. Instructions for updating: Use standard file APIs to check for files with this prefix. creating index... index created! creating index... index created! Running per image evaluation... Evaluate annotation type bbox DONE (t=2.43s). Accumulating evaluation results... DONE (t=0.14s). Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.287 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.529 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.278 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.031 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.312 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.162 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.356 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.356 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.061 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.384

(default) D:\gitcode\models\research> ~

google-ml-butler[bot] commented 3 years ago

Are you satisfied with the resolution of your issue? Yes No