tensorflow / models

Models and examples built with TensorFlow

INFO:tensorflow:Waiting for new checkpoint at models/faster_rcnn_inception_resnet_v2 #9867

Open pngimbwa opened 3 years ago

pngimbwa commented 3 years ago

I used the transfer learning approach to develop a detection model using the faster_rcnn algorithm.

To evaluate my model, I used the following command:

!python model_main_tf2.py --model_dir=models/faster_rcnn_inception_resnet_v2 --pipeline_config_path=models/faster_rcnn_inception_resnet_v2/pipeline.config --checkpoint_dir=models/faster_rcnn_inception_resnet_v2

However, I keep getting the following info message:

INFO:tensorflow:Waiting for new checkpoint at models/faster_rcnn_inception_resnet_v2
I0331 23:23:11.699681 140426971481984 checkpoint_utils.py:139] Waiting for new checkpoint at models/faster_rcnn_inception_resnet_v2

I checked that the path passed to checkpoint_dir is correct. What could be the problem, and how can I resolve it?

Thanks in advance.
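
One way to narrow this down is to look at what the directory passed to --checkpoint_dir actually contains. The snippet below is a minimal sketch using only the standard library, not part of the Object Detection API. It assumes the usual TF2 checkpoint layout, where a small text file named "checkpoint" sits next to files such as "ckpt-1.index". The function name describe_checkpoint_dir is hypothetical.

```python
from pathlib import Path


def describe_checkpoint_dir(checkpoint_dir):
    """Summarize what a checkpoint directory contains (stdlib only).

    Assumption: TF2 writes a "checkpoint" state file plus one
    "<prefix>.index" file per saved checkpoint, e.g. "ckpt-1.index".
    """
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return {"exists": False, "has_state_file": False, "prefixes": []}

    # The state file records which checkpoint prefix is the latest one.
    has_state_file = (root / "checkpoint").is_file()

    # Collect checkpoint prefixes by stripping the ".index" suffix.
    prefixes = sorted(p.name[: -len(".index")] for p in root.glob("*.index"))

    return {"exists": True, "has_state_file": has_state_file, "prefixes": prefixes}


if __name__ == "__main__":
    # Path taken from the command in the question; run from the same folder.
    print(describe_checkpoint_dir("models/faster_rcnn_inception_resnet_v2"))
```

If the result shows no state file and no prefixes, the evaluation job probably has nothing to load yet, which would be consistent with it sitting at "Waiting for new checkpoint".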

PelinSuK commented 3 years ago

Hello, I get errors while running model_main_tf2.py:

python model_main_tf2.py --alsologtostderr --model_dir=training/ --pipeline_config_path=training/faster_rcnn_inception_v2_pets.config

I don't have any custom steps; I did everything from here. However, my TensorFlow version is 2.5, and I did all the setup for 2.5. The errors are:

  File "C:\tensorflow1\models\research\object_detection\model_main_tf2.py", line 116, in <module>
    tf.compat.v1.app.run()
  File "C:\Users\pelin\anaconda3\envs\tensorflow1\lib\site-packages\tensorflow\python\platform\app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "C:\Users\pelin\anaconda3\envs\tensorflow1\lib\site-packages\absl\app.py", line 303, in run
    _run_main(main, args)
  File "C:\Users\pelin\anaconda3\envs\tensorflow1\lib\site-packages\absl\app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "C:\tensorflow1\models\research\object_detection\model_main_tf2.py", line 106, in main
    model_lib_v2.train_loop(
  File "C:\tensorflow1\models\research\object_detection\model_lib_v2.py", line 524, in train_loop
    raise ValueError('train_pb2.load_all_detection_checkpoint_vars '
ValueError: train_pb2.load_all_detection_checkpoint_vars unsupported in TF2

I searched for 4 days to find a solution. I'm new to this topic and couldn't solve the problem. I would really appreciate your help.
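
The ValueError above names load_all_detection_checkpoint_vars, so a reasonable first check is whether that field is switched on in the pipeline config. The sketch below scans the config text for it. It assumes the field appears on its own line as "load_all_detection_checkpoint_vars: true" in the text-format config, and the function name find_unsupported_flag is hypothetical.

```python
import re

# Matches a line that sets the field to true, allowing leading whitespace.
_FLAG = re.compile(r"^\s*load_all_detection_checkpoint_vars\s*:\s*true\b")


def find_unsupported_flag(config_text):
    """Return the 1-based line numbers where the field is set to true."""
    return [
        number
        for number, line in enumerate(config_text.splitlines(), start=1)
        if _FLAG.match(line)
    ]


if __name__ == "__main__":
    # Illustrative config fragment, not taken from the poster's file.
    sample = """train_config {
  batch_size: 1
  load_all_detection_checkpoint_vars: true
}"""
    print(find_unsupported_flag(sample))  # → [3]
```

If the scan reports a line, removing that line or setting it to false may be enough to get past this particular error. That is an assumption based on the error message alone, not something confirmed in this thread.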

Arielcchy commented 2 years ago

I'm running into the same issue. I have tried re-running and shutting down the whole computer, but I still get the same issue:

First run error:

2022-04-11 18:33:11.683792: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dyna
2022-04-11 18:33:11.684123: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if
2022-04-11 18:33:25.536964: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dyna
2022-04-11 18:33:25.537671: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOW
2022-04-11 18:33:25.546644: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnost
2022-04-11 18:33:25.547306: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: DESKTOP-5E346D
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0411 18:33:25.565760 6608 model_lib_v2.py:1089] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: None
I0411 18:33:25.566393 6608 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0411 18:33:25.567394 6608 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0411 18:33:25.569395 6608 config_util.py:552] Maybe overwriting eval_num_epochs: 1
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered eval_on_train_input_conf
W0411 18:33:25.570396 6608 model_lib_v2.py:1107] Expected number of evaluation epochs is 1, but instead encoun
2022-04-11 18:33:25.609607: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optial operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO:tensorflow:Reading unweighted datasets: ['annotations/test.record']
I0411 18:33:25.836392 6608 dataset_builder.py:163] Reading unweighted datasets: ['annotations/test.record']
INFO:tensorflow:Reading record datasets for input file: ['annotations/test.record']
I0411 18:33:25.838394 6608 dataset_builder.py:80] Reading record datasets for input file: ['annotations/test.r
INFO:tensorflow:Number of filenames to read: 1
I0411 18:33:25.839392 6608 dataset_builder.py:81] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0411 18:33:25.840394 6608 dataset_builder.py:87] num_readers has been reduced to 1 to match input file shards
WARNING:tensorflow:From C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\object_detection\builders\datad and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` ins
W0411 18:33:25.886393 6608 deprecation.py:337] From C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\o.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` ins
WARNING:tensorflow:From C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\object_detection\builders\data and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()`
W0411 18:33:25.994394 6608 deprecation.py:337] From C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()`
WARNING:tensorflow:From C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\tensorflow\python\util\dispatcersion.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0411 18:33:34.039652 6608 deprecation.py:337] From C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\t n\util\dispatch.py:1082: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0411 18:33:36.022048 6608 deprecation.py:337] From C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\tensorflow\python\util\dispatch.py:1082: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
INFO:tensorflow:Waiting for new checkpoint at models/my_ssd_resnet50_v1_fpn
I0411 18:33:40.065962 6608 checkpoint_utils.py:136] Waiting for new checkpoint at models/my_ssd_resnet50_v1_fpn
Traceback (most recent call last):
  File "C:\Users\Ariel\...\Object_Detection\TensorFlow\workspace\training_demo\model_main_tf2.py", line 115, in <module>
    tf.compat.v1.app.run()
  File "C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\tensorflow\python\platform\app.py", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\absl\app.py", line 312, in run
    _run_main(main, args)
    sys.exit(main(argv))
  File "C:\Users\Ariel\...\Object_Detection\TensorFlow\workspace\training_demo\model_main_tf2.py", line 82, in main
    model_lib_v2.eval_continuously(
  File "C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\object_detection\model_lib_v2.py", line 1136, in eval_continuously
    for latest_checkpoint in tf.train.checkpoints_iterator(
  File "C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\tensorflow\python\training\checkpoint_utils.py", line 194, in checkpoints_iterator
    new_checkpoint_path = wait_for_new_checkpoint(
  File "C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\tensorflow\python\training\checkpoint_utils.py", line 143, in wait_for_new_checkpoint
    time.sleep(seconds_to_sleep)
KeyboardInterrupt

Second run error:

PS C:\Users\Ariel\...\Object_Detection\TensorFlow\workspace\training_demo> python model_main_tf2.py --model_dir=models/my_ssd_resnet50_v1_fpn --pipeline_config_path=models/my_ssd_resnet50_v1_fpn/pipeline.config --checkpoint_dir=models/my_ssd_resnet50_v1_fpn
2022-04-11 19:21:02.827103: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-04-11 19:21:02.827874: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-04-11 19:21:20.113896: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2022-04-11 19:21:20.114520: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-04-11 19:21:20.126895: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: $desktop number
2022-04-11 19:21:20.127801: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: DESKTOP-5E346DM
WARNING:tensorflow:Forced number of epochs for all eval validations to be 1.
W0411 19:21:20.142226 17628 model_lib_v2.py:1089] Forced number of epochs for all eval validations to be 1.
INFO:tensorflow:Maybe overwriting sample_1_of_n_eval_examples: None
I0411 19:21:20.146238 17628 config_util.py:552] Maybe overwriting sample_1_of_n_eval_examples: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0411 19:21:20.148228 17628 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Maybe overwriting eval_num_epochs: 1
I0411 19:21:20.150228 17628 config_util.py:552] Maybe overwriting eval_num_epochs: 1
WARNING:tensorflow:Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
W0411 19:21:20.151353 17628 model_lib_v2.py:1107] Expected number of evaluation epochs is 1, but instead encountered `eval_on_train_input_config.num_epochs` = 0. Overwriting `num_epochs` to 1.
2022-04-11 19:21:20.166845: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO:tensorflow:Reading unweighted datasets: ['annotations/test.record']
I0411 19:21:20.387223 17628 dataset_builder.py:163] Reading unweighted datasets: ['annotations/test.record']
INFO:tensorflow:Reading record datasets for input file: ['annotations/test.record']
I0411 19:21:20.388226 17628 dataset_builder.py:80] Reading record datasets for input file: ['annotations/test.record']
INFO:tensorflow:Number of filenames to read: 1
I0411 19:21:20.390228 17628 dataset_builder.py:81] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0411 19:21:20.392227 17628 dataset_builder.py:87] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\object_detection\builders\dataset_builder.py:101: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.deterministic`.
W0411 19:21:20.398269 17628 deprecation.py:337] From C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\object_detection\builders\dataset_builder.py:101: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.deterministic`.
WARNING:tensorflow:From C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\object_detection\builders\dataset_builder.py:236: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()`
W0411 19:21:20.463334 17628 deprecation.py:337] From C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\object_detection\builders\dataset_builder.py:236: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()`
WARNING:tensorflow:From C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\tensorflow\python\util\dispatch.py:1082: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0411 19:21:29.529179 17628 deprecation.py:337] From C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\tensorflow\python\util\dispatch.py:1082: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
Instructions for updating:
Use `tf.cast` instead.
W0411 19:21:32.273636 17628 deprecation.py:337] From C:\Users\Ariel\anaconda3\envs\TFOD_API\lib\site-packages\tensorflow\python\util\dispatch.py:1082: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
INFO:tensorflow:Waiting for new checkpoint at models/my_ssd_resnet50_v1_fpn
I0411 19:21:38.885718 17628 checkpoint_utils.py:136] Waiting for new checkpoint at models/my_ssd_resnet50_v1_fpn
INFO:tensorflow:Timed-out waiting for a checkpoint.
I0411 20:45:33.894065 17628 checkpoint_utils.py:199] Timed-out waiting for a checkpoint.

Arielcchy commented 2 years ago

I'm not sure whether this is the same issue as mine, but I solved mine by following https://stackoverflow.com/questions/64510791/tf2-object-detection-api-model-main-tf2-py-validation-loss: I ran training and evaluation (the latter with --checkpoint_dir=XXX) in parallel, in separate terminals. Thanks.
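
The two-terminal setup described above can be sketched as a pair of command lines that differ only in the --checkpoint_dir flag. Going by the tracebacks in this thread, the run without that flag goes into train_loop and the run with it goes into eval_continuously. The helper names below are hypothetical. The flags are the ones used in the commands quoted earlier, and the script is assumed to be started from the folder that contains model_main_tf2.py.

```python
import subprocess
import sys


def build_commands(model_dir, pipeline_config_path):
    """Return (train_cmd, eval_cmd) for model_main_tf2.py."""
    base = [
        sys.executable,
        "model_main_tf2.py",
        f"--model_dir={model_dir}",
        f"--pipeline_config_path={pipeline_config_path}",
    ]
    # Training: no --checkpoint_dir, so the script writes checkpoints to model_dir.
    train_cmd = list(base)
    # Evaluation: --checkpoint_dir points at the folder the trainer writes to.
    eval_cmd = base + [f"--checkpoint_dir={model_dir}"]
    return train_cmd, eval_cmd


def launch_in_parallel(train_cmd, eval_cmd):
    """Start both jobs without waiting; the caller decides when to stop them."""
    return subprocess.Popen(train_cmd), subprocess.Popen(eval_cmd)


if __name__ == "__main__":
    train_cmd, eval_cmd = build_commands(
        "models/my_ssd_resnet50_v1_fpn",
        "models/my_ssd_resnet50_v1_fpn/pipeline.config",
    )
    # Only print the commands here; call launch_in_parallel to actually run them.
    print(" ".join(train_cmd))
    print(" ".join(eval_cmd))
```

Running the two printed commands in separate terminals is equivalent. On a machine with a single GPU, the two processes may compete for memory, so the evaluation job is sometimes run on the CPU instead.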