tensorflow / models

Models and examples built with TensorFlow

Training fails on faster_rcnn_resnet101 with error: Groundtruth tensor boxes has not been provided #9808

Open awaisbajwaml opened 3 years ago

awaisbajwaml commented 3 years ago

Prerequisites

1. The entire URL of the file you are using

https://github.com/tensorflow/models.git

2. Describe the bug

When I start training, it fails with the RuntimeError shown below.

3. Steps to reproduce

After preparing my dataset and completing the rest of the setup, I start training with the following command:

python model_main_tf2.py --model_dir=/home/ubuntu/tf_2/training/ --pipeline_config_path=/home/ubuntu/tf/faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8.config --alsologtostderr

4. Expected behavior

Training should start normally. Instead, it fails with:

RuntimeError: Groundtruth tensor boxes has not been provided

5. Additional context

6. System information

OS: Ubuntu 18.04
TensorFlow installed from: source
TensorFlow version: 2.4.1
Python version: 3.6
CUDA/cuDNN version: 11.2 / 8.1
GPU model and memory: K80, 12 GB

khengkok commented 3 years ago

I also encountered the same error with the latest pull. It seems this bug was introduced recently. I have an older TFOD repo that didn't have this issue, so you may want to try pulling an older version of the repo.

awaisbajwaml commented 3 years ago

@khengkok, which version worked for you?

khengkok commented 3 years ago

The repo I had working was at commit 9da3a08109237162752bea0fd951af849eb13cdc.

awaisbajwaml commented 3 years ago

Thanks, @khengkok.

@tombstone, @jch1, and @pkulzc, is there any solution to this?

adamzhg commented 3 years ago

I also encountered the same error with the latest pull today. It is caused by the following code in model_lib_v2.py (line 525):

dummy_image, dummy_shapes = detection_model.preprocess(
    tf.zeros([1, 512, 512, 3], dtype=tf.float32))
dummy_prediction_dict = detection_model.predict(dummy_image, dummy_shapes)

It looks like it runs predict on a dummy image without providing any groundtruth? @tombstone @jch1 @khengkok
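The error text appears to match the check in the Object Detection API's `DetectionModel.groundtruth_lists`, which raises when a requested groundtruth field was never passed in through `provide_groundtruth`. The sketch below is a pure-Python mock, not the real library: `MockTwoStageModel` and `describe_dummy_pass` are hypothetical names, and the assumption (consistent with the comment above) is that a two-stage model in training mode reads groundtruth boxes inside `predict` to sample second-stage proposals. Under that assumption, a dummy forward pass with no groundtruth fails with exactly this message, while the same pass in inference mode succeeds.

```python
# Pure-Python mock; every name here is hypothetical, not the TF OD API.


class MockTwoStageModel:
    """Stand-in for a two-stage detector such as Faster R-CNN."""

    def __init__(self, is_training):
        self.is_training = is_training
        self._groundtruth_lists = {}

    def provide_groundtruth(self, groundtruth_boxes_list, groundtruth_classes_list):
        # Groundtruth is stored per field name until predict() needs it.
        self._groundtruth_lists["boxes"] = groundtruth_boxes_list
        self._groundtruth_lists["classes"] = groundtruth_classes_list

    def groundtruth_lists(self, field):
        # Same message format as the error reported in this issue.
        if field not in self._groundtruth_lists:
            raise RuntimeError(
                "Groundtruth tensor {} has not been provided".format(field))
        return self._groundtruth_lists[field]

    def predict(self, images):
        # Fake first stage: one full-image proposal per input image.
        proposals = [[0.0, 0.0, 1.0, 1.0] for _ in images]
        if self.is_training:
            # Assumption: in training mode the second stage samples proposals
            # against groundtruth boxes, so predict() reads them here.
            gt_boxes = self.groundtruth_lists("boxes")
            return {"proposals": proposals, "num_groundtruth_images": len(gt_boxes)}
        return {"proposals": proposals}


def describe_dummy_pass(model):
    """Run a dummy forward pass, like the quoted preprocess/predict lines."""
    dummy_batch = [[0.0, 0.0, 0.0]]  # stand-in for tf.zeros([1, 512, 512, 3])
    try:
        model.predict(dummy_batch)
        return "ok"
    except RuntimeError as err:
        return "RuntimeError: {}".format(err)


print(describe_dummy_pass(MockTwoStageModel(is_training=False)))  # ok
print(describe_dummy_pass(MockTwoStageModel(is_training=True)))
# RuntimeError: Groundtruth tensor boxes has not been provided
```

The mock only reproduces the failure pattern; where exactly the real Faster R-CNN code reads the groundtruth during `predict` is not confirmed here.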

awaisbajwaml commented 3 years ago

Any luck, TensorFlow team? @tombstone, @jch1, and @pkulzc

adamzhg commented 3 years ago

I commented out the following code in model_lib_v2.py and continued training my model:

'''
dummy_image, dummy_shapes = detection_model.preprocess(
    tf.zeros([1, 512, 512, 3], dtype=tf.float32))
dummy_prediction_dict = detection_model.predict(dummy_image, dummy_shapes)
'''

So far it works, but I do not know whether it introduces any risks. @awaisbajwaml
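Commenting the lines out skips the dummy forward pass entirely. A hypothetical alternative is to supply placeholder groundtruth before the dummy `predict` call so the training-mode check passes. The real `DetectionModel` does expose a `provide_groundtruth(groundtruth_boxes_list, groundtruth_classes_list, ...)` method, but the sketch below is a pure-Python mock rather than the library: `MockTrainingModel` and `build_with_dummy_groundtruth` are made-up names, and it is an assumption that the real Faster R-CNN path accepts a single full-image dummy box and that the first real training step overwrites it.

```python
# Pure-Python mock of a dummy-groundtruth workaround; all names are hypothetical.


class MockTrainingModel:
    """Stand-in for a detector in training mode that needs groundtruth in predict()."""

    def __init__(self):
        self._groundtruth_lists = {}

    def provide_groundtruth(self, groundtruth_boxes_list, groundtruth_classes_list):
        # Each call replaces whatever groundtruth was stored before.
        self._groundtruth_lists = {
            "boxes": groundtruth_boxes_list,
            "classes": groundtruth_classes_list,
        }

    def groundtruth_lists(self, field):
        if field not in self._groundtruth_lists:
            raise RuntimeError(
                "Groundtruth tensor {} has not been provided".format(field))
        return self._groundtruth_lists[field]

    def predict(self, images):
        # Assumption: training-mode predict() reads the groundtruth boxes.
        gt_boxes = self.groundtruth_lists("boxes")
        return {"num_images": len(images), "num_groundtruth_images": len(gt_boxes)}


def build_with_dummy_groundtruth(model, num_classes):
    """Hypothetical replacement for the commented-out dummy pass."""
    dummy_batch = [[0.0, 0.0, 0.0]]  # stand-in for tf.zeros([1, 512, 512, 3])
    # One normalized [ymin, xmin, ymax, xmax] box covering the whole image,
    # labelled with a one-hot vector for the first class.
    dummy_boxes = [[[0.0, 0.0, 1.0, 1.0]]]
    dummy_classes = [[[1.0] + [0.0] * (num_classes - 1)]]
    model.provide_groundtruth(dummy_boxes, dummy_classes)
    return model.predict(dummy_batch)


model = MockTrainingModel()
print(build_with_dummy_groundtruth(model, num_classes=3))
# {'num_images': 1, 'num_groundtruth_images': 1}

# A real training step would then replace the dummy groundtruth.
model.provide_groundtruth([[[0.1, 0.2, 0.5, 0.6]]], [[[0.0, 1.0, 0.0]]])
print(model.groundtruth_lists("boxes"))  # [[[0.1, 0.2, 0.5, 0.6]]]
```

The design point the mock illustrates is ordering: groundtruth must be stored before any training-mode `predict`, and a later `provide_groundtruth` call replaces it rather than appending to it.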

awaisbajwaml commented 3 years ago

@adamzhg thanks for the suggestion.

I will do the same and try training with the same backbone.

@tombstone @jch1 @pkulzc

awaisbajwaml commented 3 years ago

Any updates, guys?

@tombstone @jch1 @pkulzc

awaisbajwaml commented 3 years ago

I found a fix similar to this one, so I'm closing this issue:

https://medium.com/mlearning-ai/tensorflow-2-4-with-cuda-11-2-gpu-training-fix-87f205215419

csiki commented 2 years ago

@awaisbajwaml, your Medium post has nothing to do with this error. Could you describe how you fixed it?