tensorflow / models

Models and examples built with TensorFlow

Training fails on - faster_rcnn_resnet101 - Error : Groundtruth tensor boxes has not been provided - #9808

Open awaisbajwaml opened 3 years ago

awaisbajwaml commented 3 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using

https://github.com/tensorflow/models.git

2. Describe the bug

When I start the training I get the following error.

[image: screenshot of the error]

3. Steps to reproduce

After preparing my dataset and completing the other setup steps, I use the following command to start training:

python model_main_tf2.py --model_dir=/home/ubuntu/tf_2/training/ --pipeline_config_path=/home/ubuntu/tf/faster_rcnn_resnet101_v1_1024x1024_coco17_tpu-8.config --alsologtostderr

Steps to reproduce the behavior.

4. Expected behavior

RuntimeError: Groundtruth tensor boxes has not been provided

5. Additional context

Include any logs that would be helpful to diagnose the problem.

6. System information

khengkok commented 3 years ago

I also encountered the same error with the latest pull. It seems this bug was introduced recently: I have an older TFOD checkout that does not have this issue. You may want to try pulling an older version of the repo.

awaisbajwaml commented 3 years ago

@khengkok which version worked for you?

khengkok commented 3 years ago

The checkout that works for me is at commit ID 9da3a08109237162752bea0fd951af849eb13cdc

awaisbajwaml commented 3 years ago

thanks @khengkok

@tombstone @jch1 and @pkulzc any solution to this?

adamzhg commented 3 years ago

I also encountered the same error with the latest pull today. It is caused by the following code in model_lib_v2.py (line 525):

    dummy_image, dummy_shapes = detection_model.preprocess(
        tf.zeros([1, 512, 512, 3], dtype=tf.float32))
    dummy_prediction_dict = detection_model.predict(dummy_image, dummy_shapes)

It looks like it runs predict on a dummy image without providing groundtruth? @tombstone @jch1 @khengkok
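The suspected failure mode can be illustrated with a toy sketch. It assumes that a model in training mode reads its groundtruth during the forward pass, so calling predict before any groundtruth has been registered raises the error. `ToyDetectionModel` and `try_predict` are hypothetical stand-ins written for this illustration, not Object Detection API code. Only the error text is taken from this issue.

```python
# Toy stand-in for a detection model. This is NOT the Object Detection API
# source; it only mimics the suspected behaviour described in this thread.
class ToyDetectionModel:
    def __init__(self, is_training):
        self._is_training = is_training
        self._groundtruth = {}  # filled in by provide_groundtruth()

    def provide_groundtruth(self, boxes):
        # Register labelled boxes for the next training-mode forward pass.
        self._groundtruth["boxes"] = boxes

    def groundtruth_lists(self, field):
        # Fail loudly if the requested groundtruth field was never provided.
        if field not in self._groundtruth:
            raise RuntimeError(
                "Groundtruth tensor {} has not been provided".format(field))
        return self._groundtruth[field]

    def predict(self, image):
        # Assumption: in training mode the forward pass needs the labelled
        # boxes, so it looks them up here.
        if self._is_training:
            boxes = self.groundtruth_lists("boxes")
            return {"num_groundtruth_boxes": len(boxes)}
        return {"num_groundtruth_boxes": 0}


def try_predict(model, image):
    # Return the prediction dict, or the error message if predict() raises.
    try:
        return model.predict(image)
    except RuntimeError as err:
        return str(err)


dummy_image = [[0.0] * 4 for _ in range(4)]  # placeholder for a zero image

model = ToyDetectionModel(is_training=True)
failed = try_predict(model, dummy_image)      # no groundtruth registered yet

model.provide_groundtruth([[0.0, 0.0, 1.0, 1.0]])
succeeded = try_predict(model, dummy_image)   # groundtruth now available
```

In this toy version the first call fails and the second succeeds, which matches the hypothesis that the dummy forward pass runs before any groundtruth is available. Whether the real model behaves the same way would need to be checked against the library source.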

awaisbajwaml commented 3 years ago

Any luck, TensorFlow team? @tombstone @jch1 and @pkulzc

adamzhg commented 3 years ago

I commented out the following code in model_lib_v2.py and continued training my model:

    '''
    dummy_image, dummy_shapes = detection_model.preprocess(
        tf.zeros([1, 512, 512, 3], dtype=tf.float32))
    dummy_prediction_dict = detection_model.predict(dummy_image, dummy_shapes)
    '''

So far it works, but I do not know whether it carries potential risks. @awaisbajwaml

awaisbajwaml commented 3 years ago

@adamzhg thanks for the suggestion.

I will make the same change and try training on the same backbone.

@tombstone @jch1 @pkulzc

awaisbajwaml commented 3 years ago

Any updates, guys?

@tombstone @jch1 @pkulzc

awaisbajwaml commented 3 years ago

I found a fix similar to the one described here, so I am closing this issue.

https://medium.com/mlearning-ai/tensorflow-2-4-with-cuda-11-2-gpu-training-fix-87f205215419

csiki commented 2 years ago

@awaisbajwaml your Medium post has nothing to do with the error. Could you describe how it got fixed?