Deadfish-hk commented 3 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

[x] I am using the latest TensorFlow Model Garden release and TensorFlow 2.
[x] I am reporting the issue to the correct repository. (Model Garden official or research directory)
[x] I checked to make sure that this issue has not already been filed.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/object_detection/model_main_tf2.py

2. Describe the bug

The ODAPI fails to start training and raises a runtime error.

Traceback (most recent call last):
  File "/content/models/research/object_detection/model_main_tf2.py", line 113, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/content/models/research/object_detection/model_main_tf2.py", line 110, in main
    record_summaries=FLAGS.record_summaries)
  File "/usr/local/lib/python3.6/dist-packages/object_detection/model_lib_v2.py", line 523, in train_loop
    dummy_prediction_dict = detection_model.predict(dummy_image, dummy_shapes)
  File "/usr/local/lib/python3.6/dist-packages/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 830, in predict
    **side_inputs))
  File "/usr/local/lib/python3.6/dist-packages/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 999, in _predict_second_stage
    image_shape, true_image_shapes)
  File "/usr/local/lib/python3.6/dist-packages/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 729, in _proposal_postprocess
    anchors, image_shape_2d, true_image_shapes)
  File "/usr/local/lib/python3.6/dist-packages/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 1754, in _postprocess_rpn
    ) = self._format_groundtruth_data(image_shapes)
  File "/usr/local/lib/python3.6/dist-packages/object_detection/meta_architectures/faster_rcnn_meta_arch.py", line 1891, in _format_groundtruth_data
    self.groundtruth_lists(fields.BoxListFields.boxes))
  File "/usr/local/lib/python3.6/dist-packages/object_detection/core/model.py", line 118, in groundtruth_lists
    field))
RuntimeError: Groundtruth tensor boxes has not been provided

3. Steps to reproduce

Clone the latest version of the models repository, then run model_main_tf2.py with the Faster R-CNN ResNet101 V1 800x1333 config.

4. Expected behaviour

Training should start as smoothly as it did before.

5. Additional context

Reverting to the last version of model_lib_v2.py before the EMA changes solved the problem.

6. System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Google Colab
Mobile device name if the issue happens on a mobile device:
TensorFlow installed from (source or binary): source
TensorFlow version (use command below): 2.4
Python version: 3.6
Bazel version (if compiling from source):
GCC/Compiler version (if compiling from source):
CUDA/cuDNN version: CUDA 10.1, cuDNN 7.0
GPU model and memory: P100, 16 GB

dangiiii commented 3 years ago

Did you fix the bug? Exactly the same behavior here, really fun working with TensorFlow. Bugs at every possible step, 1000s of possible solutions and configurations you have to test, hours/days of time wasted for potentially simple shit that other frameworks provide easily. I'm really impressed.

Just another fun note on this error: when I try it on my local machine with only CPU support enabled, it works without the error. When I switch to the server and try to use the GPUs, it spits out that groundtruth error (which makes no sense whatsoever). Exactly the same datasets, exactly the same config file, exactly the same model. And I fixed all the setup problems beforehand, which only involved googling for days. So yeah, I wouldn't use this shit if I didn't have to because of work.

"Revert to the last model_main_tf2.py before EMA changes solved the problem." - Which version did you take? I don't understand how this could be related since the latest update to that file is from June/July 2020, and this error seems to be new

Deadfish-hk commented 3 years ago

I fixed the bug by replacing the new model_lib_v2.py (provided in commit 852e098) with the old model_lib_v2.py (retrieved from commit 2a370fe).

dangiiii commented 3 years ago

Thanks mate, that solved the problem for me (well, it led to other problems (surprise, surprise) but those could be fixed as well). If anybody runs into problems concerning tensor shapes or numpy after replacing the model_lib file, take a look at this post: https://github.com/tensorflow/models/issues/9738#issuecomment-782139216 .

Deadfish-hk commented 3 years ago

Sorry for mentioning the wrong file in the first place. Glad to know that your problem is solved as well.

It is strange that even when the moving average option in the config file is set to false, the model still tries to build a shadow copy to prepare the EMA optimiser. Maybe an if-statement that bypasses this step could solve the problem once and for all.
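
The bypass suggested above could look roughly like the sketch below. It is only an illustration of the idea: `prepare_training`, `use_moving_average` and `build_shadow_copy` are hypothetical names chosen for this example, not the actual ODAPI code in model_lib_v2.py.

```python
def prepare_training(use_moving_average, build_shadow_copy):
    """Run the EMA setup step only when moving averages are requested.

    Both arguments are hypothetical stand-ins: `use_moving_average` mirrors
    the moving average option from the config file, and `build_shadow_copy`
    is whatever callable would create the shadow variables.
    """
    if not use_moving_average:
        # Bypass: nothing is built, so no dummy prediction is ever triggered.
        return None
    return build_shadow_copy()


# Tiny demonstration with a fake builder that records whether it was called.
calls = []


def fake_build_shadow_copy():
    calls.append("built")
    return "shadow"


disabled_result = prepare_training(False, fake_build_shadow_copy)
enabled_result = prepare_training(True, fake_build_shadow_copy)
```

With the option off, the builder is never invoked, which is the behaviour the comment above asks for.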

tanishkthomas commented 3 years ago

"I fixed the bug by replacing the new model_lib_v2.py (provided in commit 852e098) with the old model_lib_v2.py (retrieved from commit 2a370fe)."

Can you please provide the retrieved model_lib_v2.py?

Deadfish-hk commented 3 years ago

This is the model_lib_v2.py in commit 2a370fe.

https://github.com/tensorflow/models/blob/2a370fe09ff18800e2530cdd5a7bc25b5dcf7114/research/object_detection/model_lib_v2.py
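
For anyone who wants to script this, here is a minimal sketch for downloading that exact file version. It assumes the file is also served from the raw.githubusercontent.com host under the same commit and path; `raw_url` and `fetch_pinned_file` are helper names made up for this example.

```python
import urllib.request

REPO = "tensorflow/models"
COMMIT = "2a370fe09ff18800e2530cdd5a7bc25b5dcf7114"  # commit hash from the link above
FILE_PATH = "research/object_detection/model_lib_v2.py"


def raw_url(repo, commit, path):
    """Build the raw-download URL for one file at one commit (assumed URL layout)."""
    return f"https://raw.githubusercontent.com/{repo}/{commit}/{path}"


def fetch_pinned_file(destination):
    """Download the pinned model_lib_v2.py to `destination` and return that path."""
    urllib.request.urlretrieve(raw_url(REPO, COMMIT, FILE_PATH), destination)
    return destination


# Example (performs a network request, so it is left commented out):
# fetch_pinned_file("model_lib_v2.py")
```

Where to copy the downloaded file depends on your setup, so the destination is left to the caller.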

rfmac-inspectos commented 3 years ago

I replaced the mentioned file, but the error remained anyway.

So I decided to simply check out a commit that I knew was working.
After cloning the repo, cd into the repo dir, run the command below and be happy :)

git checkout 47f8d4dfb83d2cba06134e0797d10087eb0697d0
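
If replacing the file seems to have no effect, one thing worth checking is which copy Python actually imports. In the traceback at the top of this issue, model_lib_v2.py is loaded from dist-packages rather than from the cloned repo. Whether that explains the case above is only a guess, and `locate_module` is a helper name made up for this sketch.

```python
import importlib.util


def locate_module(dotted_name):
    """Return the file a module would be imported from, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(dotted_name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. object_detection) is not installed.
        return None
    return spec.origin if spec is not None else None


# Prints the path of the copy that would be imported, or None if not installed.
print(locate_module("object_detection.model_lib_v2"))
```

If the printed path points into dist-packages, edits made inside the cloned repo are not the ones being picked up at training time.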

midnightspecia1 commented 3 years ago

Ran into this problem today.

This is the model_lib_v2.py in commit 2a370fe.

https://github.com/tensorflow/models/blob/2a370fe09ff18800e2530cdd5a7bc25b5dcf7114/research/object_detection/model_lib_v2.py

Both solutions, changing the model_lib_v2.py file and checking out 47f8d4dfb83d2cba06134e0797d10087eb0697d0, lead to this error:

Function call stack:
_dummy_computation_fn

tensorflow / models

ODAPI failed to initiate by running into runtime error. #9735
