tensorflow / models

Models and examples built with TensorFlow

Object detection API stuck on None of the MLIR optimization passes are enabled (registered 2) #9633

Closed ysefwakil closed 3 years ago

ysefwakil commented 3 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/tree/master/official/...

2. Describe the bug

Not allowing me to start training

3. Steps to reproduce

Followed the tutorial posted https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/

4. Expected behavior

I expected to start training but it is stuck on that line and not continuing further

5. Additional context

None

6. System information

I am following the TensorFlow Object Detection API tutorial (https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/). Everything worked up to this point, but now training won't start. Any ideas?

saikumarchalla commented 3 years ago

@ysefwakil Could you please fill in the issue template? Also, please provide simple standalone code or a Colab link so we can reproduce the issue on our end. Thanks

ationder commented 3 years ago

I keep getting the same one :/

ysefwakil commented 3 years ago

There is no code; I am just following the TensorFlow Object Detection API tutorial: https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/

gt5-gerry commented 3 years ago

Hi, I just had the same problem and was at it for hours. I realised there were no test pictures in my images folder. I had to restart, and it’s working now.
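A quick way to rule out the missing-test-images cause is to count the images in each split before starting training. The sketch below is only an illustration: it assumes an images folder with train and test subfolders and one .xml annotation file next to each image, which may not match your own layout, and the function names are made up for this example.

```python
from pathlib import Path

# Assumed image extensions; extend this set if your dataset uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def summarize_split(split_dir):
    """Return (image_count, images_without_xml) for one dataset split folder."""
    split_dir = Path(split_dir)
    if not split_dir.is_dir():
        return 0, []
    images = sorted(
        p for p in split_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    # An image counts as annotated when a same-named .xml file sits next to it.
    missing = [p.name for p in images if not p.with_suffix(".xml").exists()]
    return len(images), missing


def check_dataset(images_root):
    """Check the train and test folders under images_root; return problems found."""
    problems = []
    for split in ("train", "test"):  # assumed split folder names
        count, missing = summarize_split(Path(images_root) / split)
        if count == 0:
            problems.append(f"{split}: no images found")
        for name in missing:
            problems.append(f"{split}: {name} has no matching .xml annotation")
    return problems
```

Calling check_dataset("images") returns an empty list when both splits contain images and every image has an annotation; any other result names the split or file to look at.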

saikumarchalla commented 3 years ago

@ysefwakil Could you please create a fresh environment and follow the steps as mentioned in the document again. If the issue still persists please let us know. Thanks!

ysefwakil commented 3 years ago

I decided not to use Roboflow for the XML conversion and instead followed the exact same steps, and my model worked.

saikumarchalla commented 3 years ago

@ysefwakil Please go ahead and close the issue if it is resolved for you. Thanks!

manasvini21 commented 3 years ago

I am facing the same issue. Please help me

HadiSDev commented 3 years ago

I am facing the same issue here as well. Please reopen

gcunhase commented 3 years ago

Same issue, is there a solution?

SAFI36 commented 3 years ago

This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

Please help me.

fengwang commented 3 years ago

I had the same problem, not in this object detection api, but in a custom training script. Going back to TF 2.5 solved my problem.

It seems TF 2.6 gets stuck when loading libcublas.

TF 2.5 prints the following output before training starts:

2021-10-18 17:58:00.954630: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-10-18 17:58:00.977132: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2200665000 Hz
2021-10-18 17:58:01.298032: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-10-18 17:58:01.722974: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8204
2021-10-18 17:58:07.520925: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-10-18 17:58:09.103060: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11

TF 2.6, by contrast, gets stuck after the fourth line:

2021-10-18 17:58:01.722974: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8204
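To see where startup stalls in a captured log like the one quoted above, the timestamps can be compared line by line. This is a rough sketch using only the Python standard library; it assumes every log line starts with the same "YYYY-MM-DD HH:MM:SS.ffffff: " prefix as the lines above, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Matches the timestamp prefix of the log lines quoted above,
# e.g. "2021-10-18 17:58:01.722974: I tensorflow/...".
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}): (.*)$")


def startup_gaps(log_text):
    """Return a list of (seconds_since_previous_line, message) tuples."""
    gaps = []
    previous = None
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines without a timestamp prefix
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        delta = 0.0 if previous is None else (stamp - previous).total_seconds()
        gaps.append((delta, match.group(2)))
        previous = stamp
    return gaps


def last_message(log_text):
    """Return the last timestamped message, i.e. the point where output stopped."""
    gaps = startup_gaps(log_text)
    return gaps[-1][1] if gaps else None
```

For a run that hangs, last_message shows the last step that completed. For a run that eventually starts, the largest gap in startup_gaps points at the slowest step. In the TF 2.5 log above, that is the roughly six seconds between loading cuDNN and opening libcublas.so.11.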

tbikash62 commented 3 years ago

Guys, if it's getting stuck:

  1. Check that your image sizes (height*width) are uniform in both the test and train sets.
  2. Check that the TFRecords were generated properly and are not empty.
  3. Make sure the TFRecords are not open in Notepad++ (this was one of the causes for me :D).

Hope it helps!
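The first two checks in the list above can be scripted without TensorFlow. This is a minimal sketch under a few assumptions: the TFRecord files are uncompressed, the annotations are Pascal VOC style .xml files with a size element, and the function names are invented for illustration. It walks the record length headers and skips the CRC fields, so it detects empty or truncated files but not corrupted payloads.

```python
import os
import struct
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_tfrecord_records(path):
    """Count records in an uncompressed TFRecord file.

    Each record is stored as: uint64 length, uint32 CRC of the length,
    `length` bytes of data, uint32 CRC of the data (little-endian).
    """
    size = os.path.getsize(path)
    count = 0
    with open(path, "rb") as handle:
        while True:
            header = handle.read(12)  # 8-byte length + 4-byte length CRC
            if not header:
                return count  # clean end of file
            if len(header) < 12:
                raise ValueError(f"{path}: truncated record header")
            (length,) = struct.unpack("<Q", header[:8])
            # Payload plus its 4-byte CRC must fit in the rest of the file.
            if length + 4 > size - handle.tell():
                raise ValueError(f"{path}: truncated record body")
            handle.seek(length + 4, os.SEEK_CUR)
            count += 1


def annotation_image_sizes(xml_dir):
    """Tally the (width, height) pairs declared in the .xml files of xml_dir."""
    sizes = Counter()
    for xml_path in sorted(Path(xml_dir).glob("*.xml")):
        size = ET.parse(xml_path).getroot().find("size")
        if size is None:
            continue  # annotation does not declare an image size
        width, height = size.findtext("width"), size.findtext("height")
        if width is None or height is None:
            continue
        sizes[(int(width), int(height))] += 1
    return sizes
```

A record count of 0 means the TFRecord file is empty, and more than one key in the tally returned by annotation_image_sizes means the annotated images are not all the same size.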

PiggsBoson commented 2 years ago

I encountered this while trying to run other people's code. I solved it by switching from TF 2 to TF 1.

jacktang commented 2 years ago

I am facing the same problem.