Open wangyi1177 opened 3 years ago
Facing the exact same issue. Did you find any solution for this? @wangyi1177
Inference works correctly when running TensorFlow on the CPU. I'm fairly sure something goes wrong when the input tensor is created on the GPU (batch = tf.constant(images_data)), but I'm not sure why.
I observed the same thing @wangyi1177.
Adding os.environ["CUDA_VISIBLE_DEVICES"] = '0' (or any other GPU id) before creating InteractiveSession() solves the problem. Still not sure why. It's also weird that even with os.environ["CUDA_VISIBLE_DEVICES"] = '' set, the session still uses the GPU, not the CPU as expected.
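For reference, a minimal sketch of that workaround. The key detail (which may also explain the odd behaviour with '') is that the variable must be set before TensorFlow initializes its devices:

```python
import os

# Restrict which GPUs CUDA (and therefore TensorFlow) can see.
# This must run BEFORE TensorFlow initializes its devices; changing the
# variable after the first session/tensor is created has no effect,
# which may be why setting '' later still appears to use the GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # a GPU id, or "" for CPU-only

# import tensorflow as tf  # import/initialize TF only after this point
```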
Hello @wangyi1177 !
I've got it running by manually saving the weights. You can edit save_model.py and just add:
model.save_weights(FLAGS.weights)
after:
utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)
model.save() is unable to save the weights for some reason.
Same here: it works fine with the coco models, but not with custom models trained directly in darknet. Have you figured out any way to solve this?
I'm running with GPU enabled on Google Colab and facing the same problem. Any fixes? @wangyi1177, @cis-apoorv
Facing the same problem. Any fixes? @wangyi1177
Which file should os.environ["CUDA_VISIBLE_DEVICES"] = '0' be added to? I added it in save_model.py and detect.py, but it didn't work. @wangyi1177
Hello @yieniggu , @nishantr05 , and @kevinhey Please replace:
utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)
in save_model.py with:
model.save_weights(FLAGS.weights)
model.load_weights(FLAGS.weights)
@cis-apoorv thanks for your reply. My doubt is: if you entirely remove
utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)
wouldn't that prevent us from using the tiny version?
Also, could you please share your version of save_model.py?
Hi @yieniggu
You are right, it would prevent you from using the tiny version. So instead, what you can do is first save and load the model using:
model.save_weights(FLAGS.weights)
model.load_weights(FLAGS.weights)
and append:
utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)
below model.load_weights().
This worked for my tiny-yolov4 model.
Hi all, neither of the solutions provided here worked for me. Detection only occurs on the first frame of a video when running on GPU; no problem on CPU. Has anyone solved this issue? Thanks in advance.
I have the same issue: infer in detectvideo.py doesn't work after the first call. I hope someone solves this problem.
@cis-apoorv I don't quite understand your explanation. Do you mean that in save_model.py the utils.load_weights call should be replaced with model.save_weights and model.load_weights? Or that model.save_weights and model.load_weights should be added, with utils.load_weights appended below them? I want to use only the full TensorFlow yolov4 model, not the tiny version.
Hello @SKH93, what I mean is: in the save_model.py file, add:
model.save_weights(FLAGS.weights)
model.load_weights(FLAGS.weights)
above:
utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)
This works for both yolov4 and tiny-yolov4.
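To illustrate the suggestion above, here is a self-contained sketch of that save/load round trip. The toy Sequential model and the demo.weights.h5 path are hypothetical stand-ins for the YOLOv4 model and FLAGS.weights in save_model.py; the real script would follow this with the utils.load_weights(...) call:

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the YOLOv4 model built in save_model.py.
model = tf.keras.Sequential([tf.keras.Input(shape=(8,)),
                             tf.keras.layers.Dense(4)])

before = [w.copy() for w in model.get_weights()]

# The workaround: a save/load round trip of the Keras weights before
# the darknet weights are loaded on top of them.
model.save_weights("demo.weights.h5")
model.load_weights("demo.weights.h5")
# utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)

after = model.get_weights()
# The round trip should leave the weights unchanged.
assert all(np.allclose(b, a) for b, a in zip(before, after))
```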
I think I figured it out. tf.keras.Model.save doesn't seem to be compatible with tf.saved_model.load. Use tf.keras.models.load_model with tf.keras.Model.save, and tf.saved_model.load with tf.saved_model.save.
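A small sketch of the pairing described above. The toy Keras model and the Doubler module are only stand-ins for the converted YOLOv4 model; the point is that each save API is matched with its own loader:

```python
import tempfile
import numpy as np
import tensorflow as tf

x = np.ones((1, 3), dtype=np.float32)

# Toy stand-ins; only the save/load pairing matters here.
keras_model = tf.keras.Sequential([tf.keras.Input(shape=(3,)),
                                   tf.keras.layers.Dense(2)])

class Doubler(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([None, 3], tf.float32)])
    def __call__(self, v):
        return 2.0 * v

with tempfile.TemporaryDirectory() as d:
    # Pair 1: tf.keras.Model.save  <->  tf.keras.models.load_model
    keras_model.save(d + "/model.h5")
    m1 = tf.keras.models.load_model(d + "/model.h5")
    y1 = m1(x).numpy()

    # Pair 2: tf.saved_model.save  <->  tf.saved_model.load
    tf.saved_model.save(Doubler(), d + "/saved")
    m2 = tf.saved_model.load(d + "/saved")
    y2 = m2(tf.constant(x)).numpy()

assert np.allclose(keras_model(x).numpy(), y1)  # Keras pair round-trips
assert np.allclose(y2, 2.0 * x)                 # SavedModel pair round-trips
```

Mixing the pairs (e.g. loading a Keras-format save with tf.saved_model.load) is where the thread's symptoms seem to appear.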
@toddwong It takes really long for me, and I get warning messages like:
WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
W0630 14:49:52.461997 24320 load.py:171] No training configuration found in save file, so the model was *not* compiled. Compile it manually.
The above fix slowed prediction on GPU by a lot.
I think I found a way to solve it. I was using a TensorFlow 2.4.2 docker image; changing to the version specified in requirements-gpu.txt (2.3.0rc0-gpu) solved it: it ran on GPU, got predictions for the entire video, and was as fast as usual. I also tested predicting on a model converted with 2.4.2, and it didn't work even when predicting with 2.3.0rc0. I had to re-convert the model using TF 2.3.0rc0 and also predict with 2.3.0rc0.
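For reference, the version pin described above would look something like this (the docker tag is my guess at the matching official image; treat both lines as illustrative):

```shell
# Match requirements-gpu.txt instead of TF 2.4.x
pip install tensorflow-gpu==2.3.0rc0

# or, with docker, pull a matching official image (tag assumed to exist)
docker pull tensorflow/tensorflow:2.3.0rc0-gpu
```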
Your solution works for me, thanks!!!
Thanks. It works for me :D
I faced the same problem with my RTX A5000. After weeks of debugging, we found that models trained on the Ampere architecture cannot be run on tensorflow <= 2.4.0 after conversion (and, by extension, with this library); that produces the issue above. Our workaround was to train the model on Colab, whose GPUs use pre-Ampere architectures with backward-compatible CUDA, and then convert it with save_model.py using tensorflow 2.3.0. That model and the repo then worked as intended on tensorflow 2.5.0 as well.
Thanks, it works. Tested converting the model with tensorflow 2.3.0, then used that model with tensorflow 2.5.0.
tensorflow 2.6.0, detect.py:
pred_bbox = infer(batch_data)  # first call works
pred_bbox = infer(batch_data)  # subsequent calls do not work --> []
The 'serving_default' signature infer always returns pred_bbox with shape (1, 0, 84), except on the first call, which returns shape (1, 8, 84). It happens in detectvideo.py and evaluate.py. TensorFlow version is 2.3.0 as required, with CUDA 10.1.