Open wangyi1177 opened 3 years ago
Facing the exact same issue. Did you find any solution for this? @wangyi1177
Inference works correctly when running TensorFlow on the CPU. I'm fairly sure something goes wrong when the input tensor is created on the GPU (batch = tf.constant(images_data)), but I'm not sure why.
I observed the same thing @wangyi1177.
Adding os.environ["CUDA_VISIBLE_DEVICES"] = '0' (or any other GPU id) before creating InteractiveSession() solves the problem. Still not sure why. It's also weird that even with os.environ["CUDA_VISIBLE_DEVICES"] = '' set, the session still uses the GPU, not the CPU as expected.
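For reference, a minimal sketch of that workaround. The key detail (which may also explain the odd behaviour with '') is that the variable must be set before TensorFlow initializes its devices:

```python
import os

# Restrict which GPUs CUDA (and therefore TensorFlow) can see.
# This must run BEFORE TensorFlow initializes its devices; changing the
# variable after the first session/tensor is created has no effect,
# which may be why setting '' later still appears to use the GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # a GPU id, or "" for CPU-only

# import tensorflow as tf  # import/initialize TF only after this point
```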
Hello @wangyi1177 !
I've got it running by manually saving the weights. You can edit save_model.py and just add:
model.save_weights(FLAGS.weights)
after:
utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)
model.save() is unable to save the weights for some reason.
Same here: it works fine with the coco models, but not with custom models trained directly in darknet. Have you figured out any way to solve this?
I'm running with GPU enabled on Google Colab and facing the same problem. Any fixes? @wangyi1177, @cis-apoorv
Facing the same problem. Any fixes? @wangyi1177
Which file should os.environ["CUDA_VISIBLE_DEVICES"] = '0' be added to? I added it in save_model.py and detect.py, but it didn't work. @wangyi1177
Hello @yieniggu , @nishantr05 , and @kevinhey Please replace:
utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)
in save_model.py with:
model.save_weights(FLAGS.weights)
model.load_weights(FLAGS.weights)
@cis-apoorv thanks for your reply. My doubt is: if you entirely remove
utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)
wouldn't that prevent us from using the tiny version?
Also, could you please share your version of save_model.py?
Hi @yieniggu
You are right, it would prevent you from using the tiny version. So instead, what you can do is first save and load the model using:
model.save_weights(FLAGS.weights)
model.load_weights(FLAGS.weights)
and append:
utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)
below model.load_weights().
This worked for my tiny-yolov4 model.
Hi all, neither of the solutions provided here worked for me. Detection only occurs on the first frame of a video when running on GPU; no problem on CPU. Has anyone solved this issue? Thanks in advance.
I have the same issue: infer in detectvideo.py doesn't work after the first call. I hope someone solves this problem.
@cis-apoorv I don't quite understand your explanation. Do you mean that in save_model.py the utils.load_weights call should be replaced with model.save_weights and model.load_weights? Or that model.save_weights and model.load_weights should be added, with utils.load_weights appended below them? I want to use only the full TensorFlow yolov4 model, not the tiny version.
Hello @SKH93, what I mean is: in the save_model.py file, add:
model.save_weights(FLAGS.weights)
model.load_weights(FLAGS.weights)
above:
utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)
This works for both yolov4 and tiny-yolov4.
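To illustrate the suggestion above, here is a self-contained sketch of that save/load round trip. The toy Sequential model and the demo.weights.h5 path are hypothetical stand-ins for the YOLOv4 model and FLAGS.weights in save_model.py; the real script would follow this with the utils.load_weights(...) call:

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the YOLOv4 model built in save_model.py.
model = tf.keras.Sequential([tf.keras.Input(shape=(8,)),
                             tf.keras.layers.Dense(4)])

before = [w.copy() for w in model.get_weights()]

# The workaround: a save/load round trip of the Keras weights before
# the darknet weights are loaded on top of them.
model.save_weights("demo.weights.h5")
model.load_weights("demo.weights.h5")
# utils.load_weights(model, FLAGS.weights, FLAGS.model, FLAGS.tiny)

after = model.get_weights()
# The round trip should leave the weights unchanged.
assert all(np.allclose(b, a) for b, a in zip(before, after))
```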
I think I figured it out. tf.keras.Model.save doesn't seem to be compatible with tf.saved_model.load. Use tf.keras.models.load_model with tf.keras.Model.save, and tf.saved_model.load with tf.saved_model.save.
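A small sketch of the pairing described above. The toy Keras model and the Doubler module are only stand-ins for the converted YOLOv4 model; the point is that each save API is matched with its own loader:

```python
import tempfile
import numpy as np
import tensorflow as tf

x = np.ones((1, 3), dtype=np.float32)

# Toy stand-ins; only the save/load pairing matters here.
keras_model = tf.keras.Sequential([tf.keras.Input(shape=(3,)),
                                   tf.keras.layers.Dense(2)])

class Doubler(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([None, 3], tf.float32)])
    def __call__(self, v):
        return 2.0 * v

with tempfile.TemporaryDirectory() as d:
    # Pair 1: tf.keras.Model.save  <->  tf.keras.models.load_model
    keras_model.save(d + "/model.h5")
    m1 = tf.keras.models.load_model(d + "/model.h5")
    y1 = m1(x).numpy()

    # Pair 2: tf.saved_model.save  <->  tf.saved_model.load
    tf.saved_model.save(Doubler(), d + "/saved")
    m2 = tf.saved_model.load(d + "/saved")
    y2 = m2(tf.constant(x)).numpy()

assert np.allclose(keras_model(x).numpy(), y1)  # Keras pair round-trips
assert np.allclose(y2, 2.0 * x)                 # SavedModel pair round-trips
```

Mixing the pairs (e.g. loading a Keras-format save with tf.saved_model.load) is where the thread's symptoms seem to appear.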
@toddwong It takes really long for me, and I get warning messages like:
WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
W0630 14:49:52.461997 24320 load.py:171] No training configuration found in save file, so the model was *not* compiled. Compile it manually.
The above fix slowed prediction on GPU by a lot.
I think I found a way to solve it. I was using a TensorFlow 2.4.2 docker image; changing to the version specified in requirements-gpu.txt (2.3.0rc0-gpu) solved it: it ran on GPU, got predictions for the entire video, and was as fast as usual. I also tested predicting on a model converted with 2.4.2, and it didn't work even when predicting with 2.3.0rc0. I had to re-convert the model using TF 2.3.0rc0 and also predict with 2.3.0rc0.
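For reference, the version pin described above would look something like this (the docker tag is my guess at the matching official image; treat both lines as illustrative):

```shell
# Match requirements-gpu.txt instead of TF 2.4.x
pip install tensorflow-gpu==2.3.0rc0

# or, with docker, pull a matching official image (tag assumed to exist)
docker pull tensorflow/tensorflow:2.3.0rc0-gpu
```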
Your solution works for me, thanks!!!
Thanks. It works for me :D
I faced the same problem with my RTX A5000. After weeks of debugging, we found that models trained on the Ampere architecture cannot be run on tensorflow <= 2.4.0 after conversion (and, by extension, with this library); that produces the issue above. Our workaround was to train the model on Colab, whose GPUs use pre-Ampere architectures with backward-compatible CUDA, and then convert it with save_model.py using tensorflow 2.3.0. That model and the repo then worked as intended on tensorflow 2.5.0 as well.
Thanks, it works. Tested converting the model with tensorflow 2.3.0, then used that model with tensorflow 2.5.0.
tensorflow 2.6.0, detect.py:
pred_bbox = infer(batch_data)  # first call works
pred_bbox = infer(batch_data)  # subsequent calls do not work --> []
The 'serving_default' signature infer always returns pred_bbox with shape (1, 0, 84), except on the first call, which returns shape (1, 8, 84). It happens in detectvideo.py and evaluate.py. TensorFlow version is 2.3.0 as required, with CUDA 10.1.