hwjiang1510 / VQLoC

(NeurIPS 2023) Open-set visual object query search & localization in long-form videos

Too slow inference speed. #8


soyeonhong commented 3 months ago

Hello, thanks for the great work. I am trying to reproduce the results with this repository. However, inference on the validation set takes about 3 hours using 8 A5000 GPUs, with GPU memory nearly full and utilization close to 100%. I would like to ask a few questions:

  1. In the paper, it is mentioned that RTX 6000 GPUs were used. Could you let me know how long the inference time was with these GPUs?
  2. Given the long runtime described above, is there any particular part of the provided repo code that should be modified to speed up inference?
  3. In the paper, it is mentioned that 448×448 resolution clips were used, while I am resizing to 426×320. Besides this, is there any additional preprocessing that should be done (e.g., fps adjustment)? See the resizing sketch below for what I am currently doing.

Your response would be very helpful.
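
For reference, a minimal resizing sketch, assuming frames have already been extracted to JPEGs on disk. This is a hypothetical helper, not the repo's actual dataloader (which may resize on the fly); `cv2` (OpenCV) and the frame-directory layout are assumptions:

```python
import glob
import os

import cv2  # assumption: OpenCV is used for frame I/O


def resize_clip_frames(frame_dir: str, out_dir: str, size: int = 448) -> None:
    """Resize every extracted frame in `frame_dir` to size x size.

    Hypothetical preprocessing helper for illustration only; 448 matches
    the resolution reported in the paper.
    """
    os.makedirs(out_dir, exist_ok=True)
    for path in sorted(glob.glob(os.path.join(frame_dir, "*.jpg"))):
        frame = cv2.imread(path)
        # cv2.resize takes (width, height); square target, so order is moot.
        resized = cv2.resize(frame, (size, size), interpolation=cv2.INTER_LINEAR)
        cv2.imwrite(os.path.join(out_dir, os.path.basename(path)), resized)


# Example: resize_clip_frames("frames/clip_0001", "frames_448/clip_0001")
```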

hwjiang1510 commented 3 months ago

Hi,

Thanks for your interest in our work. Inference requires running the model on every frame of every video, so it takes some time to finish. For me, it takes about 12 hours. There is no additional preprocessing.
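
To illustrate why the runtime grows with the total frame count, here is a minimal sketch of frame-level inference with batching. `model`, `query`, and the tensor shapes are placeholders, not VQLoC's actual API; batching frames per forward pass is one common way to keep GPU utilization high:

```python
import torch


@torch.no_grad()
def run_inference(model, query, frames, batch_size=16, device="cuda"):
    """Run the model on every frame of one video.

    `model` stands in for the localization network and `query` for the
    visual query crop. Total cost is linear in len(frames), which is why
    long-form videos dominate validation time.
    """
    model.eval()
    preds = []
    for i in range(0, len(frames), batch_size):
        batch = frames[i:i + batch_size].to(device)         # (B, 3, H, W)
        preds.append(model(batch, query.to(device)).cpu())  # per-frame boxes/scores
    return torch.cat(preds, dim=0)
```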

soyeonhong commented 3 months ago

Thank you so much for your reply! It helped me a lot!