Regarding the use of the OFA model to caption the middle frame of each video: could you explain in detail which OFA model you use and how you speed up this process?
Hi, we use the "caption_large_best_clean" checkpoint. To speed things up, we simply run OFA on multiple GPUs and gather all the results. You can also extract and save the middle frame of each video first, since video loading and decoding can be time-consuming.
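For anyone who wants to reproduce this, here is a rough sketch of the two steps described above. It is not our actual script: the function names (`save_middle_frame`, `shard`, `gather`) are made up for illustration, it assumes `opencv-python` is installed for decoding, and the OFA captioning call itself is left out because it depends on how you load the checkpoint.

```python
from pathlib import Path


def middle_frame_index(frame_count: int) -> int:
    """Index of the middle frame for a video with `frame_count` frames."""
    return max(frame_count // 2, 0)


def save_middle_frame(video_path: str, out_dir: str) -> str:
    """Decode only the middle frame of a video and save it as a JPEG."""
    import cv2  # assumption: opencv-python is available

    cap = cv2.VideoCapture(video_path)
    try:
        frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        # Seek to the middle frame instead of decoding the whole video.
        cap.set(cv2.CAP_PROP_POS_FRAMES, middle_frame_index(frame_count))
        ok, frame = cap.read()
    finally:
        cap.release()
    if not ok:
        raise RuntimeError(f"could not read middle frame of {video_path}")
    out_path = Path(out_dir) / (Path(video_path).stem + ".jpg")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    cv2.imwrite(str(out_path), frame)
    return str(out_path)


def shard(items: list, rank: int, world_size: int) -> list:
    """Items assigned to worker `rank` out of `world_size` workers (round-robin)."""
    return items[rank::world_size]


def gather(per_worker_results: list) -> dict:
    """Merge the {image_path: caption} dicts produced by each worker."""
    merged = {}
    for result in per_worker_results:
        merged.update(result)
    return merged
```

The idea is to run `save_middle_frame` once over the dataset, then start one captioning process per GPU, give each process `shard(frame_paths, rank, world_size)`, and merge the per-process outputs with `gather` at the end.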