microsoft / XPretrain

Multi-modality pre-training

CLIP-VIP OFA caption generate #22

Closed: tikboaHIT closed this issue 1 year ago

tikboaHIT commented 1 year ago

Regarding using the OFA model to generate captions for the middle frame of each video: could you explain in detail which OFA model you use and how you speed up this process?

HellwayXue commented 1 year ago

Hi, we use "caption_large_best_clean". For speedup, we simply run OFA on multiple GPUs and gather all the results. You can also extract and save the middle frame of each video first, since video loading and decoding can be time-consuming.
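For readers who want to try the two suggestions above, here is a minimal sketch, not the authors' actual script. It assumes OpenCV (`opencv-python`) for decoding, and every function name in it (`middle_frame_index`, `shard`, `gather`, `save_middle_frame`) is hypothetical. The OFA inference call itself is left out, since the thread does not describe how it was invoked.

```python
from pathlib import Path


def middle_frame_index(num_frames: int) -> int:
    """Index of the middle frame for a video with num_frames frames."""
    if num_frames <= 0:
        raise ValueError("video has no frames")
    return num_frames // 2


def shard(items: list, rank: int, world_size: int) -> list:
    """Round-robin slice of items for one GPU worker (rank in 0..world_size-1)."""
    return items[rank::world_size]


def gather(shards: list) -> dict:
    """Merge per-worker {video_path: caption} dicts into a single dict."""
    merged = {}
    for part in shards:
        merged.update(part)
    return merged


def save_middle_frame(video_path: str, out_dir: str) -> Path:
    """Decode a video once and cache its middle frame as a JPEG."""
    # Assumption: OpenCV is installed. It is imported lazily so the
    # helpers above work without any third-party packages.
    import cv2

    cap = cv2.VideoCapture(video_path)
    try:
        # The reported frame count can be approximate for some containers.
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        cap.set(cv2.CAP_PROP_POS_FRAMES, middle_frame_index(total))
        ok, frame = cap.read()
    finally:
        cap.release()
    if not ok:
        raise RuntimeError(f"could not read middle frame of {video_path}")

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    out_path = Path(out_dir) / (Path(video_path).stem + ".jpg")
    cv2.imwrite(str(out_path), frame)
    return out_path
```

One way to use this: run `save_middle_frame` over the whole dataset once, then start one captioning process per GPU, have each process caption only `shard(frame_paths, rank, world_size)`, and merge the per-process outputs with `gather`. Captioning cached JPEGs keeps video decoding out of the GPU loop.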