How the entire dataset is converted into captions

whwu95 / Cap4Video

【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

https://arxiv.org/abs/2301.00184

MIT License


How the entire dataset is converted into captions #11

Closed: shams2023 closed this issue 12 months ago

shams2023 commented 1 year ago

Thank you very much for your work! How do you convert all the videos in an entire dataset into captions? I want to generate captions for every image or video in a dataset, but the code from the paper [ZeroCap: Zero Shot Image to Text Generation for Visual Semantic Arithmetic] only converts one image into a caption at a time. What do I need to do to caption an entire dataset? I would really appreciate your guidance. Thank you again.

whwu95 commented 12 months ago

Please refer to https://github.com/YoadTew/zero-shot-video-to-text for generating video captions.
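For readers with the same question, one way to go from a single-item captioner to a whole dataset is to wrap it in a loop over the dataset folder. The sketch below is not taken from Cap4Video or from the linked repository. `caption_dataset`, `caption_fn`, `VIDEO_EXTENSIONS`, and the JSON output layout are all hypothetical names and choices made for illustration. `caption_fn` stands for whatever function you write around the captioning code you use, taking one video path and returning one caption string.

```python
import json
from pathlib import Path
from typing import Callable, Dict

# Assumption: which file extensions count as videos; adjust for your dataset.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mkv", ".webm"}


def caption_dataset(
    video_dir: str,
    caption_fn: Callable[[Path], str],
    out_path: str,
) -> Dict[str, str]:
    """Caption every video under video_dir and store the results as JSON.

    caption_fn is a hypothetical wrapper around a single-video captioner:
    it receives one video path and returns one caption string.
    """
    out_file = Path(out_path)

    # Resume from an earlier run so an interrupted job does not start over.
    captions: Dict[str, str] = {}
    if out_file.exists():
        captions = json.loads(out_file.read_text(encoding="utf-8"))

    # Collect video files recursively, in a stable order.
    videos = sorted(
        p
        for p in Path(video_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )

    for video in videos:
        key = video.relative_to(video_dir).as_posix()
        if key in captions:
            continue  # already captioned in a previous run
        captions[key] = caption_fn(video)
        # Save after every video; captioning is slow, writing is cheap.
        out_file.write_text(
            json.dumps(captions, indent=2, ensure_ascii=False),
            encoding="utf-8",
        )

    return captions
```

Saving after each video and skipping keys that already exist means a long run can be stopped and restarted without repeating work, which matters when each caption takes a long time to generate.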