microsoft / XPretrain

Multi-modality pre-training

Asking for a simple script to get text and video features #24

Open yotammarton opened 1 year ago

yotammarton commented 1 year ago

First of all - Amazing work on this one.

I'm getting a bit lost in the repo. May I request a simple few-line script that does something like the following:

model = CLIPViP("pretrain_clipvip_base_32.pt")
text_features = model.encode_text("This is a very cute cat")
video_features = model.encode_video("vid_file.mp4")
cosine(text_features, video_features)

[Extra] Preferably I'd like to get the video features for a batch of mp4 files with different lengths. The closest I found is CLIP-ViP/src/modeling/VidCLIP.py, but I couldn't find a use of this script.

Thank you :)
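Until an official snippet exists, the two generic pieces of this request can be sketched in plain NumPy: cosine similarity between already-extracted feature vectors, and uniform frame sampling so clips of different lengths all yield the same number of frames and can be batched. This is a minimal sketch; the function names are illustrative, not the repo's actual API:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D feature vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

def uniform_frame_indices(num_frames: int, num_samples: int) -> np.ndarray:
    """Pick `num_samples` evenly spaced frame indices, so videos of
    different lengths all reduce to the same number of frames."""
    return np.linspace(0, num_frames - 1, num_samples).round().astype(int)
```

For example, `uniform_frame_indices(300, 12)` spreads 12 indices over a 300-frame clip; sampling those frames from every video gives equal-shape tensors that stack into one batch before being passed to the video encoder.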

jingli18 commented 1 year ago

Same question. I can download the videos, but without annotations. Where can I get the text (caption, annotation, transcription) data? Thanks a lot

HellwayXue commented 1 year ago


Hi, we are integrating CLIP-ViP into Hugging Face Transformers. I believe it will then be easier to call. Please keep an eye on it.

HellwayXue commented 1 year ago


Hi, for ASR texts, please refer to #7 (https://github.com/microsoft/XPretrain/issues/7). For auxiliary captions, please download from this Azure Blob link: https://hdvila.blob.core.windows.net/dataset/hdvila_ofa_captions_db.zip?sp=r&st=2023-03-16T04:58:26Z&se=2026-03-01T12:58:26Z&spr=https&sv=2021-12-02&sr=b&sig=EYE%2Bj11VWfQ6G5dZ8CKlOOpL3ckmmNqpAtUgBy3OGDM%3D

jingli18 commented 1 year ago

Thanks a lot!


Spark001 commented 1 year ago


@HellwayXue Thanks for providing the auxiliary captions. But how can I open the data.mdb files? I tried Access and Visual Studio, but they did not work...

MVPavan commented 1 year ago


Hi @HellwayXue, any update on the integration with Hugging Face? Thank you :)

eisneim commented 1 year ago

@MVPavan @yotammarton I've created a simple example here: https://github.com/eisneim/clip-vip_video_search

someshfengde commented 10 months ago

Hi @MVPavan, can you please suggest what GPU configuration is required to run this model (just for inference)?
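As a rough guide, a base-sized CLIP-ViP model is close to CLIP ViT-B/32 in scale, on the order of 150M parameters (that figure is an approximation, not an official number). At batch size 1 the weights dominate memory, so a back-of-the-envelope estimate is just parameter count times bytes per parameter:

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate GPU memory (GiB) for model weights alone; activations
    and framework overhead add more, but weights dominate small-batch inference."""
    return num_params * bytes_per_param / 1024**3

# ~150M parameters is an assumed model size, not an official count.
print(round(weight_memory_gib(150e6, 4), 2))  # fp32
print(round(weight_memory_gib(150e6, 2), 2))  # fp16
```

That works out to roughly half a GiB of weights in fp32, so for inference alone any recent GPU with a few GB of memory should be sufficient.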