microsoft / VideoX

VideoX: a collection of video cross-modal models

Asking for a simple script to get text and video features #102

Open ymartin-mw opened 1 year ago

ymartin-mw commented 1 year ago

First of all, amazing work on this one.

I'm getting a bit lost in the repo. Could I request a simple script of a few lines that does something like the following:

model = CLIPViP("pretrain_clipvip_base_32.pt")
text_features = model.encode_text("This is a very cute cat")
video_features = model.encode_video("vid_file.mp4")
cosine(text_features, video_features)
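
For reference, the last step of that request can be sketched independently of the model. This is a minimal sketch, assuming both features arrive as plain sequences of floats with the same length; the `cosine` helper here is illustrative and is not part of the VideoX repository.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors.

    Illustrative helper only; not a function from the VideoX repo.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

With tensor outputs, the same result would normally come from L2-normalizing each feature and taking a dot product.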

[Extra] Ideally, I would also like to get the video features for a batch of mp4 files of different lengths.
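
One common way to batch clips of different lengths is to sample the same number of frames from each one, so every video yields an input of the same shape. The sketch below only computes which frame indices to read. The function name and the default of 12 samples are assumptions made for illustration, not values taken from the repository, and decoding the frames is left to whatever video reader is in use.

```python
def sample_frame_indices(num_frames_in_video, num_samples=12):
    """Pick num_samples frame indices spread evenly over a video.

    Hypothetical helper: takes the center of each of num_samples
    equal-length segments. Videos shorter than num_samples frames
    get repeated indices, so the output length is always num_samples.
    """
    if num_frames_in_video <= 0:
        raise ValueError("video must contain at least one frame")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    segment = num_frames_in_video / num_samples
    return [
        min(num_frames_in_video - 1, int((i + 0.5) * segment))
        for i in range(num_samples)
    ]
```

Applying this to every file in a batch gives a fixed number of frames per video, which can then be stacked and passed to the video encoder in one call.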

Thank you

tomer196 commented 10 months ago

Can anyone help with this kind of script?