simon-ging / coot-videotext

COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Apache License 2.0

inference #54

Open LilyTheBear opened 1 year ago

LilyTheBear commented 1 year ago

Hi, do you have sample inference code that loads the model, pre-processes video and text, and computes the similarity score?

Thanks!

simon-ging commented 1 year ago

Hi, no, sorry. Please use the commands and instructions from the README. Pull requests adding such code are welcome.
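
Note for readers: the last step of the request, the similarity score, can be sketched independently of the repository. Below is a minimal sketch, assuming you have already obtained one video embedding and one text embedding of the same dimension by following the README, and assuming cosine similarity is the score you want. The function name `cosine_similarity` and the placeholder vectors are hypothetical and not part of the COOT code base. Loading the model and pre-processing the inputs are not covered here.

```python
import math
from typing import Sequence


def cosine_similarity(video_emb: Sequence[float], text_emb: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(video_emb) != len(text_emb):
        raise ValueError("embeddings must have the same dimension")
    # Dot product of the two vectors.
    dot = sum(v * t for v, t in zip(video_emb, text_emb))
    # Euclidean norms used to normalize the dot product.
    norm_v = math.sqrt(sum(v * v for v in video_emb))
    norm_t = math.sqrt(sum(t * t for t in text_emb))
    if norm_v == 0.0 or norm_t == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_v * norm_t)


# Placeholder vectors standing in for real model outputs (assumption).
video_emb = [0.2, 0.1, 0.7]
text_emb = [0.25, 0.05, 0.65]
score = cosine_similarity(video_emb, text_emb)
print(f"similarity: {score:.4f}")
```

A score near 1 means the two embeddings point in nearly the same direction. For retrieval, you would compute this score between one text embedding and every candidate video embedding, then rank the candidates by score.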