Open marianna13 opened 1 year ago
Implemented via clip-video-encode.
If you make that true, the output dataset has per-frame CLIP similarities, which you can use to filter videos. The only thing left to do is add a filter in dataloader/filters.py that reads these similarities from the metadata and applies some reasonable rejection rule.
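As a rough illustration of what such a rejection rule might look like, here is a minimal sketch. It is not the project's actual filter. The metadata key `clip_frame_similarities`, the function name `keep_video`, and both thresholds are hypothetical, and the similarities are assumed to be stored as a plain list of floats, one per sampled frame.

```python
from statistics import mean
from typing import Any, Mapping, Sequence


def keep_video(
    metadata: Mapping[str, Any],
    key: str = "clip_frame_similarities",  # hypothetical metadata key
    min_sim: float = 0.25,                 # hypothetical similarity threshold
    min_frame_fraction: float = 0.5,       # hypothetical share of frames that must pass
) -> bool:
    """Return True if the video should be kept, False if it should be rejected."""
    sims: Sequence[float] = metadata.get(key) or []
    if not sims:
        # No similarities recorded: reject rather than guess.
        return False
    # Fraction of frames whose text-frame similarity reaches the threshold.
    passing = sum(s >= min_sim for s in sims) / len(sims)
    # Require both a good average and enough individually matching frames,
    # so a single high-scoring frame cannot carry an otherwise unrelated video.
    return mean(sims) >= min_sim and passing >= min_frame_fraction
```

Combining the mean with a per-frame fraction is only one possible choice. A rule based on the minimum or the median similarity would fit the same interface, and the thresholds would need tuning against real data.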
Implement a feature to filter videos by the similarity scores between the CLIP embeddings of the text and of the video frames. Video Filter class Colab demo.