haltakov / natural-language-youtube-search

Search inside YouTube videos using natural language
MIT License
911 stars, 72 forks

Strange CLIP results #7

Open smithee77 opened 3 years ago

smithee77 commented 3 years ago

Hi, first of all, many thanks for this wonderful script :) I've been trying some searches, and I found some strange results. It's probably just how CLIP works, but I'm not sure.

If I search for "CAR" (there are a lot of cars in the video) and look at the similarity value of the best-matching frame, I get, e.g., 26.65. Then I search for something meaningless like "sdfsdflksdfj" and check the same value. I was expecting a value near zero, but instead I get, e.g., 21.55.
Is this a bug, or is it the way CLIP works? Is there a way to tell how good the prediction is? Many thanks!

haltakov commented 3 years ago

I think this is how CLIP works. I've observed similar behavior: you can't really tell when CLIP doesn't know :)

I guess the reason is that CLIP was not trained for that, so the scores may not be easy to interpret on their own. About all you can say is which one is higher (= better match).
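One possible way to make the "which one is higher" comparison more concrete is to put the scores for several prompts through a softmax, which is how CLIP is normally used for zero-shot classification. The sketch below is only an illustration, not part of the script: `relative_match` is a hypothetical helper, and it assumes the two values quoted above are scaled similarity scores for the same frame (in the issue they may come from different frames).

```python
import math

def relative_match(scores):
    """Turn raw CLIP scores for ONE frame into relative weights via softmax.

    `scores` maps each text prompt to its raw similarity score.
    The result only says how the prompts compare with each other;
    it is not an absolute confidence.
    """
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {prompt: math.exp(s - top) for prompt, s in scores.items()}
    total = sum(exps.values())
    return {prompt: e / total for prompt, e in exps.items()}

# Values taken from the issue; assumed here to belong to the same frame.
frame_scores = {"car": 26.65, "sdfsdflksdfj": 21.55}
probs = relative_match(frame_scores)

for prompt, p in probs.items():
    print(f"{prompt}: {p:.3f}")
```

Seen this way, a gap of about 5 points between the two raw scores is a strong preference for "car", even though neither raw value is close to zero. The result still depends entirely on which contrast prompts are chosen, so it does not solve the problem of detecting a query that matches nothing.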