shonenkov / CLIP-ODS

CLIP Object Detection, search object on image using natural language #Zeroshot #Unsupervised #CLIP #ODS
MIT License
137 stars, 14 forks

Is there any paper about how this works? #7

Open JM-IP opened 2 years ago

clementw168 commented 2 years ago

You can understand most of it by reading the source code.

Basically, V0 uses a sliding window, chooses the box with the highest score, and then applies a postprocessing step. V1 gets candidate masks with OpenCV functions, derives bounding boxes from those masks, and then uses CLIP to score each box; those predictions are fed to a postprocessing algorithm.
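
To make the V0 idea concrete, here is a minimal sketch of the sliding-window step only. It is not the repository's code: the names `sliding_windows`, `best_box`, and `score_fn` are hypothetical, the windows are assumed to be square with a fixed size, and `score_fn` is a stand-in for whatever CLIP image-text similarity the project computes on each crop. The postprocessing step is not shown.

```python
from typing import Callable, Iterator, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def sliding_windows(width: int, height: int, window: int, stride: int) -> Iterator[Box]:
    """Yield every square box of side `window` that fits fully inside the image."""
    for y0 in range(0, height - window + 1, stride):
        for x0 in range(0, width - window + 1, stride):
            yield (x0, y0, x0 + window, y0 + window)


def best_box(
    width: int,
    height: int,
    window: int,
    stride: int,
    score_fn: Callable[[Box], float],
) -> Tuple[Box, float]:
    """Score every window and return the highest-scoring box with its score.

    `score_fn` is an assumption: in a real pipeline it would crop the image
    to the box and return the CLIP similarity between the crop and the query.
    """
    scored = [(score_fn(box), box) for box in sliding_windows(width, height, window, stride)]
    if not scored:
        raise ValueError("window is larger than the image")
    score, box = max(scored, key=lambda pair: pair[0])
    return box, score


def toy_score(box: Box) -> float:
    """Toy stand-in for CLIP: prefers boxes centered near pixel (70, 30)."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return -((cx - 70) ** 2 + (cy - 30) ** 2)


box, score = best_box(100, 100, window=20, stride=10, score_fn=toy_score)
print(box, score)
```

In practice one would probably run this over several window sizes and batch the crops through the model, but that depends on details only the source code can confirm.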