microsoft / RegionCLIP

[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"
Apache License 2.0

Customized concepts embedding for ov detection #27

Closed Izzysh7 closed 2 years ago

Izzysh7 commented 2 years ago

Hi, thanks for your great research! There is still something I'm confused about.

I want to use this work to do OV detection on my own customized images. I followed the steps in the README: I prepared concepts.txt and converted it into a .pth file. After that, I simply changed the embedding paths from

MODEL.CLIP.TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/lvis_1203_cls_emb.pth \
MODEL.CLIP.OPENSET_TEST_TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/lvis_1203_cls_emb.pth \

to

MODEL.CLIP.TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/concept_embeds.pth \
MODEL.CLIP.OPENSET_TEST_TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/concept_embeds.pth \

However, the resulting boxes were labeled with unexpected text.
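For reference, here is a minimal, hedged sketch of how such an embedding file could be produced with the openai/CLIP package. The backbone choice ("RN50"), the prompt template, and the normalization are assumptions for illustration only; defer to the extraction step described in the repo's README where they differ.

```python
# Hedged sketch (not the repo's own script): build a concept embedding file
# from concepts.txt using the openai/CLIP package
# (pip install git+https://github.com/openai/CLIP.git).
# The saved tensor has shape [num_concepts, embed_dim], one row per line
# of concepts.txt; row order determines the predicted class ids.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("RN50", device=device)  # assumed backbone

with open("concepts.txt") as f:
    concepts = [line.strip() for line in f if line.strip()]

prompts = [f"a photo of a {c}" for c in concepts]  # assumed prompt template
tokens = clip.tokenize(prompts).to(device)
with torch.no_grad():
    emb = model.encode_text(tokens).float()
emb = emb / emb.norm(dim=-1, keepdim=True)  # L2-normalize for cosine matching

torch.save(emb.cpu(), "./pretrained_ckpt/concept_emb/concept_embeds.pth")
```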

For example, my concepts.txt contains 8 objects: cat, panda, potato, bear, car, person, toy, sail. But the boxes are labeled with 8 unexpected objects: almond, aerosol_can, alligator, air_conditioner, alarm_clock, ambulance, alcohol, airplane.

I saw some similar issues, and one of them mentioned that we need a JSON file to specify the object class names. Is this true?

YiwuZhong commented 2 years ago

@Izzysh7 Yes, the problem comes from a mismatch between the predicted class ids and the class names used in the visualization process. Note that your unexpected labels (aerosol_can, air_conditioner, airplane, ...) are exactly the alphabetically first LVIS categories, which shows the predicted ids are still being looked up in the default LVIS class list.

After replacing the default concept embeddings with your custom ones, the model is already able to detect the custom objects (by matching region features to your custom embeddings). The only remaining step is to visualize the correct class name for each predicted class id. You could look into the visualization code, which was derived directly from Detectron2; a sketch follows below.
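As a concrete illustration, here is a minimal sketch of registering the custom class names with Detectron2 so the visualizer maps predicted ids back to the concepts.txt order. MetadataCatalog and Visualizer are standard Detectron2 APIs; the dataset name "my_custom_concepts" and the commented demo wiring are hypothetical.

```python
# Hedged sketch: register custom class names with Detectron2 so the
# Visualizer prints them instead of the default LVIS metadata.
# The class order must match the row order of concept_embeds.pth.
from detectron2.data import MetadataCatalog
from detectron2.utils.visualizer import Visualizer

with open("concepts.txt") as f:
    thing_classes = [line.strip() for line in f if line.strip()]

# "my_custom_concepts" is a hypothetical dataset name for illustration.
metadata = MetadataCatalog.get("my_custom_concepts")
metadata.thing_classes = thing_classes

# When drawing predictions ("instances" is the model's Instances output):
# v = Visualizer(image[:, :, ::-1], metadata=metadata)
# out = v.draw_instance_predictions(instances.to("cpu"))
```

In other words, the JSON file mentioned in the other issue is just one way to supply this id-to-name mapping; any mechanism that hands the visualizer class names in the same order as your concept embeddings will do.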