How to zero-shot inference my own label class instead of COCO or LVIS

microsoft / RegionCLIP

[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"

Apache License 2.0


How to zero-shot inference my own label class instead of COCO or LVIS #66

Status: Closed (QHCV closed this issue 1 year ago)

QHCV commented 1 year ago

Very good work. I would like to know whether it is possible to run zero-shot inference on my own label classes instead of the COCO or LVIS classes. If so, how should I do it? Could you describe the steps? Thank you.

whhong5 commented 1 year ago

Hello! I also want to try zero-shot inference on my own label classes. I tried to extract the text embeddings and the region features separately and then compute the similarity between them, but I ran into problems and have not succeeded so far. I am not sure whether this procedure can work. If you have already solved this, would you mind sharing your method? Many thanks in advance.
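The procedure described in this comment (embed the class names, embed the regions, then compare them) can be sketched in a few lines. This is only a minimal illustration with toy vectors, not RegionCLIP's actual code: the function names, the temperature value of 0.01, and the 4-dimensional features are all assumptions made for the example. In practice the text embeddings and region features would come from the model's text and visual encoders.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def zero_shot_classify(region_feats, text_embeds, temperature=0.01):
    """Assign each region the class whose text embedding is most similar.

    region_feats: (num_regions, dim) array of region features.
    text_embeds:  (num_classes, dim) array of class-name embeddings.
    temperature:  assumed softmax temperature (hypothetical value).
    """
    sims = l2_normalize(region_feats) @ l2_normalize(text_embeds).T
    logits = sims / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs

# Toy data: three hypothetical classes, 4-dimensional embeddings.
class_names = ["cat", "dog", "bird"]
text_embeds = np.eye(3, 4)
region_feats = np.array([
    [0.9, 0.1, 0.0, 0.0],  # closest to the "cat" embedding
    [0.0, 0.2, 0.8, 0.1],  # closest to the "bird" embedding
])

labels, probs = zero_shot_classify(region_feats, text_embeds)
predicted = [class_names[i] for i in labels]
print(predicted)
```

The key requirement is that both sets of vectors live in the same embedding space and have the same dimensionality; otherwise the similarity scores are meaningless.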

QHCV commented 1 year ago

> Hello! I also want to try zero-shot inference on my own label classes. I tried to extract the text embeddings and the region features separately and then compute the similarity between them, but I ran into problems and have not succeeded so far. I am not sure whether this procedure can work. If you have already solved this, would you mind sharing your method? Many thanks in advance.

I followed the "Extract Concept Features" example in the README to get the text embeddings for my label classes, and then followed the "Zero-shot Inference" example to run inference on my own dataset. I did not use "Extract Region Features" at any point in the process.

Xuefei98 commented 1 year ago

I am trying to do the same thing by using a customized concept_embeds.pth and changing NUM_CLASSES to 3 in the config file, but I get the following error:

  File "RegionCLIP/detectron2/modeling/roi_heads/fast_rcnn.py", line 456, in __init__
    self.cls_score.weight.copy_(pre_computed_w)
RuntimeError: The size of tensor a (1203) must match the size of tensor b (4) at non-singleton dimension 0

Did you run into the same problem? Any help is appreciated. Thank you!
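The error message says the two tensors disagree along dimension 0: one has 1203 rows and the other has 4. Since 1203 is the number of LVIS v1 categories, one possible reading (an assumption, not confirmed in this thread) is that the classifier was still built with the LVIS class count, while the loaded embedding file has 4 rows. The sketch below reproduces that kind of mismatch with NumPy arrays and shows a shape check that reports it more clearly. The function name `copy_class_embeddings` and the toy embedding dimension are hypothetical and are not part of RegionCLIP.

```python
import numpy as np

def copy_class_embeddings(cls_weight, concept_embeds):
    """Copy concept embeddings into a classifier weight after checking shapes.

    cls_weight:     (num_classes, dim) array standing in for the classifier weight.
    concept_embeds: (num_concepts, dim) array standing in for the loaded embeddings.
    """
    if cls_weight.shape != concept_embeds.shape:
        raise ValueError(
            f"classifier expects {cls_weight.shape[0]} rows of dim "
            f"{cls_weight.shape[1]}, but the embeddings have "
            f"{concept_embeds.shape[0]} rows of dim {concept_embeds.shape[1]}"
        )
    cls_weight[...] = concept_embeds  # in-place copy, shapes now known to match
    return cls_weight

embed_dim = 8  # toy dimension chosen for the example
concept_embeds = np.random.default_rng(0).normal(size=(4, embed_dim))

# Case 1: classifier sized for 1203 classes, embeddings with 4 rows (mismatch).
try:
    copy_class_embeddings(np.zeros((1203, embed_dim)), concept_embeds)
    mismatch_message = None
except ValueError as err:
    mismatch_message = str(err)
print(mismatch_message)

# Case 2: classifier sized to match the embedding file (copy succeeds).
matched = copy_class_embeddings(np.zeros((4, embed_dim)), concept_embeds)
print(matched.shape)
```

Under that reading, it would be worth checking that the class count the model is actually built with equals the number of rows in concept_embeds.pth, and that the changed NUM_CLASSES value is the one the model reads at construction time.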