yuhangzang / OV-DETR

[Under preparation] Code repo for "Open-Vocabulary DETR with Conditional Matching" (ECCV 2022)

Code for LVIS dataset #5

Open chaupham1709 opened 2 years ago

chaupham1709 commented 2 years ago

When will you release the pretrained model and training settings for the LVIS dataset?

yuhangzang commented 2 years ago

Hi @amateur3673 ,

I am working on preparing them and will keep this issue updated; it should not take long.

chaupham1709 commented 2 years ago

I see that your annotation file "coco_train2017_seen_2_proposal.json" contains information about the novel classes. I wonder how you generated the proposals for each class.

yuhangzang commented 2 years ago

Hi @amateur3673 ,

Like ViLD, we train a class-agnostic detector to generate object proposals and use the CLIP model to predict which regions may cover the novel classes.
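
For reference, a minimal sketch of this ViLD-style labeling step, assuming the openai/CLIP package; the prompt template, the xyxy box format, and the names `clip_label_for_box` / `class_names` are illustrative assumptions, not the repo's actual pre-processing code:

```python
import clip  # https://github.com/openai/CLIP
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Embed every class name (base + novel) with the CLIP text encoder.
class_names = ["person", "bicycle", "car"]  # stand-in list; use the full vocabulary
prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
with torch.no_grad():
    text_feat = model.encode_text(prompts)
    text_feat /= text_feat.norm(dim=-1, keepdim=True)

def clip_label_for_box(image: Image.Image, box):
    """Crop one class-agnostic proposal and label it with the closest class."""
    x0, y0, x1, y1 = box
    crop = preprocess(image.crop((int(x0), int(y0), int(x1), int(y1))))
    with torch.no_grad():
        img_feat = model.encode_image(crop.unsqueeze(0).to(device))
        img_feat /= img_feat.norm(dim=-1, keepdim=True)
    scores = (img_feat @ text_feat.T).squeeze(0)  # cosine similarity per class
    best = scores.argmax().item()
    return class_names[best], scores[best].item()
```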

d12306 commented 2 years ago

Hi @yuhangzang, I am still not sure how you generate coco_train2017_seen_2_proposal.json. If you use class-agnostic object proposals, then there are no class-specific labels for the proposals, yet the provided json file contains category ids for these boxes.

Could you elaborate on how you get the class labels for these proposals?

Do you use the gt boxes instead? Thanks

yuhangzang commented 2 years ago

Hi @d12306 ,

  1. For object proposals, we use the CLIP model to predict the class id in the json file, taking the class with the maximum dot-product score between the CLIP image feature of the region and the CLIP text features. This step keeps the object proposals that may contain the novel classes and filters out the other proposals; see the sketch after this list. We will update the pre-processing code.

  2. Of course they cannot be gt boxes, which would violate the open-world setting. The box coordinates come from noisy object proposals, and the predicted ids are based on inaccurate CLIP predictions. You can visualize them to verify.
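
Continuing the sketch above, the keep-or-filter step might look roughly like this (the threshold value, the tuple layout of `proposal_batches`, and the 0-based category ids are assumptions; `clip_label_for_box` and `class_names` come from the earlier sketch):

```python
from typing import List, Tuple
from PIL import Image

def build_proposal_annotations(
    proposal_batches: List[Tuple[int, Image.Image, List[Tuple[float, float, float, float]]]],
    score_thresh: float = 0.2,  # assumed value, not from the paper
) -> List[dict]:
    """Label class-agnostic proposals with CLIP and drop low-scoring ones.

    `proposal_batches` holds (image_id, PIL image, [xyxy boxes]) tuples
    produced by a class-agnostic detector.
    """
    annotations = []
    for image_id, image, boxes in proposal_batches:
        for x0, y0, x1, y1 in boxes:
            name, score = clip_label_for_box(image, (x0, y0, x1, y1))
            if score < score_thresh:
                continue  # filter out proposals CLIP is not confident about
            annotations.append({
                "image_id": image_id,
                "bbox": [x0, y0, x1 - x0, y1 - y0],  # COCO xywh format
                "category_id": class_names.index(name),  # map to real COCO ids in practice
                "score": score,  # keep for later inspection
            })
    return annotations
```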

d12306 commented 2 years ago

@yuhangzang, thanks for the explanation! Now it is much clearer!

HITerStudy commented 2 years ago

@yuhangzang hello, thanks for your fancy work! When will all of the code be updated? Please release the full code, including the LVIS dataset support and the configs for the experiments published in the paper. Thanks!

childlong commented 2 years ago

hi @yuhangzang, are the annotations in coco_train2017_seen_2_proposal.json and clip_feat_coco.pkl related? For example, for category id 62, the number of annotations in coco_train2017_seen_2_proposal.json is 38073 (across 12774 images), but the number of items in clip_feat_coco.pkl is 32677.
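
For anyone wanting to reproduce these counts, a quick check along the following lines may help; note that the internal layout of clip_feat_coco.pkl is an assumption here (treated as a dict mapping category id to a list of features), since the pre-processing code is not yet released:

```python
import json
import pickle

# Count annotations and images for one category in the proposal json.
with open("coco_train2017_seen_2_proposal.json") as f:
    coco = json.load(f)

cat_id = 62  # category in question
anns = [a for a in coco["annotations"] if a["category_id"] == cat_id]
print(len(anns), "annotations across", len({a["image_id"] for a in anns}), "images")

# Assumption: clip_feat_coco.pkl maps category id -> list/array of CLIP
# features. Adjust once the actual structure is documented.
with open("clip_feat_coco.pkl", "rb") as f:
    clip_feat = pickle.load(f)
print(len(clip_feat[cat_id]), "CLIP feature entries for category", cat_id)
```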