prannaykaul / mm-ovod

Official repo for our ICML 23 paper: "Multi-Modal Classifiers for Open-Vocabulary Object Detection"

A question about the feature dim #5

Open mandal4 opened 9 months ago

mandal4 commented 9 months ago

I was impressed by your paper, thanks. I have a question about the feature dimension in the proposed architecture. I see that both the 'vision-based classifier' and the 'text-based classifier' have a dimension of 512. But in many cases, the features after the RoI-pooling layer (e.g., in Faster R-CNN) have a dimension of 2048 or 1024. Did you change some configuration for this, or add a layer to match the dimensions?

Thanks,
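
To make the mismatch concrete, here is a minimal sketch of one common way such a gap is bridged: a linear projection from the RoI feature dimension down to the classifier embedding dimension, followed by a cosine-similarity score against each class vector. This is only an assumption for illustration, not something confirmed to be what this repo does. The names `ROI_DIM`, `EMB_DIM`, `NUM_CLASSES`, `project`, and `l2_normalize` are hypothetical, and all weights are random placeholders.

```python
import math
import random

ROI_DIM = 1024   # RoI feature dim mentioned in the question (could also be 2048)
EMB_DIM = 512    # classifier embedding dim mentioned in the question
NUM_CLASSES = 3  # hypothetical number of classes, for illustration only

rng = random.Random(0)

def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def project(feature, weight):
    """Apply a linear map; weight is a list of EMB_DIM rows of length ROI_DIM."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weight]

# Random placeholders standing in for a learned projection and
# precomputed 512-d class vectors (assumptions, not the repo's values).
proj_weight = [[rng.gauss(0, 0.02) for _ in range(ROI_DIM)] for _ in range(EMB_DIM)]
classifiers = [l2_normalize([rng.gauss(0, 1) for _ in range(EMB_DIM)])
               for _ in range(NUM_CLASSES)]
roi_feature = [rng.gauss(0, 1) for _ in range(ROI_DIM)]

# 1024-d RoI feature -> 512-d embedding -> one cosine score per class
embedding = l2_normalize(project(roi_feature, proj_weight))
scores = [sum(c * e for c, e in zip(clf, embedding)) for clf in classifiers]
```

With a projection like this, the RoI head can keep its native 1024 or 2048 output while the class vectors stay at 512; whether the paper's model works this way is exactly what the question is asking.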