microsoft / Oscar

Oscar and VinVL
MIT License

Pre-extracted Image Features: what OD model is used? #120

Closed: ahnjaewoo closed this issue 3 years ago

ahnjaewoo commented 3 years ago

Hi, here we can easily use the pre-extracted image features.

I assumed these features come from the VinVL object detection (OD) model trained on the four merged datasets: COCO with stuff, Visual Genome, Objects365, and Open Images.

However, I found that the features and their corresponding labels (object tags) come only from the Visual Genome dataset. According to the VinVL paper, that setup performs worse than the model trained on the four merged datasets.

So I would like to confirm whether the provided image features come from the pretrained X152-C4 object-attribute detection model trained only on Visual Genome, or from the model pretrained on the four merged datasets.

Thanks

xiaoweihu commented 3 years ago

The OD model is first trained on the four merged datasets, then fine-tuned on VG. This gives the best performance.
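The two-stage schedule in the reply may also explain the original observation: the backbone producing the features has seen all four datasets, while the object tags would use the Visual Genome label set if the detection head's classes are fixed by the final VG fine-tuning stage. The Python sketch below encodes that reasoning. It assumes, without confirmation in this thread, that the last stage determines the tag vocabulary. `TrainingStage`, `VINVL_OD_SCHEDULE`, and the helper functions are hypothetical names for illustration, not part of the Oscar or VinVL code.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class TrainingStage:
    """One stage of a hypothetical detector training schedule."""
    name: str
    datasets: Tuple[str, ...]


# Schedule as described in the reply: pretrain on the four merged
# datasets, then fine-tune on Visual Genome (VG).
VINVL_OD_SCHEDULE: Tuple[TrainingStage, ...] = (
    TrainingStage(
        "pretrain",
        ("COCO with stuff", "Visual Genome", "Objects365", "Open Images"),
    ),
    TrainingStage("finetune", ("Visual Genome",)),
)


def seen_datasets(schedule: Sequence[TrainingStage]) -> Tuple[str, ...]:
    """All datasets the backbone saw across every stage, in first-seen order."""
    seen: List[str] = []
    for stage in schedule:
        for ds in stage.datasets:
            if ds not in seen:
                seen.append(ds)
    return tuple(seen)


def tag_vocabulary_source(schedule: Sequence[TrainingStage]) -> Tuple[str, ...]:
    """Assumption: the detection head is trained last in the final stage,
    so the emitted object tags use that stage's label set."""
    return schedule[-1].datasets


if __name__ == "__main__":
    print("backbone trained on:", seen_datasets(VINVL_OD_SCHEDULE))
    print("tags drawn from:", tag_vocabulary_source(VINVL_OD_SCHEDULE))
```

Keeping the schedule as data makes the distinction explicit: what the features were trained on is not the same question as which vocabulary the tags come from.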

ahnjaewoo commented 3 years ago

Thanks!