Hi, the pre-extracted image features provided here are easy to use.
I assumed these features came from the VinVL OD model trained on the four merged datasets: COCO with stuff, Visual Genome, Objects365, and Open Images.
However, I found that the features and their corresponding labels (object tags) come only from the Visual Genome dataset, which, according to the VinVL paper, gives worse performance than the four merged datasets.
So I would like to clarify whether the provided image features come from the X152-C4 object-attribute detection model pretrained only on the Visual Genome dataset, or from the model pretrained on the four merged datasets.
Thanks