Pre-extracted Image Features: what OD model is used?

Hi, In here, we can easily use pre-extracted image features.

And I thought these features are from VINVL OD model trained from the merged four datasets: COCO with stuff, Visual Genome, Object365 and Open Images.

However, I found that features and corresponding labels (object tags) are only from the Visual Genome dataset, which shows inferior performance than that from merged four datasets (according to VinVL paper)

So I want to clarify whether the given image features are from the pretrained X152-C4 object-attribute detection (based on only the Visual Genome dataset) or from the pretrained model on the merged four datasets.

Thanks

microsoft / scene_graph_benchmark

Pre-extracted Image Features: what OD model is used? #36