wanghao9610 / OV-DINO

Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
https://wanghao9610.github.io/OV-DINO
Apache License 2.0
240 stars, 13 forks

Question about how different labels for the same object across datasets are handled. #36

Closed: Cloud65000 closed this issue 2 months ago

Cloud65000 commented 2 months ago

I have a question about the pre-training datasets. Do any of them describe the same target object with different labels? For instance, suppose COCO labels birds as "bird" while Obj365 labels them as "wild bird". How do you handle this during pre-training? And at inference, if the model is given an image containing birds, will it predict "bird" or "wild bird"?

wanghao9610 commented 2 months ago

In pre-training, we keep each dataset's original label names. This ambiguity does not really matter for pre-training: different people may call the same thing by different names, but both names still refer to the same thing.

Cloud65000 commented 2 months ago

I see. I guess we can keep the original names because BERT should produce similar embeddings for different descriptions of the same object.
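The intuition in this thread can be sketched in a few lines of Python. This is only an illustration, not OV-DINO code. `toy_embed` is a hypothetical stand-in for a real text encoder such as BERT (it uses character trigrams, so it captures surface overlap only, not meaning). The label lists are made-up excerpts, and `nearest_label` is a hypothetical helper. The sketch shows that when label names are compared in a shared embedding space, "bird" and "wild bird" can land close together without renaming either one.

```python
import math
from collections import Counter
from typing import Sequence


def toy_embed(label: str) -> Counter:
    """Hypothetical stand-in for a text encoder: a bag of character trigrams.

    A real system would use a learned encoder (e.g. BERT); this toy version
    only makes the example runnable without external dependencies.
    """
    padded = f"  {label.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_label(query: str, vocabulary: Sequence[str]) -> str:
    """Return the vocabulary entry whose embedding is closest to the query."""
    q = toy_embed(query)
    return max(vocabulary, key=lambda name: cosine(q, toy_embed(name)))


# Each dataset keeps its own label names (illustrative excerpts only).
coco_labels = ["bird", "cat", "dog"]
obj365_labels = ["wild bird", "cat", "dog"]

sim_same = cosine(toy_embed("bird"), toy_embed("wild bird"))
sim_diff = cosine(toy_embed("bird"), toy_embed("cat"))
print(f"bird vs wild bird: {sim_same:.2f}")
print(f"bird vs cat:       {sim_diff:.2f}")

# A label from one dataset maps to its counterpart in the other vocabulary.
print(nearest_label("wild bird", coco_labels))
print(nearest_label("bird", obj365_labels))
```

As for the inference question, open-vocabulary detectors generally score regions against the text embeddings of whatever category names are supplied in the prompt, so the predicted name would be expected to come from that list rather than from a fixed training vocabulary.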