Open chcomin opened 7 years ago
Actually, it seems that some of the descriptions (e.g., 'basketball') are indeed related to distinct concepts while others (e.g., 'alfa romeo giulietta') seem to describe the same thing.
Hi @chcomin,
thank you for the find. You're correct, there are two classes of errors:
Different entities, same description. For example, /m/020lf and /m/04rmv are both described as "mouse".
In one case it's a computer mouse, and in the other it's an animal.
For this kind of collision, I would propose making the descriptions more verbose, e.g. "mouse" -> "computer mouse" and "mouse" -> "mouse (animal)". Feel free to make a pull request, and I will try to advocate for its acceptance.
Same real entity, same description, different IDs (like "alfa romeo giulietta"). A short-term fix would be to modify the labels so that these entities also have the same images attached. Eventually a winner should be chosen, but I don't have enough information to give informed advice here.
Checking the test images here http://openimages.oldjpg.com/, I see that sometimes the duplicate classes really are the same thing (e.g. egg), but other times they are not. For example "mouse", as already said, but also "fish" (one is the animal and the other is food).
Please notice that the 3 "alfa romeo giulietta" are:
So resolving all the duplicates would be useful work, but we have to check every class individually, because a simple merge could be wrong.
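As a starting point for that manual check, here is a minimal sketch that groups label IDs by description so each collision can be reviewed by hand. It assumes dict.csv is a two-column CSV (label ID, then description) with no header row; that layout is an assumption, so adjust the column indices if the file differs. The function name `find_duplicate_descriptions` and the egg ID in the sample data are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def find_duplicate_descriptions(rows):
    """Return {description: [label ids]} for descriptions used by more than one ID."""
    ids_by_description = defaultdict(list)
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        label_id, description = row[0], row[1]
        # Normalize case and whitespace so "Mouse" and "mouse " collide too.
        ids_by_description[description.strip().lower()].append(label_id)
    return {d: ids for d, ids in ids_by_description.items() if len(ids) > 1}


# Tiny in-memory stand-in for dict.csv (assumed layout: id, description).
# The two "mouse" IDs come from this thread; "/m/hypothetical_egg" is invented.
sample = io.StringIO(
    '"/m/020lf","mouse"\n'
    '"/m/04rmv","mouse"\n'
    '"/m/hypothetical_egg","egg"\n'
)

duplicates = find_duplicate_descriptions(csv.reader(sample))
for description, ids in sorted(duplicates.items()):
    print(description, ids)
```

To run it on the real file, pass `csv.reader(open("dict.csv", newline="", encoding="utf-8"))` instead of the sample. The output only tells you which IDs share a description; whether they are the same entity (egg) or different ones (mouse, fish) still has to be decided by looking at the images.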
I agree. Let me check with the Google team whether this is a good time to do it.
Hi, I noticed that some labels have the same description in the file dict.csv. Is that expected? Should these cases be treated as distinct entities or is it better to merge them into a single label?
The list of repeated descriptions is: