openimages / dataset

The Open Images dataset
https://storage.googleapis.com/openimages/web/index.html
Apache License 2.0

Many images have just one bounding box for a class that in reality has multiple instances on that image #65

Open lamerman opened 6 years ago

lamerman commented 6 years ago

For example this image:

[image: download]

It has multiple chairs, but only one of them has a bounding box.

Is this just how it was annotated, or was it done intentionally?

rkrasin commented 6 years ago

@samihaija Sami, can you please comment on this?

samihaija commented 6 years ago

Ivan, thank you for the mention!

The above is intentional, assuming the image came from the training partition. We only have one box per image per entity, for the training set. However, for the validation set, we tried to have all instances boxed.
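To see how this plays out in a given annotation file, one can count boxes per image and class. The following is a minimal sketch, not an official tool: the column names `ImageID` and `LabelName` are assumed to match the box annotation CSV, and the sample rows (including the `chair` and `table` labels) are made up for illustration.

```python
import csv
import io
from collections import Counter


def boxes_per_image_label(rows):
    """Count boxes for each (ImageID, LabelName) pair."""
    counts = Counter()
    for row in rows:
        counts[(row["ImageID"], row["LabelName"])] += 1
    return counts


def fraction_single_box(counts):
    """Fraction of (image, label) pairs that have exactly one box."""
    if not counts:
        return 0.0
    single = sum(1 for n in counts.values() if n == 1)
    return single / len(counts)


# Hypothetical sample standing in for a real annotation file.
SAMPLE_CSV = """ImageID,LabelName,XMin,XMax,YMin,YMax
img_a,chair,0.10,0.30,0.40,0.80
img_a,table,0.35,0.90,0.50,0.95
img_b,chair,0.05,0.25,0.30,0.70
img_b,chair,0.60,0.85,0.35,0.75
"""

counts = boxes_per_image_label(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(fraction_single_box(counts))
```

For a real file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced by an open file handle. A fraction close to 1.0 on the training split would be consistent with the one-box-per-entity policy described above.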

lamerman commented 6 years ago

@samihaija @rkrasin thank you.

@samihaija I'm no expert in object detection algorithms, but my first guess is that a neural network trained on this data will be penalized unfairly. Suppose it predicts multiple chairs while the training data has a box for only one. Each time it predicts YES where the annotation says NO, even though the true answer is YES, the network is penalized for a correct prediction.

This is more of a question than a statement, as I am not sure.

What do you think? Could this be a problem?

P.S. I'm trying to train YOLO on Open Images.
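The concern above can be made concrete with a toy calculation. This is a simplified sketch, not YOLO's actual loss: it assumes a squared-error confidence loss in which any cell without a labeled box has target 0, uses a target of 1 for labeled cells, and takes a hypothetical `noobj_weight` of 0.5.

```python
def confidence_loss(predicted, labeled, noobj_weight=0.5):
    """Sum-of-squares confidence loss over grid cells.

    predicted: predicted confidence per cell, in [0, 1]
    labeled:   True where the annotations contain a box for that cell
    Cells without a labeled box get target 0 and are scaled by noobj_weight.
    """
    total = 0.0
    for conf, has_box in zip(predicted, labeled):
        if has_box:
            total += (1.0 - conf) ** 2
        else:
            total += noobj_weight * (0.0 - conf) ** 2
    return total


# Three cells, each containing a real chair; the model detects all three.
predicted = [0.9, 0.9, 0.9]

full_labels = [True, True, True]      # every chair boxed
sparse_labels = [True, False, False]  # only one chair boxed

loss_full = confidence_loss(predicted, full_labels)
loss_sparse = confidence_loss(predicted, sparse_labels)
print(loss_full, loss_sparse)
```

With identical, correct predictions, the sparsely labeled image yields a much larger loss, so under these assumptions the gradient pushes the model to suppress the two unlabeled chairs.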

lamerman commented 6 years ago

I looked at the loss function of YOLO:

[image: screenshot_20180220_181624]

It seems that a missing bounding box for an object that really is in the image will affect the loss function on line 4. It would be interesting to know the reasoning behind providing only one bounding box in Open Images.
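The screenshot is not preserved here. Assuming it showed the loss from the original YOLO paper, that loss can be written as below, where $S^2$ is the number of grid cells, $B$ the number of boxes per cell, and $\mathbb{1}_{ij}^{\text{obj}}$ and $\mathbb{1}_{ij}^{\text{noobj}}$ indicate whether box $j$ in cell $i$ is or is not responsible for a labeled object.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
  \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
  \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

Under that assumption, line 4 is the no-object confidence term: for every box not matched to a label, it pulls the predicted confidence toward zero. An unlabeled chair falls under this term, so a confident detection of it is treated as an error.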

phdung commented 6 years ago

I am also curious about this. @lamerman @samihaija any updates?

dbrazey commented 6 years ago

Does anyone have an explanation for why the training data contains only one bounding box instead of all boxes?

Does this mean the dataset cannot be used to train object detection algorithms?

dashesy commented 6 years ago

Because of this, YOLO does not train well with this dataset.

ShaneYS commented 6 years ago

@dashesy Hi, have you tried any other detection algorithm (such as Faster R-CNN or SSD) with the Open Images dataset? I am trying to train Faster R-CNN with MXNet on Open Images, but I have run into many problems preprocessing the dataset.

kevinsu628 commented 5 years ago

Furthermore, a lot of images have missing labels. Is this problem fixable?

[images: openImage_issue_1, openImage_issue_2]