openimages / dataset

The Open Images dataset
https://storage.googleapis.com/openimages/web/index.html
Apache License 2.0

How were the "trainable" classes determined? #60

Closed tomfuture closed 6 years ago

tomfuture commented 6 years ago

5000 classes are designated as "trainable." How was the determination of which classes are trainable made?

rkrasin commented 6 years ago

@tomfuture I believe it's the classes whose number of instances in the training set is above some threshold, say 50 (I don't claim that's the actual threshold; one could compute the distribution oneself).

tomfuture commented 6 years ago

Thanks @rkrasin! A friend has pointed out to me offline that it's in the readme. I don't know how I missed it!

"Of these, 5000 classes are considered trainable. The trainable classes are unchanged from V2 (in V2 they were defined to have at least 30 human-verified samples in the training set and 5 in the validation or test sets)."
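For anyone who wants to check this rule against their own label counts, here is a minimal sketch of the filter the readme describes. It is not the dataset authors' code. The function name `trainable_classes`, the input format (one class ID per human-verified sample), and the toy class names are all assumptions made for illustration. It also assumes that "5 in the validation or test sets" means 5 samples across validation and test combined, which the readme does not spell out.

```python
from collections import Counter


def trainable_classes(train_labels, heldout_labels, min_train=30, min_heldout=5):
    """Return the set of class IDs that meet both count thresholds.

    train_labels:   iterable of class IDs, one per human-verified training sample
    heldout_labels: iterable of class IDs, one per human-verified sample in
                    validation and test combined (an assumed reading of the readme)
    """
    train_counts = Counter(train_labels)
    heldout_counts = Counter(heldout_labels)
    # A Counter returns 0 for missing keys, so classes absent from the
    # held-out data simply fail the second condition.
    return {
        label
        for label, n in train_counts.items()
        if n >= min_train and heldout_counts[label] >= min_heldout
    }


# Toy data: only "cat" satisfies both thresholds.
train = ["cat"] * 30 + ["dog"] * 29 + ["bird"] * 40
heldout = ["cat"] * 5 + ["dog"] * 10 + ["bird"] * 4
print(sorted(trainable_classes(train, heldout)))  # ['cat']
```

Running the same counts over the real human-verified label files would also give the per-class distribution mentioned earlier in the thread.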