openimages / dataset

The Open Images dataset
https://storage.googleapis.com/openimages/web/index.html
Apache License 2.0

Inception3 pretrained #3

Closed · bhack closed this 7 years ago

bhack commented 7 years ago

Will you release the mentioned pretrained Inception3 model?

gkrasin commented 7 years ago

Hi @bhack!

Yes, we plan to release the pretrained Inception3 model. I can't promise any strict deadlines, but it should happen "soon".

bhack commented 7 years ago

It would be nice to have that for a run of https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/deepdream/deepdream.ipynb

gkrasin commented 7 years ago

Yes, indeed. :)

iamgroot42 commented 7 years ago

The document mentions that only 6000 of the labels have been used for the model. Can the list of those 6000 labels be shared? (As in, which 6000 labels were used.) Also, is training such a model even feasible? (9 million images and over 6000 categories.) Wouldn't training one from scratch and fine-tuning a model pretrained on, say, ImageNet give similar results?

gkrasin commented 7 years ago

> The document mentions that only 6000 of the labels have been used for the model. Can the list of those 6000 labels be shared? (As in, which 6000 labels were used.)

This list will be shared at the same time as the model.

> Also, is training such a model even feasible? (9 million images and over 6000 categories.)

I would not speculate on what is feasible. As for the quality of the model we trained specifically for this release, let's wait for the model to be available. Generally, the quality is not very high, as the annotations are somewhat noisy at the moment. There's a long road ahead in cleaning them up.

iamgroot42 commented 7 years ago

@gkrasin thanks! :)

gkrasin commented 7 years ago

The pretrained model has been released. It's decent, but not very good. There are multiple factors that contribute to that:

  1. Annotations in the training set are noisy. This should get better over time.
  2. The training procedure was pretty basic: randomly initialize Inception v3, define the losses, start training, and stop after a couple of weeks.
  3. While the model has learned even rare labels, the absolute output for them might be very low (< 0.01). At the same time, the outputs are semantically ordered, in the sense that an image with the label receives a higher score than an image without it. Therefore, it's possible to calibrate the released model by stretching the outputs per label. This is not done at the moment; a sketch of what it could look like follows below.
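
As a minimal sketch of what "stretching the outputs per label" could mean, one might rescale each label's scores so that typical positive scores map close to 1.0. Everything below (the function names, the use of the median positive score, the validation-set inputs) is an illustrative assumption, not the project's actual calibration procedure:

```python
import numpy as np

def per_label_stretch(val_scores, val_labels, eps=1e-6):
    """Estimate one multiplicative scale per label.

    val_scores: [num_examples, num_labels] sigmoid outputs on a
        validation set.
    val_labels: [num_examples, num_labels] multi-hot ground truth.
    """
    scales = np.ones(val_scores.shape[1])
    for j in range(val_scores.shape[1]):
        positives = val_scores[val_labels[:, j] > 0, j]
        if positives.size:
            # Stretch so the median true-positive score maps to ~1.0;
            # rare labels with raw outputs < 0.01 get large scales.
            scales[j] = 1.0 / max(np.median(positives), eps)
    return scales

def calibrate(scores, scales):
    # Clip the stretched scores back into [0, 1].
    return np.clip(scores * scales, 0.0, 1.0)
```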

Also, we're now open for pull requests. See CONTRIBUTING.md for more details.

AdityaChaganti commented 7 years ago

@gkrasin Thank you for releasing this! I've used the pre-trained Inception v3 model as it ships with TensorFlow, and retrained it (transfer learning) to include labels that are more commonly seen in my data. All through this process, I was under the impression that the final layer computes a softmax of the data coming in from the fully connected layer, thus resulting in a ranked output of class predictions on a 0-1 scale, all adding up to 1.

In the cat example that you provide with the trained TensorFlow model, however, there seem to be multiple synonymous/contextually related classes predicted with high confidence, adding up to values greater than 1. I've noticed similar results on some native media as well:

[screenshot, 2016-11-09: similar multi-label predictions with scores summing to more than 1]

Could you explain how this works, or point me to a resource that explains it? Thank you!

bhack commented 7 years ago

How could the noisily labeled samples be detected with an open-set approach?

gkrasin commented 7 years ago

Hi Aditya,

as OpenImages is a multi-label dataset (i.e. each image can have multiple labels associated with it), we don't use softmax. Instead, the last layer has a sigmoid non-linearity (and, while training, we used the sigmoid cross-entropy loss):

```python
predictions = end_points['multi_predictions'] = tf.nn.sigmoid(
    logits, name='multi_predictions')
```

In other words, instead of predicting a single class for an image, the net predicts labels/tags, and each output value is the probability that the given label is set. If an image has a cat and a mouse, the net (in the ideal case) is expected to have both labels set.
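
To make this concrete, here is a self-contained sketch of the sigmoid setup (the tensor values are made up for illustration):

```python
import tensorflow as tf

# Toy batch: 2 images, 4 possible labels (values are made up).
logits = tf.constant([[2.0, -1.0, 0.5, -3.0],
                      [0.1, 3.0, -2.0, 1.5]])
# Multi-hot ground truth: an image may have several labels set,
# e.g. both "cat" and "mouse".
labels = tf.constant([[1.0, 0.0, 1.0, 0.0],
                      [0.0, 1.0, 0.0, 1.0]])

# Each sigmoid is independent, so a row of predictions can sum to
# more than 1 -- unlike softmax, which normalizes across classes.
predictions = tf.nn.sigmoid(logits)

# Training loss: one binary cross-entropy term per label.
loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits))
```

This is also why the scores in your screenshot can add up to more than 1: there is no normalization across labels.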

gkrasin commented 7 years ago

@bhack sorry, I didn't get your question. Can you please rephrase or elaborate it a bit?

bhack commented 7 years ago

I mean that we could use an open-set approach to neural networks, like the OpenMax linked in the previous message, to try to detect noisily labeled samples as "unknown".

bhack commented 7 years ago

/cc @abhijitbendale

gkrasin commented 7 years ago

@bhack yes, using algorithmic ways to detect noise (such as OpenMax) is a good idea.
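
OpenMax itself fits per-class Weibull models to activation distances, which is more than fits in a comment. As a much simpler confidence-based heuristic in the same spirit (every name and threshold below is made up for illustration; this is not OpenMax), one could flag annotations that the trained model scores far below that label's typical positive score:

```python
import numpy as np

def flag_suspect_annotations(scores, labels, z=3.0, min_pos=10):
    """Return (example, label) pairs whose annotation looks noisy.

    scores: [num_examples, num_labels] model outputs.
    labels: [num_examples, num_labels] multi-hot annotations.
    """
    suspects = []
    for j in range(labels.shape[1]):
        pos_mask = labels[:, j] > 0
        positives = scores[pos_mask, j]
        if positives.size < min_pos:
            continue  # too few positives to estimate a distribution
        mu, sigma = positives.mean(), positives.std() + 1e-6
        # Annotated positives scored far below the label's typical
        # positive score are candidates for "unknown" / label noise.
        low = pos_mask & (scores[:, j] < mu - z * sigma)
        suspects.extend((int(i), j) for i in np.where(low)[0])
    return suspects
```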

AdityaChaganti commented 7 years ago

Got it. The probability assigned to each label would be independent of the other outputs. This net should then (in the ideal case) be able to pick out multiple objects/situations (for lack of a better word) in an image with high confidence. Would you say that's accurate?

gkrasin commented 7 years ago

Yes. For example, consider this image:

[image of a llama grazing in a grassland]

Ideally, the net would give at least the following labels:

animal(1.0),
prairie(1.0),
grass(1.0),
mammal(1.0),
llama(1.0),
grazing(1.0),
fauna(1.0),
vicuña(1.0),
guanaco(1.0),
meadow(1.0),
pasture(1.0),
grassland(1.0),
wildlife(1.0)

In reality, since the net is not perfect (and not even calibrated; see my comment above), the outputs are:

5723: /m/0jbk - animal (score = 0.90)
2537: /m/035qhg - fauna (score = 0.85)
3473: /m/04rky - mammal (score = 0.82)
45: /m/01280g - wildlife (score = 0.79)
4605: /m/09686 - vertebrate (score = 0.74)
4558: /m/08t9c_ - grass (score = 0.32)
664: /m/01gd91 - pasture (score = 0.31)
5648: /m/0hkvx - prairie (score = 0.29)
522: /m/01c7cq - grassland (score = 0.27)
1494: /m/025st_8 - meadow (score = 0.18)
3981: /m/068hy - pet (score = 0.13)
2811: /m/03hh2k - grazing (score = 0.13)
3745: /m/05h0n - nature (score = 0.11)
...

Anyway, as you can see, it has detected quite a few labels, some of which are not directly correlated (grass and mammal).
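
For completeness, a listing like the one above could be produced with a few lines like these (the probabilities and label map here are hypothetical stand-ins for the model's sigmoid output and the released label list):

```python
import numpy as np

# Hypothetical stand-ins for the sigmoid output of one image and
# for the (index, MID, display name) label list.
probs = np.array([0.90, 0.32, 0.13, 0.85])
mids = ['/m/0jbk', '/m/08t9c_', '/m/03hh2k', '/m/035qhg']
names = ['animal', 'grass', 'grazing', 'fauna']

# Print labels in descending score order, as in the listing above.
for i in np.argsort(-probs):
    print('%d: %s - %s (score = %.2f)' % (i, mids[i], names[i], probs[i]))
```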

AdityaChaganti commented 7 years ago

That's pretty clear now. You just saved me a whole lot of work with this. Thanks, @gkrasin!

gkrasin commented 7 years ago

@AdityaChaganti you're welcome. :)