[Closed] bhack closed this issue 7 years ago
Hi @bhack!
Yes, we plan to release the pretrained Inception3 model. I can't promise any strict deadlines, but it should happen "soon".
It could be nice to have that for a run in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/deepdream/deepdream.ipynb
Yes, indeed. :)
The document mentions that only 6000 of the labels have been used for the model. Can the list of those 6000 labels be shared? (As in, which 6000 labels were used.) Also, is training such a model even feasible? (9 million images and over 6000 categories.) Wouldn't training one from scratch, or tweaking a model trained on, say, ImageNet, give similar results?
> The document mentions that only 6000 of the labels have been used for the model. Can the list of those 6000 labels be shared? (As in, which 6000 labels were used.)
This list will be shared at the same time as the model.
> Also, is training such a model even feasible? (9 million images and over 6000 categories.)
I would not speculate on what is feasible. As for the quality of the model we trained specifically for this release, let's wait for the model to be available. Generally, the quality is not very high, as the annotations are somewhat noisy at the moment. There's a long road ahead in cleaning them up.
@gkrasin thanks! :)
The pretrained model has been released. It's decent, but not very good; multiple factors contribute to that.
Also, we're now open to pull requests. See CONTRIBUTING.md for more details.
@gkrasin Thank you for releasing this! I've used the pre-trained model of inception V3 as it ships with Tensorflow, and retrained it (Transfer Learning) to include labels that are more commonly seen in my data. All through this process, I was under the impression that the final layer calculates a softmax of the data coming in from the fully connected layer, thus resulting in a ranked output of class predictions on a 0-1 scale, all adding up to 1.
In the cat example that you provide with the trained Tensorflow model, however, there seem to be multiple synonymous/contextually related classes predicted with a high confidence, adding up to values greater than 1. I've noticed similar results on some native media as well:
Could you explain how this works, or point me to a resource that explains it? Thank you!
How could the noisy-labeled samples be detected with an open-set approach?
Hi Aditya,
as OpenImages is a multi-label dataset (i.e. each image can have multiple labels associated with it), we don't use softmax. Instead, the last layer has a sigmoid non-linearity (and, while training, we used the sigmoid cross-entropy loss):
predictions = end_points['multi_predictions'] = tf.nn.sigmoid(
    logits, name='multi_predictions')
In other words, instead of predicting a class of an image, the net predicts labels / tags, and each value is a probability that the given label is set. If an image has a cat and a mouse, the net (in the ideal case) is expected to have both labels set.
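For illustration, here is a minimal NumPy sketch (not the actual release code) of the difference: softmax scores are forced to sum to 1 across classes, while per-label sigmoid scores are independent and can sum well past 1, which is why several synonymous labels can all score high at once. The logit values below are made up for the example.

```python
import numpy as np

def softmax(logits):
    # Mutually exclusive classes: outputs sum to 1 by construction.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def sigmoid(logits):
    # One independent probability per label: no sum constraint.
    return 1.0 / (1.0 + np.exp(-logits))

# Hypothetical logits for three labels, e.g. [cat, mammal, dog].
logits = np.array([2.0, 1.5, -1.0])

print(softmax(logits))        # sums to 1
print(sigmoid(logits))        # each value in (0, 1), independent
print(sigmoid(logits).sum())  # can exceed 1
```

With these logits, both "cat" and "mammal" get sigmoid scores near 0.8, so the label scores alone sum past 1; that matches the behavior you observed in the cat example.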
@bhack sorry, I didn't get your question. Can you please rephrase or elaborate it a bit?
I mean that we could use an open-set approach to neural networks, like the OpenMax method linked in the previous message, to try to detect noisy-label samples as "unknown".
/cc @abhijitbendale
@bhack yes, using algorithmic ways to detect noise (such as OpenMax) is a good idea.
Got it. The probability assigned to each label would be independent of other classification outputs. This net should then (in the ideal case) be able to pick out multiple objects/situations (for the lack of a better word) in an image with high confidence. Would you say that's accurate?
Yes. For example, consider this image:
Ideally, the net would give at least the following labels:
animal(1.0),
prairie(1.0),
grass(1.0),
mammal(1.0),
llama(1.0),
grazing(1.0),
fauna(1.0),
vicuña(1.0),
guanaco(1.0),
meadow(1.0),
pasture(1.0),
grassland(1.0),
wildlife(1.0)
In reality, since the net is not perfect (and not even calibrated; see my comment above), the outputs are:
5723: /m/0jbk - animal (score = 0.90)
2537: /m/035qhg - fauna (score = 0.85)
3473: /m/04rky - mammal (score = 0.82)
45: /m/01280g - wildlife (score = 0.79)
4605: /m/09686 - vertebrate (score = 0.74)
4558: /m/08t9c_ - grass (score = 0.32)
664: /m/01gd91 - pasture (score = 0.31)
5648: /m/0hkvx - prairie (score = 0.29)
522: /m/01c7cq - grassland (score = 0.27)
1494: /m/025st_8 - meadow (score = 0.18)
3981: /m/068hy - pet (score = 0.13)
2811: /m/03hh2k - grazing (score = 0.13)
3745: /m/05h0n - nature (score = 0.11)
...
Anyway, as you can see, it has detected quite a few labels, some of which are not directly correlated (grass and mammal).
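If you only want a final tag set rather than raw scores, a common post-processing step is to threshold the independent sigmoid scores. This is a hypothetical sketch, not something the release prescribes; the 0.5 cut-off is an assumption, and in practice per-label thresholds tuned on validation data work better for an uncalibrated net.

```python
# Label/score pairs taken from the example output above (truncated).
predictions = [
    ("animal", 0.90), ("fauna", 0.85), ("mammal", 0.82),
    ("wildlife", 0.79), ("vertebrate", 0.74), ("grass", 0.32),
    ("pasture", 0.31), ("prairie", 0.29), ("grassland", 0.27),
    ("meadow", 0.18), ("pet", 0.13), ("grazing", 0.13),
]

THRESHOLD = 0.5  # assumed global cut-off; tune per label in practice

# Keep every label whose independent score clears the threshold.
kept = [label for label, score in predictions if score >= THRESHOLD]
print(kept)  # ['animal', 'fauna', 'mammal', 'wildlife', 'vertebrate']
```

Note that because the scores are independent, thresholding can keep several overlapping labels (animal, fauna, mammal) at once; collapsing those would require using the label hierarchy, which is a separate step.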
That's pretty clear now. You just saved me a whole lot of work with this. Thanks, @gkrasin!
@AdityaChaganti you're welcome. :)
Will you release the mentioned pretrained Inception3 model?