tensorflow / skflow

Simplified interface for TensorFlow (mimicking Scikit Learn) for Deep Learning
Apache License 2.0

How do I do multilabel image classification? #113

Closed unography closed 7 years ago

unography commented 8 years ago

Do I have to make changes in the multioutput file? Ideally, I want to train any model, like Inception, on my training data, which has multiple labels per example. How do I do that?

ilblackdragon commented 8 years ago

You need to provide multiple targets, e.g. your y will be [n_samples, n_classes, 2], where each class is either on or off. You may also need to adjust the loss function; currently softmax_classifier doesn't support such a shape of y. You may be able to use sequence_classifier for this, though.
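
For illustration, a tiny target in that shape might look like the following (just a sketch; which slot of each on/off pair means "present" is a convention you have to keep consistent with your loss):

import numpy as np

# 2 samples, 3 classes; one [off, on] pair per class.
# Sample 0 has classes 0 and 2; sample 1 has class 1 only.
y = np.array([
    [[0, 1], [1, 0], [0, 1]],
    [[1, 0], [0, 1], [1, 0]],
])
print(y.shape)  # (2, 3, 2) == [n_samples, n_classes, 2]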

davidBelanger commented 8 years ago

I'd treat it as a bunch of binary logistic regression problems with shared features. Basically, predict a logit for every target class, then use a cross-entropy loss between your vector of per-label predictions and the target vector. The target vector should be a vector in {0,1}^L, where it is 1 in the coordinates corresponding to positive labels. Storing the data this way is space-inefficient if the number of possible labels is far greater than the number of active labels, but it is very simple implementation-wise.
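
A minimal sketch of that idea in plain (1.x-style) TensorFlow, assuming a 4096-d feature vector and L = 9 labels (all names here are illustrative, not skflow API):

import tensorflow as tf

n_features, n_labels = 4096, 9

x = tf.placeholder(tf.float32, shape=[None, n_features])
y = tf.placeholder(tf.float32, shape=[None, n_labels])  # multi-hot targets in {0,1}^L

# Shared features, one logit per label.
w = tf.Variable(tf.truncated_normal([n_features, n_labels], stddev=0.01))
b = tf.Variable(tf.zeros([n_labels]))
logits = tf.matmul(x, w) + b

# Independent binary cross entropy per label, summed over labels, averaged over the batch.
per_label = tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=logits)
loss = tf.reduce_mean(tf.reduce_sum(per_label, axis=1))
train_op = tf.train.AdamOptimizer().minimize(loss)

# At inference time, each sigmoid(logit) is an independent per-label probability.
probs = tf.sigmoid(logits)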

ilblackdragon commented 8 years ago

Did you manage to make it work? It would be cool if you could PR an example.

tfolkman commented 8 years ago

@davidBelanger were you thinking of something like the code below? If so, do you see any error in my implementation?

I have 9 classes, and my labels look like this:

[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1]

This means the observation has classes 0, 1, 3, and 8. The labels are in pairs, with the first value of each pair signifying the absence of the class.

The training isn't really converging... Thanks!

import tensorflow as tf

batch_size = 128  # assumed; the original snippet does not show this value

def get_class_logits():
    # One independent [not-class, class] logit pair, sharing the 4096-d input features.
    weights = tf.Variable(tf.truncated_normal([4096, 2]))
    biases = tf.Variable(tf.zeros([2]))
    logits = tf.matmul(tf_train_dataset, weights) + biases
    return weights, biases, logits

graph = tf.Graph()
with graph.as_default():

    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, 4096))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, 18))

    # One logit pair per class (9 classes).
    w_0, b_0, logits_0 = get_class_logits()
    w_1, b_1, logits_1 = get_class_logits()
    w_2, b_2, logits_2 = get_class_logits()
    w_3, b_3, logits_3 = get_class_logits()
    w_4, b_4, logits_4 = get_class_logits()
    w_5, b_5, logits_5 = get_class_logits()
    w_6, b_6, logits_6 = get_class_logits()
    w_7, b_7, logits_7 = get_class_logits()
    w_8, b_8, logits_8 = get_class_logits()

    all_logits = tf.concat(1, [logits_0, logits_1, logits_2, logits_3, logits_4, logits_5, logits_6,
                               logits_7, logits_8])

    # Softmax cross entropy over the full 18-wide concatenation of all pairs.
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(all_logits, tf_train_labels))

    optimizer = tf.train.AdamOptimizer().minimize(loss)

xksteven commented 8 years ago

Would this function

tf.nn.sigmoid_cross_entropy_with_logits()

solve the problem?

ilblackdragon commented 8 years ago

@tfolkman @eldor4do You can try using tf.nn.sparse_softmax_cross_entropy_with_logits for multi-class.

For an example of its usage, see here - https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py#L522

xksteven commented 8 years ago

@ilblackdragon

tf.nn.sparse_softmax_cross_entropy_with_logits

The function you suggested wouldn't work for multi-label; it would work for multi-class, which is a different problem.

Pretty sure tf.nn.sigmoid_cross_entropy_with_logits is what the original author is looking for.
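
To make the distinction concrete, here is a rough sketch (TF 1.x-style, illustrative shapes) of the label formats the two losses expect:

import tensorflow as tf

n_classes = 5
logits = tf.placeholder(tf.float32, shape=[None, n_classes])

# Multi-class: exactly one true class per example -> integer class ids, shape [batch].
class_ids = tf.placeholder(tf.int64, shape=[None])
multiclass_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=class_ids, logits=logits)

# Multi-label: any number of true classes per example -> multi-hot floats, shape [batch, n_classes].
multi_hot = tf.placeholder(tf.float32, shape=[None, n_classes])
multilabel_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=multi_hot, logits=logits)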

Syndrome777 commented 8 years ago

@xksteven But what if we have 10000 output nodes? It's hard to compute all of them, e.g. a target like [1, 0, 0, 1, 0, 1, 0, 0, ...] when the input labels are just 0, 3, 5. Do you have any ideas?

shriphani commented 8 years ago

@Syndrome777 This is normal. In word-level language modeling, you predict a very high-dimensional output and TensorFlow handles it just fine.

Syndrome777 commented 8 years ago

@shriphani No, I mean it's a multi-label task, not a language-modeling task. At every time step the model predicts 10000 output nodes, and these nodes are sparse, as in large-scale image classification tasks: every image may contain a dog, a cat, or anything else, so the ground truth is [1, 0, 0, 1, 0, 1, 1, 0, 0, ...]. I don't know how to handle this loss.

shriphani commented 8 years ago

@Syndrome777 Two approaches off the top of my head:

  1. Set a uniform prior on all possible labels a given X can have. So if 5 entries in your 10000-dim vector are active, you emit a distribution that is all 0s with the 5 desired entries set to 0.2. You can then use the categorical cross-entropy loss.
  2. Treat it as 10000 binary classification problems. You can use a cross-entropy loss for each and (maybe) add all the losses; the cumulative loss is still nicely differentiable (although possibly hard to interpret: no confusion matrix and all that machinery). See the sketch after this list.
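
A rough numpy sketch of the second approach's target construction, assuming each example's labels arrive as a list of active class ids such as [0, 3, 5]:

import numpy as np

n_classes = 10000

def to_multi_hot(active_ids):
    # Dense {0,1} target row from a sparse list of active class ids.
    target = np.zeros(n_classes, dtype=np.float32)
    target[active_ids] = 1.0
    return target

batch_targets = np.stack([to_multi_hot([0, 3, 5]), to_multi_hot([7])])
# batch_targets (shape [2, 10000]) can then be fed as the labels argument of
# tf.nn.sigmoid_cross_entropy_with_logits, with one logit per class.
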
ilblackdragon commented 7 years ago

If this is still an issue, please re-file the bug in the tensorflow repository; this one is inactive. Thanks!

s2244521 commented 7 years ago

@eldor4do I also want to train on a multi-label dataset but I don't know how to preprocess it. How do I do that?

wenfeixiang1991 commented 6 years ago

@xksteven I think you are right, but I run into a small problem when I do this multi-label classification with TensorFlow, described in a separate issue. Do you have any idea about this? Thank you.

ravikrn commented 6 years ago

@ilblackdragon @davidBelanger I am also trying to do multi-label classification. Do I need to train the Inception model from scratch, or can I use pre-trained weights, i.e. transfer learning?

MrMimic commented 6 years ago

Hey guys, has anyone succeeded in producing multi-label output? I have 3800+ classes and I want to output a sort of probability of one element belonging to X classes. I tried changing the softmax to a sigmoid, and also the loss function, but no luck so far.

s2244521 commented 6 years ago

@MrMimic, you can try this function: tf.nn.sigmoid_cross_entropy_with_logits(_sentinel=None, labels=None, logits=None, name=None)

DushyantaDhyani commented 6 years ago

My question is not related to image classification but more towards the aggregation of losses returned by the softmax/sigmoid cross entropy functions.

The standard way of using the SoftmaxCE is as follows:

loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))

On the other hand, all the tutorials/questions related to multi-label classification simply recommend replacing the SoftmaxCE function with the SigmoidCE function, i.e.

loss_op = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=Y))

However, softmax_cross_entropy_with_logits returns a 1-D Tensor of length batch_size, whereas sigmoid_cross_entropy_with_logits returns a Tensor of the same shape as logits.

Thus performing reduce_mean directly on the output of sigmoid_cross_entropy_with_logits further scales down the loss by the number of classes (which is something I haven't come across in any other loss formulation).

If what I have said above is indeed true, should one instead use the following formulation?

loss_op = tf.reduce_mean(tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=Y), axis=1))

Or is it just that performing a full reduce_mean only slows down the learning (like dividing the learning rate by the number of classes) and should not significantly affect the overall results?

vict0rsch commented 6 years ago

@DushyantaDhyani I'm not sure, but I'd go for the mean(sum(...)) formulation (not mean(mean(...))).

My idea is that since we treat each logit dimension as an independent logistic regression, we need to sum. I guess it depends on how we see the problem, but the way I see it, the whole model's performance is the sum of its per-class performances; it's not as if each dimension were trying to accomplish the same task (in which case we would average).
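
A quick numerical check of the scaling point (just a sketch, with random numbers standing in for the per-element losses): mean(sum(...)) and mean(mean(...)) differ only by a constant factor equal to the number of classes.

import numpy as np

rng = np.random.RandomState(0)
per_element_loss = rng.rand(32, 10)  # stand-in for sigmoid_cross_entropy_with_logits output

mean_of_sums = np.mean(np.sum(per_element_loss, axis=1))  # sum over classes, mean over batch
mean_of_all = np.mean(per_element_loss)                   # mean over batch and classes

print(np.isclose(mean_of_sums, 10 * mean_of_all))  # True

With a roughly scale-invariant optimizer like Adam that constant mostly folds into the effective step size, but with plain SGD it behaves like dividing the learning rate by the number of classes.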

psonthalia commented 5 years ago

I'm trying to do the same thing, and in retrain.py I changed the line final_tensor = tf.nn.softmax(logits, name=final_tensor_name) to final_tensor = tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=ground_truth_input, name=final_tensor_name), but I am getting an error saying that the shapes are different. Even if I try to create a new placeholder with the same shape, I still get an error saying I need to feed a value. I am very new to TensorFlow and I am not sure what is wrong. Any help is appreciated! Thanks!
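
One thing that may explain the shape error (a sketch, not a verified fix for retrain.py): sigmoid_cross_entropy_with_logits is a loss, not an activation, so it needs labels and a reduction, and is usually kept separate from the prediction tensor:

# Illustrative only; the names follow the snippet above rather than retrain.py exactly.
import tensorflow as tf

final_tensor = tf.nn.sigmoid(logits, name=final_tensor_name)  # per-label probabilities for inference

cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=ground_truth_input, logits=logits)
loss = tf.reduce_mean(cross_entropy)  # only the training path needs the ground-truth labels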

ismael-elatifi commented 5 years ago

Here is some simple code to demonstrate that the sigmoid version is the right one and not the softmax version:

import numpy as np
import tensorflow as tf

if __name__ == "__main__":
    tf.enable_eager_execution()

    # 40 alternating entries: a multi-hot label vector [1, 0, 1, 0, ...] with 20 positive labels.
    labels = np.array(20 * [1.0, 0])
    logits = np.array(20 * [10.0, 0])  # perfect predictions, so we expect the loss to be near zero
    print(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))                  # --> loss = 59.9
    print(tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)))  # --> loss = 0.35
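
For what it's worth, the softmax number cannot be small here even in principle: softmax cross entropy treats the whole label vector as a single probability distribution, and a multi-hot vector with 20 ones isn't one, so the loss is bounded below by 20 * ln(20) ≈ 59.9 no matter how good the logits are. The sigmoid loss scores each of the 40 entries independently, which is what a multi-label setup needs.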