unography closed this issue 7 years ago.
You need to provide multiple targets, e.g. your y will have shape [n_samples, n_classes, 2], where for each class the target is either on or off. You may also need to adjust the loss function: currently softmax_classifier doesn't support y of that shape. You may, though, be able to use sequence_classifier for this.
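A minimal NumPy sketch of that target layout. It assumes each class's pair is ordered as [off, on] (the comment above does not fix an order), and `to_on_off_targets` is a hypothetical helper name:

```python
import numpy as np

def to_on_off_targets(label_lists, n_classes):
    """Hypothetical helper: build y with shape [n_samples, n_classes, 2].

    y[i, c] is [0, 1] if class c is active for sample i ("on"),
    and [1, 0] otherwise ("off").
    """
    n_samples = len(label_lists)
    y = np.zeros((n_samples, n_classes, 2), dtype=np.float32)
    y[:, :, 0] = 1.0  # start with every class "off"
    for i, active in enumerate(label_lists):
        y[i, active, 0] = 0.0  # switch the active classes "on"
        y[i, active, 1] = 1.0
    return y

# Sample 0 has classes 0, 1, 3 and 8; sample 1 has only class 2.
y = to_on_off_targets([[0, 1, 3, 8], [2]], n_classes=9)
print(y.shape)  # (2, 9, 2)
print(y[0].reshape(-1).astype(int).tolist())
# [0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1]
```

Flattening the last two axes gives the interleaved "pairs" layout; keeping the 3-D shape lets a loss apply one 2-way softmax per class.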
I'd treat it as a bunch of binary logistic regression problems with shared features. Basically, predict a logit for every target class, and then use a per-label (sigmoid) cross-entropy loss between your vector of per-label predictions and the target vector. The target vector should be a vector in [0,1]^L that is 1 in the coordinates corresponding to positive labels. Storing the data this way is space-inefficient if the number of possible labels is far greater than the number of active labels, but it is very simple implementation-wise.
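A minimal NumPy sketch of this idea on synthetic data. All names, sizes, and the learning rate are illustrative assumptions. A shared feature matrix feeds one logit per label, the target is a multi-hot matrix, and the loss is the per-label binary cross-entropy log(1 + e^x) - z * x, summed over labels and averaged over samples. Because each label has its own column of weights, the gradient with respect to the logits is simply sigmoid(logits) - targets (scaled by 1/n for the mean).

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_labels = 64, 20, 5

X = rng.normal(size=(n_samples, n_features))  # shared features
true_W = rng.normal(size=(n_features, n_labels))
# Multi-hot targets in {0, 1}^L: 1 for every positive label of a sample.
Y = (X @ true_W > 0).astype(np.float64)

def multilabel_logistic_loss(logits, targets):
    # Per-label binary cross-entropy log(1 + e^x) - z * x, computed stably
    # with logaddexp, summed over labels and averaged over samples.
    per_label = np.logaddexp(0.0, logits) - targets * logits
    return per_label.sum(axis=1).mean()

W = np.zeros((n_features, n_labels))
b = np.zeros(n_labels)
lr = 0.1
initial_loss = multilabel_logistic_loss(X @ W + b, Y)
for _ in range(200):
    logits = X @ W + b                     # one logit per (sample, label)
    probs = 1.0 / (1.0 + np.exp(-logits))  # independent sigmoid per label
    grad_logits = (probs - Y) / n_samples  # d(loss)/d(logits)
    W -= lr * X.T @ grad_logits
    b -= lr * grad_logits.sum(axis=0)
final_loss = multilabel_logistic_loss(X @ W + b, Y)
print(initial_loss, final_loss)
```

With zero-initialized weights every sigmoid starts at 0.5, so the initial loss is n_labels * ln 2.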
Did you manage to make it work? It would be cool if you could PR an example.
@davidBelanger were you thinking of something like the code below? If so, do you see any errors in my implementation?
I have 9 classes, and my labels look like this:
[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1]
This means that the observation has classes 0, 1, 3, and 8. The values come in pairs, with the first value of each pair meaning "not this class".
The training isn't really converging... Thanks!
import tensorflow as tf  # TensorFlow 1.x graph-mode API

batch_size = 128  # assumed value; the post does not show the batch size
num_classes = 9

def get_class_logits(inputs):
    # One (not-class, class) pair of logits for a single class.
    weights = tf.Variable(tf.truncated_normal([4096, 2]))
    biases = tf.Variable(tf.zeros([2]))
    logits = tf.matmul(inputs, weights) + biases
    return weights, biases, logits

graph = tf.Graph()
with graph.as_default():
    tf_train_dataset = tf.placeholder(tf.float32, shape=(batch_size, 4096))
    tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, 2 * num_classes))

    # One independent pair of logits per class (w_0/b_0/logits_0 ... w_8/b_8/logits_8).
    class_params = [get_class_logits(tf_train_dataset) for _ in range(num_classes)]
    all_logits = tf.concat([logits for _, _, logits in class_params], axis=1)  # [batch_size, 18]

    # Note: this applies a single softmax across all 18 outputs, i.e. it treats
    # the 9 (not-class, class) pairs as one 18-way choice.
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=tf_train_labels, logits=all_logits))
    optimizer = tf.train.AdamOptimizer().minimize(loss)
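The single softmax in the code above spreads one probability distribution over all 18 outputs, so with 9 positive targets its loss can never drop below 9 * ln 9 ≈ 19.8, even for a perfect prediction. The following NumPy sketch compares that with one softmax per (not-class, class) pair, using the label vector from the post; `softmax_xent` is a hypothetical helper:

```python
import numpy as np

def softmax_xent(labels, logits, axis=-1):
    # Numerically stable softmax cross-entropy along `axis`.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))
    return -(labels * log_probs).sum(axis=axis)

num_classes = 9
# Label vector from the post: (not-class, class) pairs for classes 0..8.
labels = np.array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1]], dtype=float)
# A confident, correct prediction: logit 10 wherever the label is 1, 0 elsewhere.
logits = 10.0 * labels

# One softmax over all 18 outputs, as in the code above.
joint = softmax_xent(labels, logits)

# One softmax per pair: reshape to [batch, num_classes, 2], then sum over classes.
pair_shape = (-1, num_classes, 2)
per_pair = softmax_xent(labels.reshape(pair_shape), logits.reshape(pair_shape)).sum(axis=1)

print(joint, per_pair)  # joint stays near 19.8; per_pair is close to 0
```

In TensorFlow, the per-pair variant would presumably correspond to reshaping both all_logits and tf_train_labels to [-1, 2] before calling tf.nn.softmax_cross_entropy_with_logits; treat that wiring as an untested suggestion.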
Would the function tf.nn.sigmoid_cross_entropy_with_logits() solve the problem?
@tfolkman @eldor4do You can try using tf.nn.sparse_softmax_cross_entropy_with_logits for multi-class classification.
For an example of its usage, see https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/learn/python/learn/estimators/dnn_linear_combined.py#L522
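To make the API's expectations concrete: the sparse variant takes one integer class id per example rather than a target vector. The following NumPy sketch (helper names are hypothetical) shows that it gives the same values as dense softmax cross-entropy with one-hot targets:

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def sparse_softmax_xent(class_ids, logits):
    # class_ids: shape [batch], exactly one integer class per example.
    return -log_softmax(logits)[np.arange(len(class_ids)), class_ids]

def dense_softmax_xent(one_hot, logits):
    # one_hot: shape [batch, num_classes], a probability vector per example.
    return -(one_hot * log_softmax(logits)).sum(axis=-1)

logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
class_ids = np.array([0, 2])
one_hot = np.eye(3)[class_ids]

sparse = sparse_softmax_xent(class_ids, logits)
dense = dense_softmax_xent(one_hot, logits)
print(np.allclose(sparse, dense))  # True
```

Since each example carries a single integer id, a multi-hot target such as classes {0, 3, 5} has no direct encoding in this format.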
@ilblackdragon The function you suggested, tf.nn.sparse_softmax_cross_entropy_with_logits, wouldn't work for multi-label classification. It would work for multi-class classification, which is a different problem.
I'm pretty sure tf.nn.sigmoid_cross_entropy_with_logits is what the original author is looking for.
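Element-wise, tf.nn.sigmoid_cross_entropy_with_logits computes an independent binary cross-entropy per output. Its documentation gives the numerically stable form max(x, 0) - x * z + log(1 + exp(-|x|)) for logits x and labels z. The following NumPy re-implementation of that formula (not TensorFlow's own code) is checked against the naive definition:

```python
import numpy as np

def sigmoid_xent(labels, logits):
    # Stable per-element form from the TensorFlow docs:
    # max(x, 0) - x * z + log(1 + exp(-|x|))
    x, z = logits, labels
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

def naive_sigmoid_xent(labels, logits):
    # -z * log(sigmoid(x)) - (1 - z) * log(1 - sigmoid(x)); overflows for large |x|.
    p = 1.0 / (1.0 + np.exp(-logits))
    return -(labels * np.log(p) + (1 - labels) * np.log(1 - p))

labels = np.array([[1.0, 0.0, 0.0, 1.0]])   # multi-hot: several labels can be 1
logits = np.array([[3.0, -2.0, 0.5, -1.0]])
loss = sigmoid_xent(labels, logits)
print(loss.shape)  # (1, 4): same shape as logits, one loss per label
```

Because the result has one entry per label, it still has to be reduced (summed or averaged over labels and examples) to get a scalar training loss.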
@xksteven But what if we have 10000 output nodes? It's hard to compute all of them, e.g. a target like [1, 0, 0, 1, 0, 1, 0, 0, ...] when the input labels are 0, 3, 5. Do you have any ideas?
@Syndrome777 This is normal. In word-level language modeling you predict a very high-dimensional output, and TensorFlow handles it just fine.
@shriphani No, I mean it's a multi-label task, not a language-modeling task. At every time step the model predicts 10000 output nodes, and the targets are sparse, as in large-scale image classification, where every image may contain a dog, a cat, or anything else, so the ground truth looks like [1, 0, 0, 1, 0, 1, 1, 0, 0, ...]. I don't know how to handle this loss.
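One common way to feed such targets is to keep only the active label ids per example and expand them into a dense multi-hot matrix one batch at a time. At 10000 classes that is 10000 float32 values, about 40 KB, per example. Below is a NumPy sketch with a hypothetical helper `to_multi_hot`:

```python
import numpy as np

def to_multi_hot(label_ids_per_sample, num_classes):
    # Hypothetical helper: sparse label ids -> dense multi-hot [n_samples, num_classes].
    targets = np.zeros((len(label_ids_per_sample), num_classes), dtype=np.float32)
    for row, ids in enumerate(label_ids_per_sample):
        targets[row, ids] = 1.0
    return targets

# Example 0 has labels 0, 3 and 5; example 1 has only the last class.
targets = to_multi_hot([[0, 3, 5], [9999]], num_classes=10000)
print(targets[0, :8].astype(int).tolist())  # [1, 0, 0, 1, 0, 1, 0, 0]
```

The resulting matrix can be passed as the labels of a per-label sigmoid cross-entropy alongside logits of the same shape.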
@Syndrome777 two approaches off the top of my head:
If this is still an issue, please re-file the bug in the TensorFlow repository; this one is inactive. Thanks!
@eldor4do I also want to train on a multi-label dataset, but I don't know how to preprocess it. How do I do that?
@xksteven I think you are right, but I have a small problem doing this multi-label classification with TensorFlow, which is described in an issue. Do you have any idea about this? Thank you
@ilblackdragon @davidBelanger I am also trying to do multi-label classification. Do I need to train the Inception model from scratch, or can I use pre-trained weights, as in transfer learning?
Hey guys, has anyone succeeded in producing multi-label output? I have 3800+ classes and want to output, for each element, the probability that it belongs to each of the classes. I tried changing the softmax to a sigmoid, and the loss function as well, but with no luck.
@MrMimic, you can try the function tf.nn.sigmoid_cross_entropy_with_logits(_sentinel=None, labels=None, logits=None, name=None)
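On the output side, applying a sigmoid (rather than a softmax) to each logit gives an independent probability per class, and these probabilities need not sum to 1. The following NumPy sketch turns logits into per-class probabilities and a predicted label set. It assumes a 0.5 threshold, which is an arbitrary choice you would tune, and `predict_labels` is a hypothetical helper:

```python
import numpy as np

def predict_labels(logits, threshold=0.5):
    # Independent per-class probabilities, each in (0, 1); they need not sum to 1.
    probs = 1.0 / (1.0 + np.exp(-logits))
    # A class is predicted whenever its probability exceeds the (assumed) threshold.
    return probs, [np.flatnonzero(row > threshold).tolist() for row in probs]

logits = np.array([[2.0, -3.0, 0.7, -0.2]])
probs, labels = predict_labels(logits)
print(labels)  # [[0, 2]]
```

With thousands of classes the same code applies unchanged; only the width of the logits matrix grows.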
My question is not related to image classification but rather to the aggregation of the losses returned by the softmax/sigmoid cross-entropy functions.
The standard way of using the softmax CE is as follows:
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y))
On the other hand, all the tutorials/questions related to multi-label classification simply recommend replacing the softmax CE function with the sigmoid CE function, i.e.
loss_op = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=Y))
However, softmax_cross_entropy_with_logits returns a 1-D tensor of length batch_size, whereas sigmoid_cross_entropy_with_logits returns a tensor of the same shape as logits.
Thus performing reduce_mean directly on the output of sigmoid_cross_entropy_with_logits would further scale the loss down by the number of classes (which is something I haven't come across in any loss formulation).
If what I have said above is indeed true, should one instead use the following formulation?
loss_op = tf.reduce_mean(tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=Y), axis=1))
Or does performing a full reduce_mean only slow down learning (like dividing the learning rate by the number of classes) without significantly affecting the overall results?
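The two reductions differ only by a constant factor equal to the number of classes. Here is a NumPy check on a random stand-in for the [batch_size, num_classes] output of sigmoid_cross_entropy_with_logits (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, num_classes = 8, 5
# Random stand-in for the per-label losses returned by sigmoid_cross_entropy_with_logits.
per_label_loss = rng.random((batch_size, num_classes))

full_mean = per_label_loss.mean()                 # like tf.reduce_mean(x)
mean_of_sums = per_label_loss.sum(axis=1).mean()  # like tf.reduce_mean(tf.reduce_sum(x, axis=1))
print(np.isclose(full_mean * num_classes, mean_of_sums))  # True
```

Because the factor is constant, both losses have the same minimizer and only the gradient scale changes. With plain SGD that acts like dividing the learning rate by num_classes. Adam's per-parameter normalization largely cancels a constant rescaling of the loss, apart from its epsilon term.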
@DushyantaDhyani I'm not sure, but I'd go for the mean(sum(...)) formulation (not mean(mean(...))).
My reasoning is that since we treat each logit dimension as an independent logistic regression, we need to sum. I guess it depends on how you see the problem, but the way I see it, the whole model's performance is the sum of its per-class performances; it's not as if each dimension were trying to accomplish the same task (in which case we'd average).
I'm trying to do the same thing, and in retrain.py I changed the following line:
final_tensor = tf.nn.softmax(logits, name=final_tensor_name)
to
final_tensor = tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=ground_truth_input, name=final_tensor_name)
But I am getting an error saying that the shapes are different. Even if I try to create a new placeholder with the same shape, I still get an error saying I need to feed a value. I am very new to TensorFlow and I am not sure what is wrong. Any help is appreciated! Thanks!
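Here is a simple snippet demonstrating that the sigmoid version, not the softmax version, is the right one:
import numpy as np
import tensorflow as tf

if __name__ == "__main__":
    tf.enable_eager_execution()  # TF 1.x only; eager execution is the default in TF 2.x
    labels = np.array(20 * [1.0, 0])
    # Perfect predictions: a large positive logit where the label is 1 and a large
    # negative logit where it is 0, so we expect the loss to be near zero.
    logits = np.array(20 * [10.0, -10.0])
    # Softmax treats all 40 outputs as one distribution, so with 20 positive labels
    # its loss cannot go below 20 * ln(20), about 59.9.
    print(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))  # --> loss ~ 59.9
    print(tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)))  # --> loss ~ 4.5e-05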
Do I have to make changes in the multioutput file? Ideally I want to train any model, such as Inception, on my multi-label training data. How do I do that?