sukritshankar / Caffe-LMDBCreation-MultiLabel

Creation of LMDB for training a multi-label loss in Caffe

numeric labels #1

Closed PavlosMelissinos closed 8 years ago

PavlosMelissinos commented 8 years ago

Hi,

I'd like to start by saying this repo was an oasis in the desert of using lmdb files for multilabel classification. I spent way too much time trying out other stuff before discovering your project. Thanks for that.

On to the point: I'm trying to use your code on the DTD dataset. The labels come from 47 classes (e.g. a texture can be both blotchy and striped). I've used a one-hot representation for the labels (presence or absence of each class), which results in sparse vectors that look like this:

1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

In create_label_lmdb.py, I multiply the vectors by 255, as per your instructions.
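For reference, here is roughly what my label-writing loop looks like (a minimal sketch, not the repo's exact script; `labels.txt` and `labels_lmdb` are placeholder names):

```python
import lmdb
import numpy as np
import caffe

# Load the 0/1 label matrix: one row of 47 entries per image (placeholder path)
labels = np.loadtxt('labels.txt')
# Scale to 0/255 as per the repo's instructions
labels = (labels * 255).astype(np.uint8)

env = lmdb.open('labels_lmdb', map_size=int(1e9))
with env.begin(write=True) as txn:
    for i, vec in enumerate(labels):
        # Caffe datums are 3-D (channels x height x width), so store 47x1x1
        datum = caffe.io.array_to_datum(vec.reshape(-1, 1, 1))
        txn.put('{:08d}'.format(i).encode('ascii'), datum.SerializeToString())
```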

I use SigmoidCrossEntropyLoss as a loss function.
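In NetSpec terms, the tail of my net is wired roughly like this (a sketch with illustrative layer and blob names; the VGG-16 body is elided and the LMDB sources are placeholders):

```python
from caffe import layers as L, params as P, NetSpec

n = NetSpec()
# Two aligned LMDBs: one for images, one for the 47-dim label vectors
n.data = L.Data(source='images_lmdb', backend=P.Data.LMDB, batch_size=32)
# scale=1/255 maps the stored 0/255 label values back to 0/1
n.label = L.Data(source='labels_lmdb', backend=P.Data.LMDB, batch_size=32,
                 transform_param=dict(scale=1. / 255))
# ... VGG-16 conv/fc body elided; n.data stands in for its output here ...
n.score = L.InnerProduct(n.data, num_output=47)
n.loss = L.SigmoidCrossEntropyLoss(n.score, n.label)
print(n.to_proto())
```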

However, during training, the loss is stuck at ~35 no matter how long I let it run.

The idea is to compare the true labels with the network's output, compute a numerical loss from that, and then do gradient descent towards a lower loss. I don't believe the approach itself is wrong, so what could be the reason my net doesn't learn?

I've attached my net's prototxt file. Thanks in advance.

train_VGG_ILSVRC_16_layers.txt

EDIT: Never mind, I increased the learning rate and now it works. The model overfits the data, but at least something is happening.

sukritshankar commented 8 years ago

Hi Pavlos

Thanks for using the repo. Yes, altering the learning rate is often helpful. However, based on my past experience training efficiently with sigmoid cross-entropy loss, here are some pointers that might help you.

(1) Out of M possible labels, convergence is typically best when around 25% are populated per instance. If you have a scenario where only 3-4 out of 47 labels are populated for an instance, you can try splitting the net in the middle layers (e.g. using Caffe's group operation) into, say, two parts: one that deals with half of the labels and another that deals with the rest. This makes the optimization easier for the net (a sketch of one such split follows after these pointers).

(2) You can also provide pseudo label probabilities to the net for much better convergence, and even improved accuracy if you have a semi-supervised scenario, as proposed in our Deep Carving paper!

(3) Sometimes it also helps to specify soft values, like 0.85 instead of 1 and 0.15 instead of a hard 0, to let the net be more flexible during training (a snippet for this follows below as well).
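To make (1) concrete, here is one hypothetical way to realize the split, using a Slice layer on the label blob rather than conv groups (the blob names, the 24/23 split, and the DummyData stand-ins are all illustrative, not a prescription):

```python
from caffe import layers as L, NetSpec

n = NetSpec()
# Stand-ins for the real data pipeline and the shared body of the net
n.feat = L.DummyData(shape=dict(dim=[32, 4096]))
n.label = L.DummyData(shape=dict(dim=[32, 47]))
# Split the 47-dim label vector into two groups of 24 and 23
n.label_a, n.label_b = L.Slice(n.label, axis=1, slice_point=24, ntop=2)
# One scoring head per label group, each trained with its own loss
n.score_a = L.InnerProduct(n.feat, num_output=24)
n.score_b = L.InnerProduct(n.feat, num_output=23)
n.loss_a = L.SigmoidCrossEntropyLoss(n.score_a, n.label_a)
n.loss_b = L.SigmoidCrossEntropyLoss(n.score_b, n.label_b)
```

And for (3), softening the targets is just a relabeling step before the 255 scaling (the path is a placeholder):

```python
import numpy as np

labels = np.loadtxt('labels.txt')          # original 0/1 label vectors
soft = np.where(labels > 0, 0.85, 0.15)    # soften hard 1/0 targets
soft_u8 = (soft * 255).astype(np.uint8)    # 216 and 38 after the 255 scaling
```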

Sukrit

PavlosMelissinos commented 8 years ago

Thanks for the extra details. I'm sure they'll prove useful!