s9xie / DSN

Deeply-supervised Nets
http://vcl.ucsd.edu/~sxie/2014/09/12/dsn-project/

On weights of the loss layers #1

Open happynear opened 9 years ago

happynear commented 9 years ago

I noticed that in this project the weights of the companion loss layers are realized by setting the blobs_lr parameter of the InnerProduct layers. This is equivalent to formulation (3) only as far as training the InnerProduct layers themselves is concerned. However, the gradients backpropagated to the conv layers below are not affected by that weight (0.001 in the prototxt file). In other words, this implementation merely learns the InnerProduct layers of the intermediate SVMs more slowly, while applying the gradients of all the classifiers to the net with equal strength, so every SVM effectively has the same weight. Caffe now provides a parameter called "loss_weight", which, as far as I can see, is the correct way to realize the model described in the paper. This is just my opinion; if I am wrong, please let me know.
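For concreteness, here is a rough sketch of the two ways to attach a companion classifier. It is not copied from this repo; the layer names are made up and I am using the newer Caffe prototxt syntax:

```protobuf
# Sketch only -- hypothetical layer names, newer Caffe layer syntax.

# (a) What this repo does: shrink the companion classifier's own learning rate.
layer {
  name: "dsn_fc3"            # companion SVM on top of conv3
  type: "InnerProduct"
  bottom: "conv3"
  top: "dsn_fc3"
  param { lr_mult: 0.001 }   # this classifier's weights learn slowly...
  param { lr_mult: 0.002 }   # ...but the gradient it sends back to conv3 keeps weight 1
  inner_product_param { num_output: 10 }
}
layer {
  name: "dsn_loss3"
  type: "HingeLoss"
  bottom: "dsn_fc3"
  bottom: "label"
  top: "dsn_loss3"
}

# (b) What I am suggesting: scale the companion loss itself.
layer {
  name: "dsn_loss3"
  type: "HingeLoss"
  bottom: "dsn_fc3"
  bottom: "label"
  top: "dsn_loss3"
  loss_weight: 0.001         # scales every gradient coming out of this branch,
                             # including the one that reaches conv3
}
```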

s9xie commented 9 years ago

Yes, that is exactly the case. Back when I ran the experiments, Caffe did not yet support per-layer loss weights; you can find several lengthy discussions about this in Caffe's repo. Eventually we should use per-layer loss weights instead of per-layer learning rates, but please note that you can still reach a given configuration by tuning the per-layer lr, so the correctness of the experiments is not affected here.

happynear commented 9 years ago

@s9xie, I have run some experiments, and the two strategies' performance is indeed quite close to the numbers you report. However, when tuning the lr, the intermediate SVMs backpropagate their gradients at full strength (weight 1) while their own weights are learned much more slowly (weight 0.001); the lr and the loss weight are quite different things, so I am surprised the results come out so similar. Nonetheless, the idea of attaching SVMs to the conv layers is really impressive and works well.
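To make the distinction I mean explicit, here is a rough sketch (notation approximate, not quoted from the paper): let $\alpha_m$ be the weight of companion loss $\ell_m$, $w_m$ the weights of the companion SVM, and $Z$ the shared conv feature it reads.

```latex
% Rough sketch of the distinction; notation is mine, not taken verbatim from the paper.
\begin{align*}
\text{loss\_weight}=\alpha_m:\quad
  &\Delta w_m \propto \alpha_m \,\frac{\partial \ell_m}{\partial w_m},
  &&\frac{\partial (\alpha_m \ell_m)}{\partial Z} = \alpha_m \frac{\partial \ell_m}{\partial Z},\\[4pt]
\text{blobs\_lr}=\alpha_m:\quad
  &\Delta w_m \propto \alpha_m \,\frac{\partial \ell_m}{\partial w_m},
  &&\frac{\partial \ell_m}{\partial Z}\ \text{(unscaled, i.e. weight } 1\text{)}.
\end{align*}
```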