Closed lioutasb closed 5 years ago
Thanks for asking! weight-decay is applied here, though maybe in a slightly complicated manner. It's applied to all learnable parameters of the network, including the box convolutions' parameters; the paper doesn't say that only boxes are regularized.
@shrubb Thank you for your quick response. I searched for weight-decay instead of weight_decay, that's why I couldn't find it lol.
I assumed you used regularization only on the box convolutions, since you didn't mention it in the "Performance" subsection. My bad.
Thank you again for this very interesting work.
In the paper you mention that you use L2 regularization on the box convolution parameters to shrink the box dimensions towards zero.
Where exactly in the code do you do this regularization? I can't find it. You have a weight-decay flag here, but it's not used in the optimizer. Also, if you use it in the optimizer, the regularization is going to be applied to the whole network, but you specifically mention that you apply regularization to the box convolution parameters only.
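To make the distinction in this thread concrete, here is a minimal plain-Python sketch of one SGD step with L2 weight decay, once applied to every parameter and once restricted to a chosen subset. It is not the repository's code: the function sgd_step, the parameter names, and the numeric values are all hypothetical and only illustrate the update rule. If the training script uses PyTorch optimizers, per-parameter-group options (passing a list of dicts, each with its own params and weight_decay) are one way to get the second behaviour.

```python
# Plain-Python sketch (no framework): one SGD step with L2 weight decay.
# All names here (sgd_step, the parameter names, lr, weight_decay values)
# are hypothetical and are not taken from the repository.

def sgd_step(params, grads, lr, weight_decay, decay_names=None):
    """Return updated parameters after one SGD step.

    L2 weight decay adds weight_decay * w to the gradient of each decayed
    parameter. If decay_names is None, every parameter is decayed;
    otherwise only the parameters whose names are in decay_names.
    """
    updated = {}
    for name, w in params.items():
        decayed = decay_names is None or name in decay_names
        wd = weight_decay if decayed else 0.0
        updated[name] = w - lr * (grads[name] + wd * w)
    return updated


params = {"box.x_min": -2.0, "box.x_max": 2.0, "conv.weight": 0.5}
zero_grads = {name: 0.0 for name in params}  # isolate the effect of the decay

# Case 1: decay applied to every learnable parameter.
all_decayed = sgd_step(params, zero_grads, lr=0.1, weight_decay=0.5)

# Case 2: decay restricted to the (hypothetical) box parameters only.
box_only = sgd_step(params, zero_grads, lr=0.1, weight_decay=0.5,
                    decay_names={"box.x_min", "box.x_max"})
```

With zero gradients, the only change comes from the decay term: in the first case every value moves towards zero, while in the second case conv.weight is left untouched and only the box coordinates shrink.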