Open seusofthd opened 4 years ago
And for implementing the Sparse version, it should be equivalent to converting y_true to one-hot labels and then directly applying the CircleLoss class's call, is that correct?
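A minimal sketch of the conversion described above, assuming integer class ids and a known class count (the shapes and `num_classes` here are illustrative, not taken from the repo):

```python
import numpy as np

# Sparse integer labels -> one-hot, so a SparseCircleLoss could in principle
# reuse the dense CircleLoss path. num_classes = 4 is an assumed toy value.
num_classes = 4
y_true_sparse = np.array([2, 0, 3])                  # class ids, shape [batch]
y_true_onehot = np.eye(num_classes)[y_true_sparse]   # shape [batch, num_classes]
```

In TensorFlow the same reshaping would typically be done with `tf.one_hot(y_true, num_classes)` before calling the dense loss.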
For image classification:
I use the following code to make sure y_pred is the cosine similarity:

```python
...
kl.Lambda(lambda x: tf.nn.l2_normalize(x, 1), name='embedding'),
kl.Dense(10, use_bias=False, kernel_constraint=k.constraints.unit_norm()),
```
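A numpy sketch of why that snippet yields cosine similarity: if the embedding is l2-normalized and each Dense weight column has unit norm, the bias-free logit x·w_j equals cos(θ_j). The shapes and random values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8,))                  # raw embedding vector
W = rng.normal(size=(8, 10))               # Dense kernel, 10 classes

x_hat = x / np.linalg.norm(x)              # what Lambda(l2_normalize) does
W_hat = W / np.linalg.norm(W, axis=0)      # what unit_norm() constrains the kernel to

logits = x_hat @ W_hat                     # Dense(use_bias=False) output, shape [10]

# direct cosine similarity for comparison
cos = np.array([np.dot(x, W[:, j]) / (np.linalg.norm(x) * np.linalg.norm(W[:, j]))
                for j in range(10)])
```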
Yes. But if K > 1, SparseCircleLoss is actually equivalent to PairCircleLoss.
Yes.
Thanks for your quick reply!
For 2, if K > 1 I still have some questions. Basically it is a sum of exp(-gamma...s_p...), which means you cannot take the sum of positive exps as the denominator. So you cannot just use -r_sp_m * self.gamma + logZ as the denominator inside the log. And imagine that we have two positive pairs: then r_sp_m's dimension would be [batch_size, 2], which is different from logZ's dimension of [batch_size, 1].
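A hedged numpy sketch of the shape issue raised above: with K positive pairs the positive term (`r_sp_m` in the discussion) has shape [batch, K], so it needs its own logsumexp reduction over the K axis before it can be combined with the [batch, 1] negative term `logZ`. All values below are toy data, and the variable names simply follow the thread:

```python
import numpy as np

def logsumexp(x, axis=-1):
    # numerically stable log(sum(exp(x))), keeping the reduced axis
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))

batch, K, N = 3, 2, 5
rng = np.random.default_rng(1)
r_sp_m = rng.normal(size=(batch, K))   # per-positive terms, shape [batch, K]
r_sn = rng.normal(size=(batch, N))     # per-negative terms, shape [batch, N]

logZ = logsumexp(r_sn)                 # [batch, 1]
pos = logsumexp(r_sp_m)                # [batch, 1] <- the reduction K > 1 requires
loss = np.log1p(np.exp(pos + logZ))    # softplus form of the per-sample circle loss
```

With K = 1 the extra logsumexp is a no-op, which is why the dimension mismatch only shows up for multiple positive pairs.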
And for training on CIFAR, how did you schedule the learning rate? Did you just use the default lr in Adam? I am trying to implement this for ResNet50 training on ImageNet, and it seems very sensitive to the learning rate.
I think that the optimization goal set by circle loss cannot be achieved at first, so its loss gradient will be quite large. At the same time, α_p and α_n are very large due to the large gap during the first few training epochs. But overall circle loss is still relatively robust; for pre-training, I suggest reducing the learning rate by 10 times.
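A minimal sketch of that suggestion: start from Adam's default 1e-3 but divide it by 10 during an initial phase. The epoch boundary (5) is an assumption, not a value from the thread:

```python
# Reduced rate while the circle-loss gap (and hence the gradient) is still
# large, then the base rate afterwards. warmup_epochs = 5 is an assumed value.
def lr_schedule(epoch, base_lr=1e-3, warmup_epochs=5):
    return base_lr / 10.0 if epoch < warmup_epochs else base_lr
```

A function of this shape can be passed to `tf.keras.callbacks.LearningRateScheduler` to apply it per epoch.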
For all CircleLoss implementations, there is a comment saying 'y_pred must be cos similarity'. I am a little bit confused: for image classification, it should also accept logits as y_pred. Is that correct?
Another question I have: for both CircleLoss and SparseCircleLoss, the calculation is correct only if there is one intra-class pair (K = 1 for s_p). Is that correct?