rksltnl / Deep-Metric-Learning-CVPR16

Main repository for Deep Metric Learning via Lifted Structured Feature Embedding
MIT License

Why shift the cost for negative samples before applying the exponent? #24

Closed · patzm closed this issue 7 years ago

patzm commented 7 years ago

Hi there, in lines 127 to 131 of your loss layer implementation you compute the log-sum-exp ("soft maximum") term of equation (4),

$$\tilde{J}_{i,j} = \log\Big(\sum_{(i,k)\in\mathcal{N}} \exp\{\alpha - D_{i,k}\} + \sum_{(j,l)\in\mathcal{N}} \exp\{\alpha - D_{j,l}\}\Big) + D_{i,j},$$

as follows:

// Find the largest exponent argument among the negatives.
Dtype max_elem = *std::max_element(loss_aug_inference_.cpu_data(), loss_aug_inference_.cpu_data() + num_negatives);
// Subtract it so every argument is <= 0 before exponentiating.
caffe_add_scalar(loss_aug_inference_.count(),
    Dtype(-1.0)*max_elem, // shift
    loss_aug_inference_.mutable_cpu_data());
// Exponentiate in place.
caffe_exp(loss_aug_inference_.count(), loss_aug_inference_.mutable_cpu_data(), loss_aug_inference_.mutable_cpu_data());
// Sum the exponentials, take the log, and add the maximum back.
Dtype soft_maximum = log(caffe_cpu_dot(num_negatives, summer_vec_.cpu_data(), loss_aug_inference_.mutable_cpu_data()))
    + max_elem; // shift back

However, you shift the arguments by the maximum element of loss_aug_inference_ (the terms $\alpha - D_{i,k}$ and $\alpha - D_{j,l}$), so that they lie in $(-\infty, 0]$ when the exponent is applied. Later, in line 131, you shift back after applying the logarithm. I cannot find any mention of this in your paper.
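
For reference, the shift itself appears to be mathematically a no-op, by the standard log-sum-exp identity with $m = \max_i x_i$:

$$\log \sum_i \exp(x_i) = m + \log \sum_i \exp(x_i - m).$$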

Could you please elaborate on this?

rksltnl commented 7 years ago

It's for numerical stability.
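
For anyone landing here later: this is the standard log-sum-exp trick. Below is a minimal, self-contained C++ sketch (hypothetical demo code, not the layer itself; it uses plain doubles instead of Caffe's Dtype and blobs) showing the overflow the shift avoids:

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Naive log-sum-exp: exp() overflows a double once any argument
// exceeds roughly 709, and the whole expression becomes inf.
double logsumexp_naive(const std::vector<double>& x) {
  double sum = 0.0;
  for (double v : x) sum += std::exp(v);
  return std::log(sum);
}

// Shifted log-sum-exp, mirroring the loss layer: subtract the max so
// every exponent argument is <= 0, then add the max back after log().
double logsumexp_shifted(const std::vector<double>& x) {
  double max_elem = *std::max_element(x.begin(), x.end());
  double sum = 0.0;
  for (double v : x) sum += std::exp(v - max_elem);
  return std::log(sum) + max_elem;
}

int main() {
  // Large arguments, e.g. a big margin alpha minus small distances.
  std::vector<double> args = {1000.0, 999.0, 998.0};
  std::printf("naive:   %f\n", logsumexp_naive(args));    // inf
  std::printf("shifted: %f\n", logsumexp_shifted(args));  // ~1000.4076
  return 0;
}

Both functions compute the same quantity mathematically, but only the shifted version survives large arguments.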