rksltnl / Deep-Metric-Learning-CVPR16

Main repository for Deep Metric Learning via Lifted Structured Feature Embedding
MIT License

Why shift the cost for negative samples before applying the exponent? #24

Closed · patzm closed this issue 7 years ago

patzm commented 7 years ago

Hi there, in lines 127 to 131 of your loss layer implementation you compute the log-sum-exp ("soft maximum") term of equation (4),

$$\tilde{J}_{i,j} = \log\Big(\sum_{(i,k)\in\mathcal{N}} \exp\{\alpha - D_{i,k}\} + \sum_{(j,l)\in\mathcal{N}} \exp\{\alpha - D_{j,l}\}\Big) + D_{i,j},$$

as follows:

// Find the largest exponent argument among the negatives.
Dtype max_elem = *std::max_element(loss_aug_inference_.cpu_data(), loss_aug_inference_.cpu_data() + num_negatives);
// Subtract it so every argument is <= 0 before exponentiating.
caffe_add_scalar(loss_aug_inference_.count(),
    Dtype(-1.0)*max_elem, // shift
    loss_aug_inference_.mutable_cpu_data());
// Exponentiate in place.
caffe_exp(loss_aug_inference_.count(), loss_aug_inference_.mutable_cpu_data(), loss_aug_inference_.mutable_cpu_data());
// Sum the exponentials, take the log, and add the maximum back.
Dtype soft_maximum = log(caffe_cpu_dot(num_negatives, summer_vec_.cpu_data(), loss_aug_inference_.mutable_cpu_data()))
    + max_elem; // shift back

However, you shift the arguments by the maximum element of loss_aug_inference_ (the terms $\alpha - D_{i,k}$ and $\alpha - D_{j,l}$), so that they lie in $(-\infty, 0]$ when the exponent is applied. Later, in line 131, you shift back after applying the logarithm. I cannot find any mention of this in your paper.
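
For reference, the shift itself appears to be mathematically a no-op, by the standard log-sum-exp identity with $m = \max_i x_i$:

$$\log \sum_i \exp(x_i) = m + \log \sum_i \exp(x_i - m).$$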

Could you please elaborate on this?

rksltnl commented 7 years ago

It's for numerical stability.
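
For anyone landing here later: this is the standard log-sum-exp trick. Below is a minimal, self-contained C++ sketch (hypothetical demo code, not the layer itself; it uses plain doubles instead of Caffe's Dtype and blobs) showing the overflow the shift avoids:

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Naive log-sum-exp: exp() overflows a double once any argument
// exceeds roughly 709, and the whole expression becomes inf.
double logsumexp_naive(const std::vector<double>& x) {
  double sum = 0.0;
  for (double v : x) sum += std::exp(v);
  return std::log(sum);
}

// Shifted log-sum-exp, mirroring the loss layer: subtract the max so
// every exponent argument is <= 0, then add the max back after log().
double logsumexp_shifted(const std::vector<double>& x) {
  double max_elem = *std::max_element(x.begin(), x.end());
  double sum = 0.0;
  for (double v : x) sum += std::exp(v - max_elem);
  return std::log(sum) + max_elem;
}

int main() {
  // Large arguments, e.g. a big margin alpha minus small distances.
  std::vector<double> args = {1000.0, 999.0, 998.0};
  std::printf("naive:   %f\n", logsumexp_naive(args));    // inf
  std::printf("shifted: %f\n", logsumexp_shifted(args));  // ~1000.4076
  return 0;
}

Both functions compute the same quantity mathematically, but only the shifted version survives large arguments.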