Closed patzm closed 7 years ago
Hi there, in lines 127 to 131 of your loss layer implementation you calculate the argument of the exponent in:

    Dtype max_elem = *std::max_element(loss_aug_inference_.cpu_data(),
        loss_aug_inference_.cpu_data() + num_negatives);
    caffe_add_scalar(loss_aug_inference_.count(), Dtype(-1.0)*max_elem,  // shift
        loss_aug_inference_.mutable_cpu_data());
    caffe_exp(loss_aug_inference_.count(), loss_aug_inference_.mutable_cpu_data(),
        loss_aug_inference_.mutable_cpu_data());
    Dtype soft_maximum = log(caffe_cpu_dot(num_negatives, summer_vec_.cpu_data(),
        loss_aug_inference_.mutable_cpu_data())) + max_elem;  // shift back
However, you shift it by its maximum element so that it lies in the range (-inf, 0] when applying the exponent. Later, in line 131, you shift it back after applying the logarithm. I cannot find any mention of this in your paper.
Could you please elaborate on this?
It's for numerical stability.