rksltnl / Deep-Metric-Learning-CVPR16

Main repository for Deep Metric Learning via Lifted Structured Feature Embedding
MIT License

Why is the gradient of x_i divided by dist_pos in lifted_struct_similarity_softmax_layer.cpp? #9

Closed zjchuyp closed 8 years ago

zjchuyp commented 8 years ago

I read the source code lifted_struct_similarity_softmax_layer.cpp; at line 146:

```cpp
scaler = Dtype(2.0) * this_loss / dist_pos;
// update x_i
caffe_axpy(K, scaler * Dtype(1.0), blob_pos_diff_.cpu_data(), bout + i*K);
```

I read the paper, and I understand the gradient to be dJ/df(x_i) = (1/|P|) * J_{i,j} * 2(f(x_i) - f(x_j)). Why does the code divide by D_{i,j}? Thanks a lot!

zjchuyp commented 8 years ago

Line 63 in get_training_examples_multilabel.m has:

```matlab
% add self pairs
pos_pairs(insert_idx : insert_idx + this_class_num_images-1, :) = ...
    repmat(image_ids', 1, 2);
pos_class(insert_idx : insert_idx + this_class_num_images-1, :) = class_id;
```

This puts the same sample in both x_i and x_j, so `dist_pos` in lifted_struct_similarity_softmax_layer.cpp will be zero!

rksltnl commented 8 years ago

Gradient: read the paper again. Implementation is correct.

Batch preparation: even though I write the same image into a positive pair, each copy undergoes Caffe's random crop operation, so their distance is non-zero.