hanxiao opened this issue 6 years ago
Great article, very helpful.
Could you give an example of a reasonable value for weight?
@Jack-Paz In my previous work, I mainly used the weight to balance the few-shot/long-tail queries. You may also use something like log1p(num_clicks_queryi_on_productj)
to put more weight on popular (q,d) pairs. But to me this is really task-specific and no rule of thumb exists.
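For illustration, a minimal sketch of that click-based weighting; the click counts and names below are made up, not from the original post:

```python
import numpy as np

# Hypothetical click counts for (query, doc) pairs; values are made up for illustration.
num_clicks = {("q1", "d3"): 120, ("q2", "d7"): 3, ("q3", "d1"): 0}

# Popular (q, d) pairs get a larger weight; log1p compresses the scale and
# maps zero clicks to a zero weight.
weight = {pair: np.log1p(clicks) for pair, clicks in num_clicks.items()}
print(weight)
```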
Very nice and useful article.
Could you please provide an example of how to run inference with the model after training is done?
My idea is
@Ekkalak-T Yes, you are on the correct track. To improve the inference-time performance, I would do this:
0. encode all documents into doc-vectors once, offline;
1. encode the incoming query into a query-vector;
2. compute metric_p between the query-vector and every doc-vector and rank the documents by it.
Notice step 0 is a one-time thing.
To improve the efficiency of step 2, you may pre-index your doc-vectors using special data structures, such as a KD-tree. This is another topic called *Approximate Nearest Neighbours*. Please check Facebook's faiss.
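For concreteness, here is a minimal sketch of that inference procedure with cosine similarity as metric_p; encode_docs and encode_query stand in for the trained encoders and are placeholders, not functions from the post:

```python
import numpy as np

def encode_docs(docs):
    # Placeholder for the trained document encoder.
    return np.random.rand(len(docs), 128)

def encode_query(query):
    # Placeholder for the trained query encoder.
    return np.random.rand(128)

docs = ["doc a", "doc b", "doc c"]

# Step 0 (one-time, offline): encode and L2-normalize all documents.
doc_vecs = encode_docs(docs)
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

# Step 1 (per query): encode the incoming query.
q_vec = encode_query("some query")
q_vec /= np.linalg.norm(q_vec)

# Step 2: metric_p = cosine similarity against every doc-vector, sorted descendingly.
metric_p = doc_vecs @ q_vec
ranking = np.argsort(-metric_p)
print([docs[i] for i in ranking])
```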
@hanxiao Thank you so much for the clarification. Did you mean that metric_n is not involved at inference time and we can rank the documents by metric_p alone?
Regarding inference-time performance, how can we improve step 2 if we use an MLP in the metric layer?
@Ekkalak-T At inference time there is no need to compute metric_n. You just compute metric_p
for every document and sort them in descending order as the final result, done! The reason is that during training the relevant (query, document) pairs are already encouraged to get a higher metric
value.
When an MLP is used, there is no easy way to improve the efficiency of step 2. A special case is an MLP without a nonlinear activation function: it then collapses to a single-layer perceptron, say equipped with a weight W.
As a consequence, tf.matmul(W, tf.concat([q_vec, d_vec], axis=0))
can be rewritten as
tf.matmul(W, tf.concat([q_vec, tf.zeros_like(d_vec)], axis=0)) + tf.matmul(W, tf.concat([tf.zeros_like(q_vec), d_vec], axis=0))
The 2nd term is independent of q_vec
and thus can be computed in advance in step 0.
The improvement is minor, I guess, but still better than nothing.
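A small sketch of that decomposition (all shapes and numbers are assumptions for illustration): split W into the block acting on q_vec and the block acting on d_vec, precompute the doc-side products once in step 0, and compute only the query-side product per query:

```python
import numpy as np

dim = 128
W = np.random.rand(1, 2 * dim)          # linear "MLP" weight, no nonlinearity
W_q, W_d = W[:, :dim], W[:, dim:]       # blocks acting on q_vec and d_vec

doc_vecs = np.random.rand(1000, dim)    # hypothetical precomputed doc-vectors

# Step 0 (offline): the doc-side term is independent of q_vec.
doc_term = doc_vecs @ W_d.T             # shape (1000, 1)

# Inference: only the query-side term depends on the query.
q_vec = np.random.rand(dim)
metric_p = (W_q @ q_vec) + doc_term[:, 0]
ranking = np.argsort(-metric_p)
```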
@hanxiao Thanks for the brilliant idea. Could you please share with us how to choose a model for the metric layer?
I found that when using cosine or L2, the computed model loss is not bounded between 0 and 1, so it is hard to monitor whether the model is learning.
I ended up using an MLP in the metric layer and changed reduce_sum to reduce_mean in the model_loss to average the losses over the batch. As a result, the model loss is now between 0 and 1.
model_loss = tf.reduce_mean(loss_q_pos)
(Let's say all queries have the same weight, so I removed the weight term.)
Is this the correct way to monitor the loss?
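As a minimal sketch of that change (the per-query losses and weights below are made-up placeholders): reduce_mean gives a batch-size-independent value to monitor, while the original weighted reduce_sum grows with the batch:

```python
import tensorflow as tf

# Hypothetical per-query losses and weights, just for illustration.
loss_q_pos = tf.constant([0.3, 0.0, 0.7, 0.1])
weight = tf.constant([1.0, 1.0, 2.0, 1.0])

# Original formulation: weighted sum over the batch (scales with batch size).
model_loss_sum = tf.reduce_sum(weight * loss_q_pos)

# Batch average, as described above: easier to compare across batches.
model_loss = tf.reduce_mean(loss_q_pos)
print(model_loss_sum.numpy(), model_loss.numpy())
```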
https://hanxiao.github.io/2017/11/08/Optimizing-Contrastive-Rank-Triplet-Loss-in-Tensorflow-for-Neural/