tensorflow / ranking

Learning to Rank in TensorFlow

Siamese RankNet #21

Closed eggie5 closed 5 years ago

eggie5 commented 5 years ago

I was wondering if it's possible to implement the Siamese Ranknet using TF Ranking: http://www.eggie5.com/130-learning-to-rank-siamese-network-pairwise-data

Figure 1: Example pairwise training routine (w/ feature extraction, Scoring function (MLP), and objective)

Figure 2: Keras diagram of objective (w/o scoring function)

Pointwise Scoring function: F(x) = [f(x_1), f(x_2), ..., f(x_n)]

where f(x) is some type of scoring regression that maps the input feature x to a scalar, f: x -> \mathbb{R}. (This should be no problem w/ TF Ranking.)

Pairwise Loss:

The loss uses binary cross entropy to minimize this objective:

logits = f(x_i) - f(x_j)

which is the difference between the scores of the two items in a training pair. The corresponding label y in [-1, 0, 1] indicates which of f(x_i) and f(x_j) should be larger, or whether they should be equal.
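For concreteness, a minimal sketch of that objective in plain TensorFlow (not the TF Ranking API; ranknet_pair_loss, score_i, score_j and target are just illustrative names, with the label y mapped to a target probability (y + 1) / 2):

import tensorflow as tf

def ranknet_pair_loss(score_i, score_j, target):
    # score_i, score_j: [batch] scores f(x_i), f(x_j) from the shared scoring MLP.
    # target: [batch] in {0.0, 0.5, 1.0}; 1.0 means x_i should rank above x_j,
    # 0.0 means below, 0.5 means tied (i.e. y in [-1, 0, 1] mapped to (y + 1) / 2).
    logits = score_i - score_j  # logits = f(x_i) - f(x_j)
    return tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=target, logits=logits))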

I see this loss exists: pairwise_logistic_loss, but I'm not sure how to use this w/ TF Ranking. How do I handle data that is in pairs? For example:

A > B
B > C
C > D
A > C

This type of data can come from logs, like in: Joachims, Thorsten. "Optimizing search engines using clickthrough data."

ramakumar1729 commented 5 years ago

Hi Alex,

This can be implemented using TF-Ranking. In the score-and-sort approach, there are two main components: a single-item or multi-item scoring function, and a list loss.

Scoring function: Defined using make_groupwise_ranking_fn with the appropriate group_size: group_size=1 gives pointwise scoring and group_size=2 gives pairwise scoring.

List loss: The list loss is applied over the scores generated by the scoring function. Note that for larger group sizes, the group scores are accumulated per item to produce a single list of scores, over which the loss is applied. Hence pairwise_logistic_loss can be applied with pointwise scoring as well (see the sketch below).
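Roughly, the wiring looks like this, assuming the estimator-style API from the TF-Ranking examples (make_groupwise_ranking_fn, create_ranking_head, make_loss_fn); example_feature_columns, the layer sizes, the metric choice and the optimizer below are placeholders, not part of the library:

import tensorflow as tf
import tensorflow_ranking as tfr

# Hypothetical feature columns; replace with the columns for your own data.
def example_feature_columns():
  return {
      "f%d" % i: tf.feature_column.numeric_column("f%d" % i)
      for i in range(1, 11)
  }

def make_score_fn():
  # Pointwise scoring function f(x): each group of size 1 -> one scalar score.
  # (context_features, mode, params, config are unused in this sketch.)
  def _score_fn(context_features, group_features, mode, params, config):
    input_layer = tf.feature_column.input_layer(
        group_features, example_feature_columns().values())
    hidden = tf.layers.dense(input_layer, units=64, activation=tf.nn.relu)
    return tf.layers.dense(hidden, units=1)
  return _score_fn

def _train_op_fn(loss):
  return tf.train.AdagradOptimizer(learning_rate=0.1).minimize(
      loss, global_step=tf.train.get_global_step())

ranking_head = tfr.head.create_ranking_head(
    loss_fn=tfr.losses.make_loss_fn(
        tfr.losses.RankingLossKey.PAIRWISE_LOGISTIC_LOSS),
    eval_metric_fns={
        "metric/ndcg@5": tfr.metrics.make_ranking_metric_fn(
            tfr.metrics.RankingMetricKey.NDCG, topn=5),
    },
    train_op_fn=_train_op_fn)

estimator = tf.estimator.Estimator(
    model_fn=tfr.model.make_groupwise_ranking_fn(
        group_score_fn=make_score_fn(),
        group_size=1,   # pointwise scoring; set to 2 for pairwise scoring
        transform_fn=None,
        ranking_head=ranking_head))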

Let me know if you have any other questions.

eggie5 commented 5 years ago

@ramakumar1729 Thanks for the feedback.

The input format for pairwise data isn't clear to me. For training features, TF Ranking seems to expect [batch_size] x [list_size] x [feature_size] tensors, where list_size is the number of documents in a given query and feature_size is the number of features per document. For training labels, it seems to expect a [batch_size] x [list_size] tensor of pointwise labels (see the sketch below).
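For example, something like this is what I mean by those shapes (sizes made up):

import numpy as np

batch_size, list_size, feature_size = 32, 2, 10   # made-up sizes

# Per-query document features: [batch_size, list_size, feature_size]
features = np.zeros((batch_size, list_size, feature_size), dtype=np.float32)

# Per-document relevance labels: [batch_size, list_size]
labels = np.zeros((batch_size, list_size), dtype=np.float32)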

If I have pairwise judgement data like:

A > B
B > C
C > D
A > C

instead of point wise data:

A=4
B=3
C=2
D=1

How can I load this into TF ranking?

eggie5 commented 5 years ago

Does the pairwise loss in TF ranking require pointwise labels on the documents? Equation 4 from the paper[1] seems to suggest it requires each document to be explicitly labeled.

Is it possible to use pairwise labels like in my example in my previous post?

Maybe I'll have to convert my pairwise judgements to an overall ranking and then use equation 4?

  1. Rama Kumar Pasumarthi, Xuanhui Wang, Cheng Li, Sebastian Bruch, Michael Bendersky, Marc Najork, Jan Pfeifer, Nadav Golbandi, Rohan Anil, Stephan Wolf. TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank.

eggie5 commented 5 years ago

After some more research, it seems the pattern developed by Joachims in "Optimizing Search Engines using Clickthrough Data" is to take the preference pairs and encode them in the full svmlight format w/ labels.

So this example

A > B
B > C
C > D
A > C

would encode to:

1 qid:1 [features] #A preference
1 qid:1 [features] #B preference
1 qid:1 [features] #C preference
0 qid:1 [features] #D
0 qid:1 [features] #E
...

Where all we know is that ABC should be ranked above the rest.

This will allow me to use TF Ranking w/ the pairwise data.
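A rough sketch of that encoding step (the helper name and the feature layout are just illustrative):

def to_svmlight_lines(qid, preferred, rest, features):
    # features: dict mapping doc id -> {feature_index: value}
    def fmt(doc):
        return " ".join("%d:%g" % (i, v) for i, v in sorted(features[doc].items()))
    # Preferred docs get label 1, everything else label 0.
    lines = ["1 qid:%s %s #%s preference" % (qid, fmt(d), d) for d in preferred]
    lines += ["0 qid:%s %s #%s" % (qid, fmt(d), d) for d in rest]
    return lines

print("\n".join(to_svmlight_lines(
    qid=1,
    preferred=["A", "B", "C"],
    rest=["D", "E"],
    features={d: {1: 0.1, 2: 0.2} for d in "ABCDE"})))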

xuanhuiwang commented 5 years ago

The other way is to treat each pair as a list with size 2.

# A > B
1 qid:AB A's features
0 qid:AB B's features
# B > C
1 qid:BC B's features
0 qid:BC C's features
# C > D
1 qid:CD C's features
0 qid:CD D's features
# A > C
1 qid:AC A's features
0 qid:AC C's features

For tf-ranking, you can set list_size = 2 and group_size = 1. Use pairwise_logistic_loss, and this will give you exactly the Siamese network with the RankNet loss.
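To see the equivalence concretely, a small check along these lines should work (TF1-style; assuming the make_loss_fn API of the TF-Ranking version in this thread, with features passed as None since no per-example weights are used):

import tensorflow as tf
import tensorflow_ranking as tfr

# One "query" per preference pair, list_size = 2; the first slot is the preferred item.
labels = tf.constant([[1.0, 0.0]])   # A > B
scores = tf.constant([[0.8, 0.3]])   # f(A), f(B) from the shared scoring function

loss_fn = tfr.losses.make_loss_fn(
    tfr.losses.RankingLossKey.PAIRWISE_LOGISTIC_LOSS)
tfr_loss = loss_fn(labels, scores, None)   # no features dict needed without weights

# RankNet / Siamese binary cross-entropy on the score difference.
ranknet_loss = tf.log(1.0 + tf.exp(-(scores[:, 0] - scores[:, 1])))

with tf.Session() as sess:
    print(sess.run([tfr_loss, ranknet_loss]))   # both should be about 0.47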

eggie5 commented 5 years ago

@xuanhuiwang even better, thanks for the clear example!

eggie5 commented 5 years ago

@xuanhuiwang

I was able to implement something like you suggested w/ list_size = 2 and group_size = 1. What would be a good way to evaluate this using TF Ranking? Would the ranking metrics (NDCG@2, MRR) have much meaning with list_size = 2? It seems like binary classification metrics would be a better fit, e.g. binary AUC, which would quantify the probability of correctly ordering the pair...

xuanhuiwang commented 5 years ago

ordered_pair_accuracy is a good one for this.

eggie5 commented 5 years ago

I've never heard of that metric, but I do see it in the metrics module and after some experimentation, it seems to pretty much be binary AUC:

import tensorflow as tf
import tensorflow_ranking as tfr

# Three example "queries"; the -1 entries are padding for shorter lists.
labels = [[0., 0., 0., 1., -1.],
          [0., 1., -1., -1., -1.],
          [1., -1., -1., -1., -1.]]
scores = [[2., 2.8, 2.9, 3., -1.],
          [3., 2., 0., 1., -1.],
          [3., 4., -1., -1., -1.]]

# Ordered pair accuracy: fraction of correctly ordered pairs within each query.
opa = tfr.metrics.make_ranking_metric_fn(
    tfr.metrics.RankingMetricKey.ORDERED_PAIR_ACCURACY)

val, opa_op = opa(labels, scores, None)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    print(sess.run([opa_op]))

Actual Value: [0.75]

xuanhuiwang commented 5 years ago

Yes, it is very similar to AUC; the only difference is that the pairs are formed within queries, whereas traditional AUC for classification has no query boundary. Also, TF's implementation of AUROC is not pair-based but an approximation that buckets scores into thresholds, so it is not well suited for ranking evaluation.