tensorflow / ranking

Learning to Rank in TensorFlow

Can predictions scores have negative values? #62

Closed Dinesh-Mali closed 5 years ago

Dinesh-Mali commented 5 years ago

I have my input_fn for evaluation as follows:

def input_fn_prediction(path):
    test_dataset = tf.data.Dataset.from_generator(
        tfr.data.libsvm_generator(path, _NUM_FEATURES, _LIST_SIZE),
        output_types=(
            {str(k): tf.float32 for k in range(1, _NUM_FEATURES + 1)},
            tf.float32
        ),
        output_shapes=(
            {str(k): tf.TensorShape([_LIST_SIZE, 1]) for k in range(1, _NUM_FEATURES + 1)},
            tf.TensorShape([_LIST_SIZE])
        )
    )

    test_dataset = test_dataset.batch(_BATCH_SIZE)

    return test_dataset.make_one_shot_iterator().get_next()
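A minimal usage sketch, assuming ranker is a tf.estimator.Estimator built as in the TF-Ranking demo (_TEST_DATA_PATH is a hypothetical path, not from this thread):

    # Hypothetical usage: `ranker` is assumed to be a tf.estimator.Estimator,
    # as in the TF-Ranking demo notebook. predict() returns a generator that
    # yields one array of _LIST_SIZE raw scores per list in the input.
    _TEST_DATA_PATH = "/tmp/test.txt"  # hypothetical LibSVM-formatted file
    predictions = ranker.predict(input_fn=lambda: input_fn_prediction(_TEST_DATA_PATH))
    scores = next(predictions)  # one raw (possibly negative) score per document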

During prediction I used the same input function. When I run the ranker.predict cell multiple times, it gives reordered results, and sometimes the scores are negative. If this is the case, how can I rank my documents correctly? Can scores have negative values?

Please Help! Thank you.

ramakumar1729 commented 5 years ago

The scores are based on the output of the scoring neural network logic (make_group_score_fn), and can have negative values. The negative values can still be used to order the documents, and this is exactly how the listwise loss operates. If you prefer having non-negative values, you can apply a sigmoid transformation to the scores; this transformation does not impact the order of scores.
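To illustrate, a minimal sketch in plain NumPy (not TF-Ranking code): the sigmoid is strictly monotonic, so it maps scores into (0, 1) without changing their ranking.

    import numpy as np

    def to_unit_interval(scores):
        # Sigmoid: strictly monotonic map from R to (0, 1), so the induced
        # document ordering is identical to that of the raw scores.
        scores = np.asarray(scores, dtype=np.float64)
        return 1.0 / (1.0 + np.exp(-scores))

    raw = np.array([-1.3, 0.2, 2.1])   # hypothetical raw model outputs
    print(to_unit_interval(raw))       # ~[0.214, 0.550, 0.891]
    print(np.argsort(-raw))            # [2, 1, 0] -- same ranking either way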

Dinesh-Mali commented 5 years ago

Thanks Rama, I have one more question.

I have trained and evaluated my model with shuffling turned on during training, but removed the shuffle at evaluation time. Here is my code.

For training, input_fn is as follows:

def input_fn(path):
    train_dataset = tf.data.Dataset.from_generator(
        tfr.data.libsvm_generator(path, _NUM_FEATURES, _LIST_SIZE),
        output_types=(
            {str(k): tf.float32 for k in range(1, _NUM_FEATURES + 1)},
            tf.float32
        ),
        output_shapes=(
            {str(k): tf.TensorShape([_LIST_SIZE, 1]) for k in range(1, _NUM_FEATURES + 1)},
            tf.TensorShape([_LIST_SIZE])
        )
    )
    # Note: np.random.seed does not affect tf.data shuffling; to make the
    # shuffle reproducible, pass a seed directly, e.g. .shuffle(1000, seed=25).
    np.random.seed(25)
    train_dataset = train_dataset.shuffle(1000).batch(_BATCH_SIZE)

    return train_dataset.make_one_shot_iterator().get_next()

For evaluation and prediction I used the following input_fn:

def input_fn_prediction(path):
    test_dataset = tf.data.Dataset.from_generator(
        tfr.data.libsvm_generator(path, _NUM_FEATURES, _LIST_SIZE),
        output_types=(
            {str(k): tf.float32 for k in range(1, _NUM_FEATURES + 1)},
            tf.float32
        ),
        output_shapes=(
            {str(k): tf.TensorShape([_LIST_SIZE, 1]) for k in range(1, _NUM_FEATURES + 1)},
            tf.TensorShape([_LIST_SIZE])
        )
    )

    test_dataset = test_dataset.batch(_BATCH_SIZE)

    return test_dataset.make_one_shot_iterator().get_next()

Now let's say I have a query with qid 1 and 20 documents (all are different and only one document is relevant for that query).

My inference data will be as follows:

? qid:1 x:x1 y:y1 z:z1 ...
? qid:1 ...
? qid:1 ...

up to 20 documents.

My problem is: after training the model, when I perform prediction on the query with qid:2 followed by the query with qid:1, it gives the same scores for qid:1 and qid:2, only in a different order. I am not able to understand this scenario. Please guide, thank you!

ramakumar1729 commented 5 years ago

When I perform prediction on the query with qid:2 followed by the query with qid:1, it gives the same scores for qid:1 and qid:2, only in a different order.

I don't understand this clearly. Can you elaborate? What do the datapoints for qid:1 and qid:2 look like?

Dinesh-Mali commented 5 years ago

Let's say I have inference data points as follows for query qid:1:

? qid:1 x:x1 y:y1 z:z1
? qid:1 x:x2 y:y2 z:z2
...

up to 20 documents.

The same holds for query qid:2.

When I perform prediction with query qid:1, I get 20 scores corresponding to the 20 documents; let's say the predicted scores are scores_qid1 = [s1, s2, s3, ..., s20].

Again, when I perform prediction on query qid:2, I get the same result as scores_qid1 (same values), the only difference being the order, i.e., scores_qid2 = [s3, s2, s7, ..., s20].

Though I have two different queries for inference, the predicted scores remain the same, just in different orders.

ramakumar1729 commented 5 years ago

This can happen if both the queries have the same set of documents/document features. Can you check if this is the case?
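To illustrate why, a minimal sketch in plain NumPy (hypothetical weights, not TF-Ranking code): the scoring function sees only document features, never the query id, so two queries carrying identical feature vectors must receive identical score sets.

    import numpy as np

    # A per-document scoring function depends only on the document's features.
    # The query id is never an input, so two queries with identical documents
    # get identical score sets, permuted to match the document order.
    np.random.seed(0)
    W = np.random.randn(3)                  # stand-in for trained weights

    def score(doc_features):
        return doc_features @ W             # one score per document row

    docs_qid1 = np.array([[1.0, 0.2, 0.5],
                          [0.3, 0.9, 0.1]])
    docs_qid2 = docs_qid1[::-1]             # same documents, different order

    print(score(docs_qid1))                 # [s1 s2]
    print(score(docs_qid2))                 # [s2 s1] -- same values, reordered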

Dinesh-Mali commented 5 years ago

Yes, I have the same set of documents for all the queries, i.e., 20 documents for each query. Among the features I am using, only the BM25 score differs, and most of the other features change only slightly from query to query.

Dinesh-Mali commented 5 years ago

When I train my model on a particular dataset, each evaluation gives different NDCG scores. The losses are not stable and the variation is large. What is the reason?

bendersky commented 5 years ago

TF-Ranking, and neural models in general, use stochastic gradient descent, so the results are not guaranteed to be identical each time you run your model. If the dataset is large enough, the variance should be relatively small; however, for very small datasets the variance will be large. When evaluating performance, it might make sense to average across multiple runs and report the standard deviation as well.
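For instance, a minimal sketch of reporting mean and standard deviation across runs (train_and_evaluate below is a hypothetical stand-in for one full train + eval cycle, not a TF-Ranking function):

    import numpy as np

    def train_and_evaluate(seed):
        # Hypothetical stand-in for one full train + eval run; in practice
        # this would build the estimator, train it, and return an eval metric
        # such as metrics["metric/ndcg@5"]. Simulated here with noise.
        rng = np.random.RandomState(seed)
        return 0.70 + 0.02 * rng.randn()

    ndcg_runs = [train_and_evaluate(seed=s) for s in range(5)]
    print("NDCG@5: %.4f +/- %.4f" % (np.mean(ndcg_runs), np.std(ndcg_runs)))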

Closing this issue for now, as the original question was addressed.