tensorflow / ranking

Learning to Rank in TensorFlow
Apache License 2.0

How to design a ranking model for variable-length lists in TFRecord format? #311

Open ZhouM1118 opened 2 years ago

ZhouM1118 commented 2 years ago

My data looks like this:

qid  feature_1  feature_2  feature_3  ...  feature_i  ...  feature_n  label
1    123        234        345        ...  56         ...  67         3
1    124        235        56         ...  55         ...  53         1
1    211        111        22         ...  23         ...  443        0
2    11         22         33         ...  44         ...  55         3
2    22         33         44         ...  55         ...  66         0

Each list is identified by its qid. In the example above there are two lists, 1 and 2: list 1 contains three elements and list 2 contains two. I want to rank the elements within each list, and the lists have different lengths. I use Spark to convert data in this format to TFRecord. I looked at the examples at https://github.com/tensorflow/ranking/tree/master/tensorflow_ranking/examples, but tf_ranking_libsvm.py does not use the TFRecord format. How should I design my ranking model for this data?

vitalyli commented 2 years ago

The libsvm approach doesn't scale; follow the TFRecord example instead. The ELWC (ExampleListWithContext) record format has two parts: a context with its list of features, and a list of docs, each with its own list of features. For each rank group (each qid in your data) you need to prepare a TFRecord entry that follows the ELWC format. This works as long as every doc either sets the same features or falls back to a default, and your model's input layer reads those features by key. The rest is about picking proper scaling and embedding in the input layer. The contract between which keys are in the ELWC and which keys the model expects in the input layer is up to you to decide, but both sides have to agree.
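As a rough illustration of the grouping step, here is a minimal sketch in plain Python (no TensorFlow required) that turns flat rows into one record per qid, with a context part and a variable-length list of docs keyed by feature name. The names `FEATURE_DEFAULTS`, `LABEL_KEY`, and `group_rows_by_qid` are hypothetical, and only the first three features from the sample data are used because the rest are elided in the question. The dict layout mirrors ELWC but is not the proto itself.

```python
# Hypothetical feature contract: key -> default used when a row lacks the feature.
# The model's input layer would have to declare the same keys.
FEATURE_DEFAULTS = {"feature_1": 0.0, "feature_2": 0.0, "feature_3": 0.0}
LABEL_KEY = "label"  # hypothetical key for the relevance label


def group_rows_by_qid(rows):
    """Group flat rows into one ELWC-like record per qid.

    Each record holds a `context` dict (list-level features; empty here
    because the sample data has none) and an `examples` list with one
    dict per doc. Lists may have different lengths.
    """
    records = {}
    for row in rows:
        record = records.setdefault(row["qid"], {"context": {}, "examples": []})
        # Use the default for any feature the row does not set.
        doc = {key: float(row.get(key, default))
               for key, default in FEATURE_DEFAULTS.items()}
        doc[LABEL_KEY] = int(row[LABEL_KEY])
        record["examples"].append(doc)
    return records


# Sample rows from the question, restricted to the three features shown in full.
rows = [
    {"qid": 1, "feature_1": 123, "feature_2": 234, "feature_3": 345, "label": 3},
    {"qid": 1, "feature_1": 124, "feature_2": 235, "feature_3": 56, "label": 1},
    {"qid": 1, "feature_1": 211, "feature_2": 111, "feature_3": 22, "label": 0},
    {"qid": 2, "feature_1": 11, "feature_2": 22, "feature_3": 33, "label": 3},
    {"qid": 2, "feature_1": 22, "feature_2": 33, "feature_3": 44, "label": 0},
]

records = group_rows_by_qid(rows)
# Number of docs per list: qid 1 has three, qid 2 has two.
print({qid: len(record["examples"]) for qid, record in records.items()})
```

From here, each record would presumably be serialized as one ExampleListWithContext proto (available as `input_pb2.ExampleListWithContext` in the `tensorflow_serving.apis` package), with the context and every doc written as a `tf.train.Example`. That step is left out of the sketch to keep it dependency-free.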