Open ZhouM1118 opened 2 years ago
The libsvm approach doesn't scale; follow the TFRecord example instead. The ELWC (ExampleListWithContext) TFRecord format has two parts: a context with a list of features, and a list of documents, each with its own list of features. For each rank group you need to prepare a TFRecord entry that follows the ELWC format. As long as every document either sets the same features or falls back to defaults, and your model's input layer reads them by key, the rest is about picking proper scaling and embedding in the input layer. The contract between which keys are in the ELWC and what the model expects in its input layer is up to you to decide, but both sides have to agree.
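For one rank group, the ELWC layout described above can be sketched roughly as follows. This is a minimal sketch, not the official example: it assumes the `tensorflow` and `tensorflow-serving-api` packages are installed, and the key names (`feature_1`, ..., `label`) and the helper names `feature_keys`, `build_elwc` and `write_groups` are made up for illustration.

```python
# Sketch only. Key names and helper names are illustrative assumptions.

def feature_keys(num_features):
    """Keys written into every document Example.

    The model's input layer has to look features up under the same keys.
    """
    return [f"feature_{i}" for i in range(1, num_features + 1)]


def build_elwc(docs):
    """Build one ExampleListWithContext for a single rank group.

    `docs` is a list of (features, label) pairs, where `features` is a list
    of numbers and `label` is an integer relevance grade.
    """
    # Imported lazily so the pure-Python helper above works without TensorFlow.
    import tensorflow as tf  # noqa: F401  (registers the Example protos)
    from tensorflow_serving.apis import input_pb2

    elwc = input_pb2.ExampleListWithContext()
    # No per-query features are assumed here, so the context stays empty.
    for features, label in docs:
        example = elwc.examples.add()  # one tf.train.Example per document
        for key, value in zip(feature_keys(len(features)), features):
            example.features.feature[key].float_list.value.append(float(value))
        example.features.feature["label"].int64_list.value.append(int(label))
    return elwc


def write_groups(groups, path):
    """Write one serialized ELWC record per rank group to a TFRecord file."""
    import tensorflow as tf

    with tf.io.TFRecordWriter(path) as writer:
        for docs in groups:
            writer.write(build_elwc(docs).SerializeToString())
```

Each record in the output file is then one complete list, so lists of different lengths are fine at this stage; padding or truncation to a fixed list size would be handled when the records are parsed for the model.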
My data looks like this:

    qid  feature_1  feature_2  feature_3  ...  feature_i  ...  feature_n  label
    1    123        234        345        ...  56         ...  67         3
    1    124        235        56         ...  55         ...  53         1
    1    211        111        22         ...  23         ...  443        0
    2    11         22         33         ...  44         ...  55         3
    2    22         33         44         ...  55         ...  66         0
qid is the unique identifier of each list. For example, the data above contains two lists, 1 and 2: list 1 has three elements and list 2 has two. I want to rank the elements within each list, and the lists have different lengths. I use Spark to convert data in the above format into TFRecord format. I looked at these examples: https://github.com/tensorflow/ranking/tree/master/tensorflow_ranking/examples, but tf_ranking_libsvm.py does not use the TFRecord format. For the above data, how should I design my ranking model?
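The grouping by qid described here can be sketched in plain Python. This is only an illustration under stated assumptions: the helper name `group_by_qid` is hypothetical, the rows keep just the first three feature columns shown in the sample (the elided columns are left out), and in Spark the same step would presumably be a group-by on qid.

```python
from collections import defaultdict


def group_by_qid(rows):
    """Group (qid, features, label) rows into one list of documents per qid."""
    groups = defaultdict(list)
    for qid, features, label in rows:
        groups[qid].append((features, label))
    return dict(groups)


# Sample rows, truncated to the first three feature columns for illustration.
rows = [
    (1, [123, 234, 345], 3),
    (1, [124, 235, 56], 1),
    (1, [211, 111, 22], 0),
    (2, [11, 22, 33], 3),
    (2, [22, 33, 44], 0),
]

groups = group_by_qid(rows)
for qid, docs in groups.items():
    print(qid, len(docs))
```

Each value in `groups` is one rank group, which is the unit that would be serialized as a single ELWC record.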