tensorflow / ranking

Learning to Rank in TensorFlow
Apache License 2.0

Can tf-ranking handle the CTR problems with high dimensional sparse data? #8

Closed BloodD closed 5 years ago

ramakumar1729 commented 5 years ago

Yes. One of the advantages of neural networks is that they can handle high-dimensional sparse features. This is done by learning a dense representation (embedding) for each sparse feature.

TF-Ranking uses Feature Columns (see tf.feature_column) to represent features. See this unittest for an example of combining embedding columns and categorical columns to handle sparse data. Categorical columns take a vocabulary as input, or can alternatively use hash buckets to build an internal vocabulary.
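A minimal sketch of the pattern described above, using the standard tf.feature_column API. The feature names, bucket counts, and embedding dimensions here are hypothetical, not taken from TF-Ranking's own tests:

```python
import tensorflow as tf

# A sparse categorical feature with a very large value space: hash the raw
# strings into a fixed number of buckets instead of supplying a vocabulary.
query_tokens = tf.feature_column.categorical_column_with_hash_bucket(
    "query_tokens", hash_bucket_size=100_000)

# A sparse feature with a known, small vocabulary.
doc_category = tf.feature_column.categorical_column_with_vocabulary_list(
    "doc_category", vocabulary_list=["news", "video", "shopping"])

# Wrap each categorical column in an embedding column so the network
# learns a dense representation for the sparse ids.
feature_columns = [
    tf.feature_column.embedding_column(query_tokens, dimension=20),
    tf.feature_column.embedding_column(doc_category, dimension=4),
]
```

These columns can then be passed wherever the model expects feature columns, and the embeddings are trained jointly with the ranking loss.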

For very high dimensional sparse data, it is common to prune down the vocabulary to the top N frequently occurring feature values.
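The pruning step above can be done with a simple frequency count before the vocabulary is handed to a categorical column. A hypothetical illustration:

```python
from collections import Counter

def top_n_vocabulary(values, n):
    """Return the n most frequent feature values, most frequent first."""
    return [value for value, _ in Counter(values).most_common(n)]

# Toy corpus of sparse feature values; keep only the top-2 vocabulary.
tokens = ["a", "b", "a", "c", "b", "a", "d"]
vocab = top_n_vocabulary(tokens, 2)  # → ["a", "b"]
```

Values outside the pruned vocabulary are then mapped to an out-of-vocabulary bucket by the categorical column.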

xuanhuiwang commented 5 years ago

tf-ranking has a loss named sigmoid_cross_entropy_loss: https://github.com/tensorflow/ranking/blob/master/tensorflow_ranking/python/losses.py#L47. With this loss, tf-ranking reduces to a standard pointwise regression model: the sigmoid of the PREDICT output can be interpreted as a CTR estimate. This does not hold for the pairwise or listwise losses.