tensorflow / ranking

Learning to Rank in TensorFlow
Apache License 2.0
2.74k stars · 477 forks

TF-ranking - without query #137

Closed WhatAreTheInsights closed 4 years ago

WhatAreTheInsights commented 4 years ago

I am wondering if TF-Ranking could be useful for ranking teams in fantasy sports. For example, participants submit their team for the week (NFL). Each team is formed of different players, and those players have a number of statistics (features): some features are categorical, such as the team they played for or against that week, and some are continuous, such as the number of receptions last week. Could TF-Ranking be used to predict the ranking for the week according to which players each contestant has dressed? All the examples I found were related to some kind of search with queries. I could use other models to predict the number of points of each team and then rank them, but I think ranking might be a better fit, because the goal of ranking is exactly what I am looking for: ranking problems are concerned with the relative order of items (teams) rather than their absolute magnitudes (points).

Moreover, is there an example/tutorial that could be useful for that kind of problem ?

Regards,

eggie5 commented 4 years ago

I don't know about your NFL example, but you can definitely use TFR for cases w/o a query.

If you use TFR w/o any queries it is essentially a recommender system. For example, I collect a dataset of (user, item, label) triples from our e-commerce clickstream, where a user clicked or converted on an item. The labels are 0, 1, and 2 for impression, click, and conversion, respectively.
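As a concrete illustration of that label scheme, here is a minimal pure-Python sketch (the helper and event names are hypothetical, not TFR API) that turns raw clickstream rows into graded-relevance triples:

```python
# Hypothetical sketch: map raw clickstream events to the graded labels
# described above (impression=0, click=1, conversion=2).
LABELS = {"impression": 0, "click": 1, "conversion": 2}

def to_training_triples(events):
    """Turn (user, item, event) rows into (user, item, label) triples."""
    return [(user, item, LABELS[event]) for user, item, event in events]

triples = to_training_triples([
    ("u1", "i9", "impression"),
    ("u1", "i3", "click"),
    ("u2", "i9", "conversion"),
])
# triples == [("u1", "i9", 0), ("u1", "i3", 1), ("u2", "i9", 2)]
```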

If you pass this into TFR w/ a dot product over the user and item embedding you essentially recover the BPR family of rankers. If you replace the dot product w/ a NN you can recover the NCF and NCR family of deep rankers.
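To make the two scorer families concrete, here is a toy NumPy sketch (random, untrained weights; purely illustrative, not TFR code) of a BPR-style dot-product scorer versus an NCF-style MLP scorer over user/item embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 8  # embedding dimension (illustrative)

user_emb = rng.normal(size=K)        # one user's embedding
item_embs = rng.normal(size=(5, K))  # embeddings for 5 candidate items

# BPR-style scorer: dot product of user and item embeddings.
bpr_scores = item_embs @ user_emb

# NCF-style scorer: concatenate the embeddings and pass them through a
# small MLP (weights are random here; in TFR they would be learned).
W1 = rng.normal(size=(2 * K, 4))
W2 = rng.normal(size=(4, 1))

def mlp_score(u, i):
    h = np.maximum(np.concatenate([u, i]) @ W1, 0.0)  # ReLU hidden layer
    return (h @ W2).item()                            # scalar logit

ncf_scores = np.array([mlp_score(user_emb, i) for i in item_embs])

# Either score vector induces a ranking of the candidate items.
ranking = np.argsort(-bpr_scores)
```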

WhatAreTheInsights commented 4 years ago

@eggie5 I'm not sure I fully understand. Do you have a code example? What do NCF and NCR stand for? Thanks

eggie5 commented 4 years ago

NCF and NCR stand for Neural Collaborative Filtering and Neural Collaborative Ranking, respectively.

Create user and item embeddings:

    # These helpers use the tf.feature_column API (TF 1.x / Estimator era).
    from tensorflow.feature_column import (
        categorical_column_with_vocabulary_file,
        embedding_column,
    )

    def context_feature_columns(user_vocab_path, K):
        """Context (query-level) features: here, just a user-id embedding."""
        user_id = categorical_column_with_vocabulary_file(
            "uid", user_vocab_path, num_oov_buckets=5)
        user_emb = embedding_column(user_id, K)
        return {"uid": user_emb}

    def example_feature_columns(item_vocab_path, K):
        """Example (document-level) features: here, just an item-id embedding."""
        item_id = categorical_column_with_vocabulary_file(
            "iid", item_vocab_path, num_oov_buckets=5)
        item_emb = embedding_column(item_id, K)
        return {"iid": item_emb}

And the scoring function, which combines the user and item embeddings (here by concatenating them and feeding them through a small network):

    def make_score_fn(self):
        """Returns a scoring function to build `EstimatorSpec`."""

        def _score_fn(context_features, group_features, mode, params, config):
            """Defines the network that scores a document. Context features
            are query-level (here the user embedding) and group_features are
            document-level (here the item embedding)."""

            with tf.compat.v1.name_scope("input_layer"):
                context_input = [tf.compat.v1.layers.flatten(context_features["uid"])]
                group_input = [tf.compat.v1.layers.flatten(group_features["iid"])]
                input_layer = tf.concat(context_input + group_input, 1)

            is_training = mode == tf.estimator.ModeKeys.TRAIN

            cur_layer = input_layer
            for i, layer_width in enumerate(int(d) for d in params.hidden_layer_dims):
                cur_layer = tf.compat.v1.layers.dense(cur_layer, units=layer_width)
                cur_layer = tf.compat.v1.layers.batch_normalization(
                    cur_layer, training=is_training
                )
                cur_layer = tf.nn.relu(cur_layer)
                tf.compat.v1.summary.scalar(
                    "fully_connected_{}_sparsity".format(i),
                    tf.nn.zero_fraction(cur_layer),
                )
                cur_layer = tf.compat.v1.layers.dropout(
                    cur_layer, rate=params.dropout_rate, training=is_training
                )

            # One logit per document; TF-Ranking ranks documents by these logits.
            logits = tf.compat.v1.layers.dense(cur_layer, units=1)
            return logits

        return _score_fn

If your input is only users and items, i.e. no queries, this will learn to rank them à la a collaborative-filtering recommender.
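For illustration, here is a minimal pure-Python sketch (a hypothetical helper, not TFR API) of how queryless (user, item, label) triples still form per-context ranking lists, with the user playing the role of the query:

```python
from collections import defaultdict

def group_by_user(triples):
    """Group (user, item, label) triples into one ranking list per user.

    With no query, the user is the "context" and the items they interacted
    with are the "examples" of that context's list.
    """
    lists = defaultdict(list)
    for user, item, label in triples:
        lists[user].append((item, label))
    return dict(lists)

lists = group_by_user([("u1", "i9", 0), ("u1", "i3", 1), ("u2", "i9", 2)])
# lists == {"u1": [("i9", 0), ("i3", 1)], "u2": [("i9", 2)]}
```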

youcefjd commented 4 years ago

Thanks for the clarification @eggie5. Can you please define the term 'query'? I initially thought that submitting, say, a movie title and getting 'similar' ranked movies back was a task for a ranker and not a recommender, since submitting a movie is a query. Or am I getting things wrong?

bendersky commented 4 years ago

@youcefjd In TF-Ranking we treat "queries" in a generic fashion and call them "context", rather than queries. As @eggie5 helpfully pointed out, context can be many things. In the context of recommendation, it will be a user, in the context of collaborative filtering it can be an item. TF-Ranking can learn the optimal ranked list of items in response to some context: user, query, other item, etc.

To your original problem: if your goal is to learn a ranked list of players for a given week, then it is possible given enough training data (player stats as features, and their performance as the labels). If your goal is to select the optimal team, then it is a harder problem, since it may involve some constrained optimization, which is not a ranking problem.
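To see why team selection differs from ranking, consider this toy Python sketch (all names, point predictions, and the salary cap are made up): pure ranking just sorts players by predicted points, while picking a team must also respect a constraint, which a ranking model alone cannot capture.

```python
players = [  # (name, predicted_points, salary) -- illustrative numbers only
    ("A", 22.0, 9000), ("B", 18.0, 5000), ("C", 15.0, 4000), ("D", 12.0, 3000),
]

# Pure ranking: sort by predicted points, highest first.
ranked = sorted(players, key=lambda p: p[1], reverse=True)

def greedy_team(players, cap):
    """Greedily pick players by predicted points under a salary cap.

    Not optimal -- just a sketch of why selection is a constrained
    optimization problem rather than a ranking problem.
    """
    team, spent = [], 0
    for name, pts, salary in sorted(players, key=lambda p: p[1], reverse=True):
        if spent + salary <= cap:
            team.append(name)
            spent += salary
    return team

team = greedy_team(players, cap=12000)
# team == ["A", "D"]  -- "B" and "C" rank higher than "D" but bust the cap
```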

Hope this helps, Michael