Yes, it is possible. transform_fn would be an appropriate place to convert your features to latent vectors. Whether you jointly train such representations with the ranker depends on your problem.
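For readers following along, here is a rough sketch of where a transform_fn and a group_score_fn plug into the TF-Ranking Estimator pipeline. The feature name item_id, the dimensions, and the loss choice below are illustrative assumptions, not something prescribed in this thread:

    import tensorflow as tf
    import tensorflow_ranking as tfr

    def example_feature_columns():
      # Each document is identified by an integer id that is mapped to a learned
      # embedding (the latent factors).
      item_id = tf.feature_column.categorical_column_with_identity(
          key="item_id", num_buckets=10000, default_value=0)
      return {"item_id": tf.feature_column.embedding_column(item_id, dimension=10)}

    def _transform_fn(features, mode):
      # Convert raw per-example features into dense tensors once, before they are
      # sliced into groups for scoring. No context (query) features in this setting.
      return tfr.feature.encode_listwise_features(
          features=features,
          input_size=tf.shape(features["item_id"])[1],
          context_feature_columns={},
          example_feature_columns=example_feature_columns(),
          mode=mode)

    def _score_fn(context_features, group_features, mode, params, config):
      # Score each document from its dense (embedded) representation.
      cur = tf.layers.flatten(group_features["item_id"])
      cur = tf.layers.dense(cur, units=16, activation=tf.nn.relu)
      return tf.layers.dense(cur, units=1)

    model_fn = tfr.model.make_groupwise_ranking_fn(
        group_score_fn=_score_fn,
        group_size=1,
        transform_fn=_transform_fn,
        ranking_head=tfr.head.create_ranking_head(
            loss_fn=tfr.losses.make_loss_fn("pairwise_logistic_loss"),
            optimizer=tf.train.AdagradOptimizer(learning_rate=0.1)))
    ranker = tf.estimator.Estimator(model_fn=model_fn)

Even with group_size=1, the pairwise loss is still computed across documents in the same list, and the embeddings are trained jointly with the scoring layers.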
Yes, I was thinking about jointly training a document embedding. I have pairwise labels (A > B, etc.). For each labeled pair (A, B), I'll look up their embeddings (A_emb, B_emb) and use those as the document features. This would replace classical LTR query-document features (I don't have any queries in my context anyway). Not sure what you mean w/ the transform_fn, but I'll research the code a bit.
Here's my example (modified from the TF-Ranking example) of using an embedding to learn a latent factor model:

    import tensorflow as tf
    from tensorflow.feature_column import categorical_column_with_identity, embedding_column

    _BUCKETS = 10000  # number of distinct item ids
    _K = 10           # embedding (latent factor) dimension
    _HIDDEN_LAYER_DIMS = ["256", "128", "64"]  # hidden layer sizes (illustrative values; a flag in the original example)

    def make_score_fn(buckets):
      """Returns a scoring function to build `EstimatorSpec`."""

      def _score_fn(context_features, group_features, mode, params, config):
        """Defines the network to score a group of documents."""
        del params
        del config
        # Each document is represented only by its item_id, which is looked up in
        # a learned _K-dimensional embedding table (the latent factors).
        item_id = categorical_column_with_identity(
            key='item_id', num_buckets=_BUCKETS, default_value=0)
        item_emb = embedding_column(item_id, _K)
        input_layer = tf.feature_column.input_layer(group_features, item_emb)

        cur_layer = input_layer
        for layer_width in (int(d) for d in _HIDDEN_LAYER_DIMS):
          cur_layer = tf.layers.dense(cur_layer, units=layer_width, activation=tf.nn.relu)
        logits = tf.layers.dense(cur_layer, units=1)  # one score per document (regression)
        return logits

      return _score_fn
I modified my input function to only return an item_id, which is an integer that maps to an embedding. But it could be concatenated with any other arbitrary features into a fixed-length vector for the FC layers.
This gets good results on my ranking task for MRR:
baseline: 0.54, LF model: 0.76
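A sketch of the kind of input_fn change described above (this is not the poster's actual code; the names, list size, and random data are assumptions purely for illustration):

    import numpy as np
    import tensorflow as tf

    _LIST_SIZE = 10  # documents per list (assumed)

    def input_fn_sketch(batch_size=32, num_lists=256):
      # Each document is represented only by an integer item_id.
      item_ids = np.random.randint(0, 10000, size=(num_lists, _LIST_SIZE)).astype(np.int64)
      # Per-document relevance labels; TF-Ranking uses -1. for padded positions.
      labels = np.random.randint(0, 2, size=(num_lists, _LIST_SIZE)).astype(np.float32)
      dataset = tf.data.Dataset.from_tensor_slices(({"item_id": item_ids}, labels))
      return dataset.shuffle(num_lists).repeat().batch(batch_size)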
Thanks for sharing this example, Alex. This looks great. If you wish, you could define the feature columns outside, so that you can also use them to make the parsing_spec to read in tf.Examples or tf.SequenceExamples.
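As an illustration of that suggestion, defining the columns at module level and deriving the parsing spec from them could look roughly like this (feature names and sizes are assumptions):

    import tensorflow as tf

    def example_feature_columns():
      # Shared definition: used both to build the input layer and to derive the
      # spec for parsing serialized tf.Example / tf.SequenceExample protos.
      item_id = tf.feature_column.categorical_column_with_identity(
          key="item_id", num_buckets=10000, default_value=0)
      return {"item_id": tf.feature_column.embedding_column(item_id, dimension=10)}

    example_parsing_spec = tf.feature_column.make_parse_example_spec(
        example_feature_columns().values())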
When you say "define your feature columns outside", do you mean like in the notebook example, where there is an example_feature_columns function which is called from the _score_fn function?
Also, I don't understand what the transform_fn function is for. Can you provide an example?
Hello, I would just like to chime in that having an example of using feature columns where group_size and feature dimension are not 1 would be helpful. I can use a groupwise feature tensor directly with dimension [?, group_size (10), feature_dimension (180)], but when I put it in a numeric feature column with shape [group_size, feature_dimension] I get the following error on this line in _groupwise_dnn_v2:

    scores = _score_fn(
        large_batch_context_features, large_batch_group_features, reuse=False)

The error is:

    ValueError: Dimensions must be equal, but are 1800 and 180 for 'groupwise_dnn_v2/group_score/rescale/mul' (op: 'Mul') with input shapes: [?,1800], [180].

I think it has something to do with the shape passed to the feature column, but I'm unsure what the issue is here.
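Note that 1800 is exactly group_size (10) x feature_dimension (180), which suggests the list/group dimension is being folded into the column shape. One thing worth trying (a guess, not a confirmed fix) is to declare the column with the per-document shape only, since TF-Ranking adds the list and group dimensions itself:

    import tensorflow as tf

    # Shape describes a single document's feature vector; group_size should not
    # appear here.
    doc_features = tf.feature_column.numeric_column(
        key="doc_features", shape=(180,), dtype=tf.float32)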
FWIW I ran into the same issue as @darlliu
Please check out the demo on using sparse features and embeddings in TF-Ranking. You can click on the colab link to start executing the content of the notebook.
Tensorboard is integrated into the notebook, and you can use it to visualize the eval and loss curves.
Feel free to post your feedback by responding on this issue.
Thanks for posting some concrete examples in this new notebook. Some questions:

I see a new data format EIE whereas the previous examples seemed to use SequenceExample. What are some of the motivations for moving to the new format?

Also, what is the intuition behind this?: _NUM_TRAIN_STEPS = 15 * 1000

"The transform function takes in the raw dense or sparse features from the input reader, applies suitable transformations to return dense representations for each feature." I never understood what the transform function was for until reading this line. I always just made my features in the scoring function.

I noticed you moved away from tf.contrib.layers.optimize_loss. It was a nice abstraction, however it was probably replaced b/c contrib is being deprecated in TF 2.0?

And also, I think it would be nice to have the hyper params passed to the transform_fn just like the group_score_fn, so you can do something like this:
    import six
    import tensorflow as tf
    import tensorflow_ranking as tfr
    from tensorflow.feature_column import categorical_column_with_identity, embedding_column

    def example_feature_columns(params):
      # item_buckets (number of distinct item ids) is defined elsewhere.
      rest_id = categorical_column_with_identity(key='rid', num_buckets=item_buckets)
      rest_emb = embedding_column(rest_id, params.K)
      return {"rid": rest_emb}

    def make_transform_fn():
      def _transform_fn(features, mode, params):
        """Defines transform_fn."""
        example_name = next(six.iterkeys(example_feature_columns(params)))
        input_size = tf.shape(input=features[example_name])[1]
        # context_feature_columns() comes from the notebook example.
        context_features, example_features = tfr.feature.encode_listwise_features(
            features=features,
            input_size=input_size,
            context_feature_columns=context_feature_columns(),
            example_feature_columns=example_feature_columns(params),
            mode=mode,
            scope="transform_layer")
        return context_features, example_features
      return _transform_fn
See how I added params to the signature of _transform_fn, so now it can accept hyper params like the group_score_fn? The use case for this is the common one where the embedding dimensions are a hyper parameter.
Would you accept a PR from me for this?
Hi Alex, great set of questions. Please find my replies inline.
Thanks for posting some concrete examples in this new notebook. Some questions:
I see a new data format EIE whereas the previous examples seemed to use SequenceExample. What are some of the motivations for moving to the new format?

The primary motivation for EIE is that it keeps the per-query and per-document features (which we call context and example features) in self-contained tf.Examples. SequenceExample represents the data in a feature-major format, while EIE represents it in a document-major format. Does this make sense?

Also, what is the intuition behind this?: _NUM_TRAIN_STEPS = 15 * 1000

The number of training steps is kept low for the sake of demonstration. This should show the curves in Tensorboard in around 15 minutes. You can set this to any number you feel is appropriate.

"The transform function takes in the raw dense or sparse features from the input reader, applies suitable transformations to return dense representations for each feature." I never understood what the transform function was for until reading this line. I always just made my features in the scoring function.

The transform function applies the transformation for all the documents only once, while the logic in the group score function is applied over each group. Think of the transform function as "pre-processing" before sending features to the scoring function.

I noticed you moved away from tf.contrib.layers.optimize_loss. It was a nice abstraction, however it was probably replaced b/c contrib is being deprecated in TF 2.0?

Yes, that is exactly the reason. Our repository is TF 2.0 alpha compatible.
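To make the document-major layout concrete, one way to picture an EIE record is an outer tf.Example whose bytes features hold a serialized context tf.Example and a list of serialized per-document tf.Examples. A rough sketch (the exact key and feature names expected by the tfr.data parsers may differ; treat them as assumptions):

    import tensorflow as tf

    def _bytes_feature(values):
      return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))

    # Per-query (context) features in their own tf.Example.
    context = tf.train.Example(features=tf.train.Features(feature={
        "query_tokens": _bytes_feature([b"relevance", b"ranking"]),
    }))

    # One tf.Example per document in the list.
    documents = [
        tf.train.Example(features=tf.train.Features(feature={
            "item_id": tf.train.Feature(int64_list=tf.train.Int64List(value=[42])),
        })),
        tf.train.Example(features=tf.train.Features(feature={
            "item_id": tf.train.Feature(int64_list=tf.train.Int64List(value=[7])),
        })),
    ]

    # Document-major: the whole list is self-contained in one outer tf.Example.
    eie = tf.train.Example(features=tf.train.Features(feature={
        "serialized_context": _bytes_feature([context.SerializeToString()]),
        "serialized_examples": _bytes_feature(
            [doc.SerializeToString() for doc in documents]),
    }))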
I like this PR suggestion (passing params to the transform_fn). Please go ahead with it. One thing to keep in mind is that you will need to change the model_fn builder, which currently expects only a (features, mode) signature. See this line for more details.
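In the meantime, one way to get hyper parameters into the transform without changing that contract is to bind them in a closure, reusing example_feature_columns and context_feature_columns from the snippet above (a sketch, assuming an hparams object with a K attribute):

    import six
    import tensorflow as tf
    import tensorflow_ranking as tfr

    def make_transform_fn(hparams):
      """Returns a (features, mode) transform_fn with hparams bound via closure."""
      def _transform_fn(features, mode):
        # hparams (e.g. the embedding dimension hparams.K) is available here
        # without changing the (features, mode) signature the model_fn builder expects.
        example_name = next(six.iterkeys(example_feature_columns(hparams)))
        input_size = tf.shape(input=features[example_name])[1]
        return tfr.feature.encode_listwise_features(
            features=features,
            input_size=input_size,
            context_feature_columns=context_feature_columns(),
            example_feature_columns=example_feature_columns(hparams),
            mode=mode,
            scope="transform_layer")
      return _transform_fn

    # Usage: pass transform_fn=make_transform_fn(hparams) when building the model_fn.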
Would it be possible to learn a ranker from pairwise data where the features are latent factors (w/o any hand-made features)? Like a matrix factorization model?
So the input to the pairwise loss is the respective embeddings for the two documents you are ranking...