tensorflow / ranking

Learning to Rank in TensorFlow
Apache License 2.0

Make transformed features available outside model_fn #113

Closed MiladShahidi closed 5 years ago

MiladShahidi commented 5 years ago

I have a pipeline similar to this example. More specifically, I use make_groupwise_ranking_fn to create my model_fn and I have a simple transform_fn which calls encode_listwise_features to turn the categorical input features into dense embeddings.

However, I need to implement a custom prediction routine and customize the export_keys. So as suggested in #6, I wrap the model function that I get from make_groupwise_ranking_fn in my own custom model function, which adds the extra prediction computations I need.

The problem is that this wrapper model function receives the sparse features in its features argument, but I need the dense ones. So, I tried calling the transform_fn again to turn these into dense embeddings. But this resulted in TF saying:

Variable encoding_layer/user_embedding/embedding_weights already exists. Disallowed.

And it recommends setting reuse properly for the scope. My understanding is that since the transform_fn has already been called (by the original model function), calling it a second time tries to create a TF variable that already exists. (But correct me if I'm wrong.) So my current workaround is to modify the call to encode_listwise_features inside my transform_fn:

```python
with tf.variable_scope('encoding', reuse=tf.AUTO_REUSE):
  context_features, example_features = tfr.feature.encode_listwise_features(...)
```

I'm wondering if there is a way to make the dense features returned by the call to transform_fn (which currently happens inside model_fn) available outside the walls of TF-Ranking, so I won't have to call the transform_fn again to get my dense features.

HongleiZhuang commented 5 years ago

Hi Milad,

You can actually call the transform_fn yourself to obtain the dense features before you feed them into model_fn, and pass only an identity transform function as the transform_fn argument. See below for an example:

```python
def redefined_model_fn(features, labels, mode, params, config):
  # Apply the transform once, outside the library's model_fn, so the
  # dense features are available here as well.
  transform_fn = make_transform_fn()
  context_features, example_features = transform_fn(features, mode)
  transformed_features = dict(example_features)
  transformed_features.update(context_features)
  # The inner model_fn now receives already-dense features, so an identity
  # transform (which only splits context vs. example features) suffices.
  original_model_fn = tfr.model.make_groupwise_ranking_fn(
      group_score_fn=make_score_fn(),
      transform_fn=tfr.feature.make_identity_transform_fn(
          context_features.keys()),
      group_size=_GROUP_SIZE,
      ranking_head=ranking_head)
  estimator_spec = original_model_fn(transformed_features, labels, mode, params, config)
  return estimator_spec
```

And I believe you can use context_features and example_features for your purpose.

Let me know if this works for you.

xuanhuiwang commented 5 years ago

@MiladShahidi, what are the predictions that you want to add to EstimatorSpec that rely on the transformed features?

MiladShahidi commented 5 years ago

@HongleiZhuang Thank you. That worked. I think at one point I tried passing a `lambda x: x` (or maybe `None`, I can't remember) as the transform function, and it obviously didn't work. Now I can see that the identity transform splits the features into context and example features, which my lambda wasn't doing.
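For readers hitting the same wall: a plain-Python sketch of what an identity transform effectively has to provide (this is illustrative, not the library's source). A transform_fn must return a `(context_features, example_features)` tuple of dicts, which is why a bare `lambda x: x` cannot work; the feature-name list passed in decides which entries go to the context side.

```python
def make_identity_transform_fn(context_feature_names):
    """Sketch: split a features dict into (context, example) dicts, unchanged."""
    names = set(context_feature_names)

    def _transform_fn(features, mode):
        # No encoding happens here; features are passed through as-is.
        context_features = {k: v for k, v in features.items() if k in names}
        example_features = {k: v for k, v in features.items() if k not in names}
        return context_features, example_features

    return _transform_fn

fn = make_identity_transform_fn(["query_length"])
ctx, ex = fn({"query_length": 3, "doc_embedding": [0.1, 0.2]}, mode=None)
print(sorted(ctx), sorted(ex))  # → ['query_length'] ['doc_embedding']
```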

MiladShahidi commented 5 years ago

@xuanhuiwang I have an embedding column that wraps a sparse feature. In predict mode, I want to work directly with the embedding vector that corresponds to the input sparse feature. For example, when I opened this issue, I was trying to return (in addition to the logits produced by the model) the k items closest to a given item. So, I needed to calculate a distance/similarity between their embedding vectors.
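The "k closest items" step described above can be sketched in NumPy; the function name, the cosine-similarity choice, and the toy matrix are all illustrative assumptions, not TF-Ranking API. Once the dense embedding vectors are in hand (via the transform_fn), the computation itself is a normalized dot product plus a top-k selection:

```python
import numpy as np

def top_k_closest(embeddings, query, k):
    """Return indices of the k rows of `embeddings` most cosine-similar
    to the `query` embedding, closest first."""
    # Normalize rows so the dot product equals cosine similarity.
    emb_norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q_norm = query / np.linalg.norm(query)
    sims = emb_norm @ q_norm
    # argsort is ascending; take the last k and reverse for descending order.
    return np.argsort(sims)[-k:][::-1]

items = np.array([[1.0, 0.0],
                  [0.9, 0.1],
                  [0.0, 1.0]])
query = np.array([1.0, 0.0])
print(top_k_closest(items, query, 2))  # → [0 1]
```

Inside a model_fn the same idea would use `tf.linalg` ops on the transformed feature tensors, with the result added to the predictions dict of the EstimatorSpec.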

xuanhuiwang commented 5 years ago

@MiladShahidi, sorry for my late reply. Glad to know that @HongleiZhuang's suggestion works. Let us know if you have additional needs and we can figure out how to solve this inside the library.

Closing now.