tensorflow / ranking

Learning to Rank in TensorFlow
Apache License 2.0

how to predict? #69

Closed prakhar2811 closed 5 years ago

prakhar2811 commented 5 years ago

Urgent! I'm unable to obtain predictions using ranker.predict(test). Printing the predictions just gives <generator object EstimatorV2.predict at 0x7ff0c8de5c50>.

xuanhuiwang commented 5 years ago

Can you be more specific about your configuration, such as the feature columns and the input function?

Also, issue https://github.com/tensorflow/ranking/issues/65 may be helpful.

prakhar2811 commented 5 years ago

The dataset has a query id, query-hotel features (e.g. search-term-to-hotel distance), and hotel-related features (e.g. price, rating, average_purchase_window). The libsvm file consists of entries sorted by query id and, within each query, sorted by hotel rank, i.e. the order in which the hotels were shown to the user. I get the best results with list_mle_loss. I'm unable to obtain predictions using ranker.predict(test_data_link) because it returns a generator. How do I save or print the results?
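For context on the generator issue: Estimator.predict is lazy. It returns a Python generator that yields one prediction per input instance, so nothing is computed until you iterate it. A minimal stand-in (plain Python, no TensorFlow, with made-up scores) showing the two usual ways to consume such a generator:

```python
def predict():
    """Stand-in for Estimator.predict: a lazy generator of per-query scores."""
    for scores in ([0.9, 0.1, 0.4], [0.2, 0.8, 0.5]):
        yield scores

# Option 1: pull predictions one query at a time.
gen = predict()
first = next(gen)          # scores for the first query

# Option 2: materialize all predictions at once.
all_preds = list(predict())
```

With the real estimator, the same pattern applies: iterate the object returned by ranker.predict(...) rather than printing it directly.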

xuanhuiwang commented 5 years ago

Maybe this test case can show how to call predict: https://github.com/tensorflow/ranking/blob/master/tensorflow_ranking/python/model_test.py#L276.

prakhar2811 commented 5 years ago

Do we need to give the test input as libsvm format to predict?

prakhar2811 commented 5 years ago

@xuanhuiwang sir, I'm still unable to obtain predictions on the test libsvm data. Can you please help me create the input_fn_predict() function?

xuanhuiwang commented 5 years ago

> Do we need to give the test input as libsvm format to predict?

I don't think so. It works with the Tensors directly and should be the same format as the features for your model_fn.

Sorry that I don't have a handy example. Did you look at issue #65? You can export your model to accept tf.Example as input.

prakhar2811 commented 5 years ago

@xuanhuiwang how should I define my own serving input function? My training and evaluation work fine, and I can see NDCG@k improving as well. Now I need predictions on unseen data in order to deploy the model. Can you help me with that?

xuanhuiwang commented 5 years ago

@prakhar2811, we have recently updated our examples. It has the function that exports a model with tf.Example as input here. You can also find the definition of the receiver function.

You can deploy your model to TensorFlow Serving and feed it a tf.Example with the same feature_spec that you used when training the model.

prakhar2811 commented 5 years ago

@xuanhuiwang thanks a lot. Should I input the test data as libsvm, as we did in training? Also, can we predict instead of just evaluating? I.e., can you make the corresponding changes in tf_ranking_libsvm.ipynb for prediction?

xuanhuiwang commented 5 years ago

In the current tf_ranking_libsvm.py, the exported model accepts tf.train.Example as the input format, i.e., the information in the original libsvm rows is wrapped into tf.train.Example protos whose feature names match the ones specified in feature_columns.

We will consider adding an example of using ranker.predict(...) in a future release.

prakhar2811 commented 5 years ago
def input_fn_pred(path):
  # Build a dataset of (features, labels) from the libsvm file.
  dataset = tf.data.Dataset.from_generator(
      tfr.data.libsvm_generator(path, _NUM_FEATURES, _LIST_SIZE),
      output_types=(
          {str(k): tf.float32 for k in range(1, _NUM_FEATURES + 1)},
          tf.float32
      ),
      output_shapes=(
          {str(k): tf.TensorShape([_LIST_SIZE, 1])
           for k in range(1, _NUM_FEATURES + 1)},
          tf.TensorShape([_LIST_SIZE])
      )
  )
  dataset = dataset.batch(_BATCH_SIZE)
  return dataset.make_one_shot_iterator().get_next()

I'm using print(next(ranker.predict(input_fn=lambda: input_fn_pred(_TEST_DATA_PATH)))) to print my predictions, and I can now obtain predictions on my test data. Each time I run the print(next(...)) statement I get the predictions for the next query. But once all queries are exhausted, print(next(...)) goes back to the first query and gives the same predictions as before, except that they are shuffled among themselves. Why are the predictions getting shuffled within each query?

carrot321 commented 5 years ago

Hi, I'm not certain about the protocol here, so correct me if this is not the right place. I found this thread while searching for help on predicting. I've been trying to find a way to relate the prediction output to the data I input. However, the examples for feeding libsvm data into these functions use the libsvm_generator. The problem with using it for prediction (and relating predictions back to input documents) is that the libsvm_generator randomly shuffles the input documents, so the order of the predictions is not directly relatable to the order of the input documents. What is the correct way to generate the input data? This also applies to the previous comment, which uses the libsvm_generator for the test data.
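The effect described above can be illustrated with a simplified stand-in for what the generator does internally (a hypothetical sketch, not the library's code): each query's document list is shuffled in place, so per-slot scores can no longer be matched back to input positions.

```python
import random

def make_lists(doc_lists, shuffle=True):
    """Yield each query's documents, optionally shuffling as the libsvm generator does."""
    for docs in doc_lists:
        doc_list = list(docs)
        if shuffle:
            random.shuffle(doc_list)  # the step that breaks the input order
        yield doc_list

random.seed(0)
query = ["doc_a", "doc_b", "doc_c", "doc_d"]
shuffled = next(make_lists([query]))                 # same docs, unknown order
ordered = next(make_lists([query], shuffle=False))   # order preserved
```

The shuffled list still contains exactly the input documents, but their positions no longer line up with the prediction slots.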

prakhar2811 commented 5 years ago

@carrot321 thanks a lot for pointing this out. I've now created a new function, libsvm_generator_test, for converting the test data to a generator without reshuffling, and I can now obtain predictions successfully.

carrot321 commented 5 years ago

@prakhar2811 I've been struggling with how to do this correctly (also since my use case has variable list sizes). Could you share your libsvm_generator_test?

prakhar2811 commented 5 years ago

@carrot321 my use case also has variable list sizes; that won't be an issue, because the model itself pads or trims each list to the list_size that we define. I defined 2 new functions, libsvm_generator_test and _libsvm_generate_test, in my data.py file. _libsvm_generate_test is the same as _libsvm_generate, except that I removed https://github.com/tensorflow/ranking/blob/c962b61dcf842aeb0144cb4dee289f0df162005b/tensorflow_ranking/python/data.py#L973 (np.random.shuffle(doc_list)) from it. And libsvm_generator_test is the same as libsvm_generator, except that it calls _libsvm_generate_test in place of _libsvm_generate. You can then use this function to convert your test libsvm file to a generator.

def input_fn_pred(path):
  # Same as the training input_fn, but using the non-shuffling generator.
  dataset = tf.data.Dataset.from_generator(
      tfr.data.libsvm_generator_test(path, _NUM_FEATURES, _LIST_SIZE),
      output_types=(
          {str(k): tf.float32 for k in range(1, _NUM_FEATURES + 1)},
          tf.float32
      ),
      output_shapes=(
          {str(k): tf.TensorShape([_LIST_SIZE, 1])
           for k in range(1, _NUM_FEATURES + 1)},
          tf.TensorShape([_LIST_SIZE])
      )
  )
  dataset = dataset.batch(_BATCH_SIZE)
  return dataset.make_one_shot_iterator().get_next()

Predictions are then obtained by:

r=ranker.predict(input_fn=lambda: input_fn_pred(_TEST_DATA_PATH))
print(next(r))

As you keep running print(next(r)), you get predictions for each subsequent query. Hope that helps. Also, let me know if this solves your problem so that I can cross-check. Thanks!
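Once shuffling is disabled, relating scores back to documents is a matter of zipping each query's scores with its document ids and sorting. A sketch with made-up ids and scores (if the list was padded to _LIST_SIZE, only the first len(doc_ids) scores are meaningful):

```python
doc_ids = ["hotel_12", "hotel_7", "hotel_3"]   # input order, hypothetical ids
scores = [0.4, 1.9, -0.2]                      # one score per slot from ranker.predict

# Pair each document with its score and rank by descending score.
ranked = sorted(zip(doc_ids, scores), key=lambda p: p[1], reverse=True)
```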

carrot321 commented 5 years ago

@prakhar2811 Hi, I've been thinking along these lines as well. However, by removing the shuffling, the document-list trimming (in the lines right after the shuffle, I believe) is no longer random: it simply cuts off the last 3 documents you feed to the generator if you happen to have 12 when the list size is 9. Because of this, the order of the input documents starts to matter, which we do not want. So I'm a bit confused about this. Looking at comments in the repository, I found the following: "list_size: (int) The number of examples to keep per ranking instance. If specified, truncation or padding may happen. Otherwise, the output Tensors have a dynamic list size." This makes me think it should be possible not to specify a list size (which the model_test.py script further confirms to me). The original paper on groupwise scoring functions using DNNs also seems to match this. So I'm wondering whether fixing this in the generator is as simple as removing the trimming and padding from the generator?

xuanhuiwang commented 5 years ago

@prakhar2811, thanks for providing the example that makes ranker.predict work. It looks like the right way to call predict.

@carrot321, the list_size can vary from batch to batch for tf-ranking models. This is the latest enhancement of the library.

We have also enhanced our library with the ExampleInExample data format that relies on tf.train.Example. See some examples here. We will provide code examples soon.

Sorry that we haven't updated the libsvm generator. You can certainly try removing the trimming and padding. However, please make sure that you only have a single query when calling predict.
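For anyone keeping a fixed list_size, the trimming and padding the generator performs can be sketched as follows (a hypothetical helper, not the library's code); note that without shuffling, trimming always drops the trailing documents, which is exactly the non-randomness discussed above:

```python
def pad_or_trim(docs, list_size, pad_value=0.0):
    """Force a query's document list to exactly list_size entries."""
    if len(docs) >= list_size:
        return docs[:list_size]  # trim: trailing docs are silently dropped
    # Pad with dummy entries so every list has the same length.
    return docs + [pad_value] * (list_size - len(docs))

trimmed = pad_or_trim([1.0, 2.0, 3.0, 4.0], 3)
padded = pad_or_trim([1.0, 2.0], 4)
```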