recommenders-team / recommenders

Best Practices on Recommendation Systems
https://recommenders-team.github.io/recommenders/intro.html
MIT License
18.89k stars 3.07k forks

[BUG] xDeepFM output file length less than test file length #961

Closed: ghost closed this issue 4 years ago

ghost commented 4 years ago

Description

After fitting an xDeepFM model using CIN and DNN, I call model.predict(), passing in a test file in FFM format with 12788 rows of test examples. When I examine the output file that is created, I see only 12689 rows of predictions.

I've tried different train, validate, test splits and keep seeing fewer rows of predictions than test examples.

As the model is implemented, the output file rows do not carry information such as userID or itemID. If they did, I could at least see which examples did not get a prediction back.

What might be the cause for this discrepancy? Is there any way to write out the userID, itemID associated with the prediction?
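For reference, the flow is roughly the following (a sketch; the module paths and call signatures are the repo's deeprec utilities as I understand them, and the file names are placeholders):

    from reco_utils.recommender.deeprec.deeprec_utils import prepare_hparams
    from reco_utils.recommender.deeprec.io.iterator import FFMTextIterator
    from reco_utils.recommender.deeprec.models.xDeepFM import XDeepFMModel

    # train.ffm / valid.ffm / test.ffm are FFM-format text files (placeholder names)
    hparams = prepare_hparams("xDeepFM.yaml", FEATURE_COUNT=486, FIELD_COUNT=4, batch_size=128)
    model = XDeepFMModel(hparams, FFMTextIterator)
    model.fit("train.ffm", "valid.ffm")
    # pred.txt ends up with fewer lines than test.ffm has rows
    model.predict("test.ffm", "pred.txt")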

In which platform does it happen?

Windows 10 running Anaconda, currently using the master branch of recommenders.

How do we replicate the issue?

Expected behavior (i.e. solution)

The number of rows in the output file should match the number of rows in the test file.

Ideally, the output file should contain the userID, itemID pair along with the prediction, or some other way of indexing each prediction back to the example it came from. It would be nice if this were returned from the model.predict call as a unified dataframe rather than written out to a file.
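A workaround I am considering for the indexing problem (a sketch; it assumes the output file has exactly one prediction per line in the same order as the test file, and that a dataframe test_df with the original userID/itemID columns was kept aside when the FFM file was written):

    # read the predictions back and stitch them onto the held-out test dataframe;
    # "xdeepfm_pred.txt" is a placeholder path, test_df is the pre-FFM test split
    with open("xdeepfm_pred.txt") as f:
        preds = [float(line) for line in f if line.strip()]

    assert len(preds) == len(test_df), "prediction count must match test rows"
    scored = test_df[["userID", "itemID"]].copy()
    scored["prediction"] = preds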

Other Comments

miguelgfierro commented 4 years ago

FYI @yueguoguo @Leavingseason, any ideas on this?

ghost commented 4 years ago

Training with batch_size = 128.

Total examples: 255762. Split is 0.9 / 0.05 / 0.05: train shape is (230186, 5), validate shape is (12788, 5), test shape is (12788, 5).

            rating  userID    itemID     feat1  feat2
    227840  4       1:1:2249  2:2:65178  3:3:1  4:112:1
    204339  3       1:1:1847  2:2:62676  3:4:1  4:253:1
    73644   4       1:1:1689  2:2:44864  3:3:1  4:141:1
    68849   3       1:1:2650  2:2:44697  3:5:1  4:142:1
    176296  5       1:1:2653  2:2:58389  3:3:1  4:387:1

fields 4 features 486

[('DNN_FIELD_NUM', None), ('FEATURE_COUNT', 486), ('FIELD_COUNT', 4), ('MODEL_DIR', './outputs'), ('PAIR_NUM', None), ('SUMMARIES_DIR', './outputs'), ('activation', ['relu', 'relu']), ('attention_activation', None), ('attention_dropout', 0.0), ('attention_layer_sizes', None), ('batch_size', 128), ('cross_activation', 'identity'), ('cross_l1', 0.0), ('cross_l2', 0.0001), ('cross_layer_sizes', [100, 100, 50]), ('cross_layers', None), ('data_format', 'ffm'), ('dim', 10), ('doc_size', None), ('dropout', [0.0, 0.0]), ('dtype', 32), ('embed_l1', 0.0), ('embed_l2', 0.0001), ('enable_BN', False), ('entityEmb_file', None), ('entity_dim', None), ('entity_embedding_method', None), ('entity_size', None), ('epochs', 7), ('fast_CIN_d', 0), ('filter_sizes', None), ('init_method', 'tnormal'), ('init_value', 0.01), ('is_clip_norm', 0), ('iterator_type', None), ('kg_file', None), ('kg_training_interval', 5), ('layer_l1', 0.0), ('layer_l2', 0.0), ('layer_sizes', [100, 100]), ('learning_rate', 0.001), ('load_model_name', None), ('load_saved_model', False), ('loss', 'square_loss'), ('lr_kg', 0.5), ('lr_rs', 1), ('max_grad_norm', 2), ('method', 'regression'), ('metrics', ['rmse']), ('model_type', 'xDeepFM'), ('mu', None), ('n_item', None), ('n_item_attr', None), ('n_user', None), ('n_user_attr', None), ('num_filters', None), ('optimizer', 'adam'), ('reg_kg', 0.0), ('save_epoch', 2), ('save_model', True), ('show_step', 20), ('train_ratio', None), ('transform', None), ('use_CIN_part', True), ('use_DNN_part', True), ('use_FM_part', False), ('use_Linear_part', False), ('user_clicks', None), ('user_dropout', False), ('wordEmb_file', None), ('word_size', None), ('write_tfevents', True)]

It uses the CIN and DNN parts only.

The output file has 12689 rows.

ghost commented 4 years ago

Running the same setup with batch_size = 256 gives me an output file with 12739 predictions versus the test file which has 12788 examples.

ghost commented 4 years ago

I went back to batch_size 128 and made my test file size an exact multiple of 128, so the number of examples is now 12544 (128 * 98 = 12544). Ran it again, and this time my output file has 12447 rows, a difference of 97 predictions.
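Interesting pattern across the three runs so far: the number of missing rows is always one less than the number of batches. A quick check of the arithmetic:

    import math

    # (n_test_examples, batch_size, n_prediction_rows) observed so far
    runs = [(12788, 128, 12689), (12788, 256, 12739), (12544, 128, 12447)]
    for n_examples, batch_size, n_predictions in runs:
        n_batches = math.ceil(n_examples / batch_size)
        # missing rows == n_batches - 1 in every run
        print(n_examples - n_predictions, n_batches - 1)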

ghost commented 4 years ago

Noticed that in https://github.com/microsoft/recommenders/blob/master/tests/smoke/test_deeprec_model.py, the smoke test for deeprec, there is no check that the number of predictions returned matches the number of examples presented.
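Something along these lines could be added to the smoke test (a sketch; how it would hook into the existing fixtures and file paths is an assumption on my part):

    def count_nonempty_lines(path):
        """Count non-empty lines in a text file."""
        with open(path) as f:
            return sum(1 for line in f if line.strip())

    # after model.predict(test_file, output_file) in the smoke test:
    assert count_nonempty_lines(output_file) == count_nonempty_lines(test_file)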

ghost commented 4 years ago

Capturing this link here in case it is relevant: https://stackoverflow.com/questions/48551158/keras-predict-generator-not-returning-correct-number-of-samples

ghost commented 4 years ago

[screenshot: excerpt of the output file]

@miguelgfierro, @yueguoguo and @Leavingseason - I carefully examined the output file. It looks like some lines have more than one prediction on them.
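A quick way to spot such lines (the output path here is a placeholder):

    # flag output lines that contain more than one whitespace-separated value,
    # i.e. predictions that were concatenated onto a single line
    with open("pred.txt") as f:
        for lineno, line in enumerate(f, start=1):
            if len(line.split()) > 1:
                print(lineno, line.strip())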

ghost commented 4 years ago

[screenshots: excerpts of the output file around the batch_size boundary]

The bug manifests itself exactly at my batch_size boundary, which is 128. I'm going to instrument base_model.py in BaseModel's predict() to see how many times it writes a blank line.

ghost commented 4 years ago

Is BaseModel's predict() expecting each input file to contain an exact multiple of batch_size examples?

ghost commented 4 years ago

@miguelgfierro, @yueguoguo, @Leavingseason - I think this change fixes the problem.

[screenshot: the proposed change to BaseModel's predict()]

After making this change, my output files no longer have more than one prediction per line, and the number of predictions matches the number of examples. I'd like to create a pull request for this fix.
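For reference, a minimal sketch of what the corrected loop in BaseModel's predict() looks like with that change (the only difference from the current code is that the separating newline is written inside the batch loop rather than once after it):

    load_sess = self.sess
    with tf.gfile.GFile(outfile_name, "w") as wt:
        for batch_data_input in self.iterator.load_data_from_file(infile_name):
            step_pred = self.infer(load_sess, batch_data_input)
            step_pred = np.reshape(step_pred, -1)
            wt.write("\n".join(map(str, step_pred)))
            # line break after each batch so the next batch starts on a new line
            wt.write("\n")
    return self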

yueguoguo commented 4 years ago

@atimesastudios sorry for not replying in time, but you are right. We spotted this bug before in #782 and fixed it in #833. Are you using the latest version of the repo utils?

ghost commented 4 years ago

I will check to make sure I have the latest version.

ghost commented 4 years ago

@yueguoguo - I just looked at the codebase on the recommenders master branch. It still seems to contain the bug: the newline meant to separate batches is written only once, after the loop.

    load_sess = self.sess
    with tf.gfile.GFile(outfile_name, "w") as wt:
        for batch_data_input in self.iterator.load_data_from_file(infile_name):
            step_pred = self.infer(load_sess, batch_data_input)
            step_pred = np.reshape(step_pred, -1)
            wt.write("\n".join(map(str, step_pred)))

        # line break after each batch.
        wt.write("\n")
    return self
ghost commented 4 years ago

@yueguoguo - It appears that the fix made in #833 was overwritten by the very next commit ("small fix"). Would it be possible to re-apply the fix to the latest codebase?

yueguoguo commented 4 years ago

Thanks @atimesastudios

Also thanks to @elogicsal for creating a PR to fix the issue.