recommenders-team / recommenders

Best Practices on Recommendation Systems
https://recommenders-team.github.io/recommenders/intro.html
MIT License

Simplify top-k evaluation in notebooks #505

Open gramhagen opened 5 years ago

gramhagen commented 5 years ago

What is affected by this bug?

Many notebooks (surprise deep dive, ncf deep dive, vw deep dive) follow a similar pattern for top-k evaluation, which results in duplicate code and extra effort: 1) generate predictions for all user/item pairs, 2) remove pairs seen during training, 3) run all the evaluation metrics.

example:

import pandas as pd
from reco_utils.evaluation.python_evaluation import (
    map_at_k, ndcg_at_k, precision_at_k, recall_at_k)

# 1) generate predictions for every user/item combination seen in training
preds_lst = []
for user in train.userID.unique():
    for item in train.itemID.unique():
        preds_lst.append([user, item, svd.predict(user, item).est])

all_predictions = pd.DataFrame(data=preds_lst, columns=["userID", "itemID", "prediction"])

# 2) remove the pairs that appear in the training set
merged = pd.merge(train, all_predictions, on=["userID", "itemID"], how="outer")
all_predictions = merged[merged.rating.isnull()].drop('rating', axis=1)

# 3) run the ranking metrics at cutoff k
eval_map = map_at_k(test, all_predictions, col_prediction='prediction', k=k)
eval_ndcg = ndcg_at_k(test, all_predictions, col_prediction='prediction', k=k)
eval_precision = precision_at_k(test, all_predictions, col_prediction='prediction', k=k)
eval_recall = recall_at_k(test, all_predictions, col_prediction='prediction', k=k)

Expected behavior (i.e. solution)

It should be straightforward to implement utilities in the python_evaluation module to automate most or all of this, so that users can call something like:

all_data = get_user_item_pairs(train_data, remove_seen=True)
all_data['prediction'] = all_data.apply(lambda x: model.predict(x['userID'], x['itemID']), axis=1)
metrics = RankingMetrics(test, all_data)
...
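
A minimal sketch of what such a helper could look like (get_user_item_pairs is the proposed name, not an existing function, and the column names are assumptions):

import itertools
import pandas as pd

def get_user_item_pairs(train, remove_seen=True, col_user="userID", col_item="itemID"):
    """Return every user/item combination, optionally dropping pairs seen in train."""
    users = train[col_user].unique()
    items = train[col_item].unique()
    pairs = pd.DataFrame(list(itertools.product(users, items)), columns=[col_user, col_item])
    if remove_seen:
        # anti-join against the training pairs
        seen = train[[col_user, col_item]].drop_duplicates()
        pairs = pairs.merge(seen, on=[col_user, col_item], how="left", indicator=True)
        pairs = pairs[pairs["_merge"] == "left_only"].drop(columns="_merge")
    return pairs

The resulting frame could then be scored with model.predict (as in the snippet above) and passed to the ranking metrics.
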
miguelgfierro commented 5 years ago

There has been a recent discussion about this.

Apart from the functions mentioned by @gramhagen, @anargyri pointed out that we already have a function in the evaluators: https://github.com/Microsoft/Recommenders/blob/995b0789d449c6d485e76fe01e387e4148b281e4/reco_utils/evaluation/python_evaluation.py#L594.

There is an efficient implementation by @maxkazmsft: go to the "sar_experimental" branch and reset back to commit "7138bbd161ef4cafc2082938cca10fc6aafad322"; "reco_utils/recommender/sar/sar_pyspark.py" has the code at the end, and the commit log message is "SAR: implemented more efficient top-k calculation".
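
For reference, a common generic way to compute per-user top-k efficiently in PySpark (a sketch; not necessarily what that commit implements):

from pyspark.sql import Window
from pyspark.sql.functions import col, row_number

def top_k_per_user(predictions, k=10, col_user="userID", col_score="prediction"):
    # rank each user's items by predicted score and keep only the first k
    window = Window.partitionBy(col_user).orderBy(col(col_score).desc())
    return (predictions
            .withColumn("rank", row_number().over(window))
            .filter(col("rank") <= k)
            .drop("rank"))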

If we refactor this, we could also potentially use kNN for getting the top k, which would be much more efficient and would work well for large datasets (we could use the annoy library).
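
For illustration, approximate top-k retrieval with annoy might look roughly like this (the embedding matrices below are hypothetical stand-ins for whatever factors the model exposes):

import numpy as np
from annoy import AnnoyIndex

# hypothetical user/item embeddings (e.g. matrix factorization factors)
n_users, n_items, n_factors, k = 100, 1000, 32, 10
user_factors = np.random.rand(n_users, n_factors).astype("float32")
item_factors = np.random.rand(n_items, n_factors).astype("float32")

index = AnnoyIndex(n_factors, "dot")      # inner-product similarity
for item_idx in range(n_items):
    index.add_item(item_idx, item_factors[item_idx])
index.build(50)                           # number of trees: accuracy/speed trade-off

# approximate top-k items for one user; items seen in training would still need filtering
top_items = index.get_nns_by_vector(user_factors[0], k)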

anargyri commented 5 years ago

Selecting the top-k is currently done inside the ranking evaluation, so the SAR top-k computation is redundant. Either you remove it from SAR or you remove the top-k from the evaluator. However, the initial rationale was that the user may want to apply the evaluator to results from another algorithm outside this repo; so, if you remove the top-k computation from the evaluator, you will need to provide it as a util to the user. Another thing to be careful about is how to perform all_data = get_user_item_pairs(train_data, remove_seen=True). This generates the complete user-item matrix, which takes up a lot of memory with larger data sets. The approach used by Surprise avoids duplicating this matrix in memory (even temporarily), and it is good to follow this practice.
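
One way to avoid materializing the full user-item matrix, sketched here with the Surprise model and variables from the snippet above, is to score one user at a time and keep only that user's top k:

import heapq
import pandas as pd

items = train.itemID.unique()
seen = train.groupby("userID").itemID.apply(set).to_dict()

top_k_rows = []
for user in train.userID.unique():
    # score only the items this user has not interacted with
    scores = ((item, svd.predict(user, item).est)
              for item in items if item not in seen.get(user, ()))
    # keep the k best for this user; the full score matrix never exists at once
    for item, est in heapq.nlargest(k, scores, key=lambda t: t[1]):
        top_k_rows.append([user, item, est])

top_k_df = pd.DataFrame(top_k_rows, columns=["userID", "itemID", "prediction"])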

gramhagen commented 5 years ago

Is this the approach you're referring to? https://github.com/NicolasHug/Surprise/blob/master/examples/top_n_recommendations.py

anargyri commented 5 years ago

Yes, also what you quoted above:

for user in train.userID.unique():
    for item in train.itemID.unique():
        preds_lst.append([user, item, svd.predict(user, item).est])

If you created a DF with all user-item pairs first and then applied svd.predict(), you would duplicate the user-item matrix, because svd.predict() returns the users and items along with the prediction.
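
For concreteness, Surprise's predict() returns a Prediction namedtuple that already carries the user and item ids alongside the estimate (the values shown are illustrative):

pred = svd.predict(196, 302)
# e.g. Prediction(uid=196, iid=302, r_ui=None, est=4.03, details={'was_impossible': False})
# appending [user, item, pred.est] keeps a single copy of the ids, whereas building an
# all-pairs DataFrame first and then joining in the predictions would hold them twice
print(pred.uid, pred.iid, pred.est)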