lyst / lightfm

A Python implementation of LightFM, a hybrid recommendation algorithm.
Apache License 2.0

It is unclear why evaluation metrics (auc_score, precision_at_k) fail when train and test data have high overlap #658

Open Richie-Peak opened 2 years ago

Richie-Peak commented 2 years ago

Hi,

When I run this code to get my test performance:

```python
test_auc = auc_score(
    model,
    test_interactions=test_data_matrix,
    train_interactions=train_data_matrix,
).mean()
```

I get an error:

```
ValueError: Test interactions matrix and train interactions matrix share 745082 interactions. This will cause incorrect evaluation, check your data split.
```

This is presumably because test_data_matrix contains all my data (2.5 years of retail transactions), while train_data_matrix contains everything except the last 6 months. In other words, it is failing because train and test overlap heavily.

But why? Isn't the whole point of the train_interactions argument to let you exclude that overlap? Shouldn't this be a warning rather than an error that fails the whole call?
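For what it's worth, one way to satisfy the check is to make the matrices disjoint before calling the metric: zero out every (user, item) pair in the test matrix that also appears in the training matrix, so the test set contains only the held-out interactions. A minimal sketch with scipy sparse matrices (the `exclude_train` helper and the toy matrices are my own illustration, not part of LightFM's API):

```python
import numpy as np
import scipy.sparse as sp

def exclude_train(test, train):
    """Return a copy of `test` with every (user, item) pair that also
    appears in `train` zeroed out, so the two matrices are disjoint."""
    test = sp.csr_matrix(test)
    train = sp.csr_matrix(train)
    # Element-wise product with a boolean mask keeps only the test
    # entries that have a nonzero training counterpart (the overlap).
    overlap = test.multiply(train.astype(bool))
    disjoint = sp.csr_matrix(test - overlap)
    disjoint.eliminate_zeros()
    return disjoint

# Toy example: interactions (0, 0) and (1, 1) appear in both matrices.
train = sp.csr_matrix(np.array([[1, 0], [0, 1]]))
test = sp.csr_matrix(np.array([[1, 1], [0, 1]]))
held_out = exclude_train(test, train)
print(held_out.nnz)  # only the test-only interaction (0, 1) remains
```

The resulting `held_out` matrix can then be passed as `test_interactions` without triggering the ValueError, while `train_interactions` still serves its documented role of excluding training items from the ranking.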