ValueError: Test interactions matrix and train interactions matrix share 745082 interactions. This will cause incorrect evaluation, check your data split.
Hi,
When I run this code to get my test performance:
I get an error:
This is presumably because test_data_matrix contains all of my data (2.5 years of retail transactions), while the training data is everything except the last 6 months. So the check fails because train and test overlap heavily.
But why? Surely the whole point of the train_interactions argument is so you can exclude the overlap? Shouldn't this be a warning rather than an error that fails the whole function?
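
For reference, here is a minimal sketch of the kind of split the check seems to expect: test holds only the last 6 months, with no (user, item) pair that also appears in train. The record layout, the `split_by_cutoff` helper, and the toy data are all hypothetical, not taken from my real pipeline or from the library.

```python
from datetime import date


def split_by_cutoff(transactions, cutoff):
    """Split (user, item, day) records into disjoint train/test pair sets.

    Hypothetical helper: records before `cutoff` go to train, records on or
    after it go to test, and any pair already seen in train is dropped from test.
    """
    train = {(user, item) for user, item, day in transactions if day < cutoff}
    # Keep only post-cutoff pairs that never appeared in training.
    test = {(user, item) for user, item, day in transactions if day >= cutoff} - train
    return train, test


# Toy data, invented for illustration only.
transactions = [
    ("u1", "apple", date(2020, 1, 5)),
    ("u1", "bread", date(2021, 11, 2)),
    ("u2", "apple", date(2020, 3, 9)),
    ("u2", "apple", date(2021, 12, 1)),  # repeat purchase, already in train
    ("u3", "milk", date(2021, 10, 20)),
]

train_pairs, test_pairs = split_by_cutoff(transactions, date(2021, 7, 1))

# The two sets share no interactions, which is what the error is checking for.
assert not (train_pairs & test_pairs)
```

The two pair sets could then be turned into the train and test interaction matrices. Note that dropping repeat purchases from test is a design choice: for retail data it throws away a lot of real behaviour, so it may or may not be what you want to evaluate.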