jemmott opened this issue 4 years ago
You might find something useful in these issues:
@jemmott did you manage to solve this problem?
I too am struggling with drastic performance degradation (in P@K) when using item features. I can't say the mentioned issues helped me much, but that is likely because I'm new to the whole recommendation-system world.
No, no real progress.
I have tasked some students with doing a comparison of LightFM against some other baselines as a class project, but I am not sure if any of them are using user or item features. If so, I will update.
I also did some user interviews, and based on the results we will be using LightFM, though not with user or item features.
I tackled the same problem. If someone finds a good answer, please share.
Has anyone actually seen an improvement on real data using item features? In other words, is the LightFM implementation possibly broken entirely?
The implementation isn't broken.
It is, however, very simple: the model simply averages the embeddings of all the features it is given. Because of the averaging, the model is incapable of figuring out which features are uninformative and ignoring them.
Consequently, if you add lots of uninformative features they will degrade your model by diluting the information provided by your good features. To prevent this, you may have to adopt more sophisticated models whose implementations are not offered by LightFM.
Note also that metadata features are likely to improve performance only on very sparse datasets, or sparse (long tail, cold-start) subsets of your data.
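To make the dilution point concrete, here is a tiny numpy sketch (not LightFM's actual internals): if an item's representation is taken to be the average of its feature embeddings, then the more random, uninformative feature embeddings you average in, the less the result resembles the one informative embedding.

```python
# Toy numpy sketch (not LightFM internals) of how averaging feature
# embeddings dilutes the signal as uninformative features are added.
import numpy as np

rng = np.random.default_rng(0)
dim = 8

informative = rng.normal(size=dim)            # embedding of one "good" feature

for n_noise in (0, 1, 4, 16):
    noise = rng.normal(size=(n_noise, dim))   # embeddings of uninformative features
    features = np.vstack([informative[None, :], noise])
    averaged = features.mean(axis=0)          # the item's combined representation
    # cosine similarity between the averaged representation and the good feature
    cos = averaged @ informative / (np.linalg.norm(averaged) * np.linalg.norm(informative))
    print(f"{n_noise:2d} noise features -> cosine with informative embedding: {cos:.2f}")
```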
Did you add the identity matrix to your features matrix? At least I missed that in the beginning and got worse performance when including features.
Yes, I tested both with and without the identity matrix. Adding the identity matrix helped, but still gave worse performance than no features at all.
Based on the feedback from maciejkula above, I don't think I am seeing the problem where I am adding a ton of uninformative features - in the goodreads example I only included the author as a feature, which I expect would be a very strong signal. So it must be the last line - that performance is only improved in very sparse data.
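For anyone unsure what "adding the identity matrix" means here, below is a rough scipy sketch of building an item-feature matrix by hand; the author assignments and sizes are made up. As far as I know, lightfm.data.Dataset adds these per-item identity features for you by default (item_identity_features=True), so this mainly matters if you construct the matrix yourself.

```python
# Sketch of hand-building an item-feature matrix that keeps per-item
# identity features alongside metadata (e.g. author) features.
# The author assignments and sizes are made up for illustration.
import numpy as np
import scipy.sparse as sp

n_items, n_authors = 5, 3

authors = np.array([0, 1, 2, 0, 1])   # made-up: author index for each item
meta = sp.csr_matrix(
    (np.ones(n_items, dtype=np.float32), (np.arange(n_items), authors)),
    shape=(n_items, n_authors),
)

# identity block: each item also gets its own indicator feature,
# so the model can still learn a per-item embedding as in pure CF
item_features = sp.hstack(
    [sp.identity(n_items, format="csr", dtype=np.float32), meta]
).tocsr()

print(item_features.shape)  # (n_items, n_items + n_authors)
# pass this as item_features= to LightFM.fit() and the evaluation functions
```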
For what it's worth, we ended up with a hybrid architecture: LightFM handles the collaborative-filtering part, a separate feature-based recommender handles the metadata, and the results are combined. We are also exploring the TensorFlow recommender library, which can also combine CF with features (and more).
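Purely as an illustration of what "the results are combined" could look like (not necessarily the setup described above), here is a sketch that blends rank-normalised scores from a CF-only LightFM model with scores from some separate content-based model; blend_scores and the 0.7 weight are made up.

```python
# Hypothetical sketch of blending CF scores (e.g. from LightFM.predict)
# with scores from a separate content-based recommender.
import numpy as np

def blend_scores(cf_scores, content_scores, alpha=0.7):
    """Combine two score vectors over the same candidate items."""
    def to_ranks(s):
        # rank-normalise so the two models' score scales are comparable
        return np.argsort(np.argsort(s)) / (len(s) - 1)
    return alpha * to_ranks(cf_scores) + (1 - alpha) * to_ranks(content_scores)
```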
I too was having performance issues when I added features to the model. The model performed better for users who had (in my case) fewer than 10 interactions in the training set, but performed poorly as the number of interactions increased.
What helped was giving a weight to each feature. The weights were obtained by training a random forest (using sklearn) on the data and taking the model's feature_importances_. Also, discretising numerical features into bins achieved better results compared with simply using the raw value as the feature weight. This approach also allows you to use the feature importance as the feature weight, as described above.
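A rough sketch of that weighting recipe, with made-up data; the column names, bin count, and target definition are illustrative, not the exact setup used above.

```python
# Sketch of the feature-weighting idea: fit a random forest on
# (features -> interacted or not) and use sklearn's feature_importances_
# as per-feature weights. Data below is randomly generated.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "author": rng.integers(0, 50, size=1000),      # categorical item feature
    "year": rng.integers(1950, 2020, size=1000),   # numerical item feature
    "interacted": rng.integers(0, 2, size=1000),   # 1 = positive user-item pair
})

# discretise the numerical feature into bins instead of using the raw value
df["year_bin"] = pd.cut(df["year"], bins=5, labels=False)

X = df[["author", "year_bin"]]
y = df["interacted"]

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
weights = dict(zip(X.columns, rf.feature_importances_))
print(weights)  # e.g. {"author": ..., "year_bin": ...} -> use as feature weights
```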
We are also exploring the TensorFlow recommender library
Hello @jemmott! Have you tested it yet? How does it perform?
Thanks!
@Furnec what was the dependent variable for your RandomForest model?
@shivamtundele The RandomForest model predicted whether or not a given user-item pair was ‘positive’, using logistic regression. From memory, I think we omitted user IDs and item IDs from the input data, as we only wanted the relative importance of the features. I suspect we also down-weighted / downsampled negative interactions. In hindsight, using SHAP values probably would have been better than using sklearn's feature_importances_.
This approach is by no means perfect, but it worked sufficiently well for us.
First, it is totally possible that I am misunderstanding something basic or have a bug in my code.
But I am consistently finding that adding item features actually reduces performance compared with collaborative filtering.
I first did the analysis on some internal data, but reproduced it with a public example to share here. Here is a notebook with an example on goodreads data: https://github.com/jemmott/lightfm-goodbooks-debug
The punch line: I looked at an implicit-feedback example, trying to predict whether a user will rate a book. I used mean reciprocal rank (MRR) as the metric, but results were similar for R@K and P@K. Performance is significantly reduced when I include authors as an item feature, compared with no item features (pure collaborative filtering). I did not explore user features.
On a hunch I decided to test something kind of strange. I decided to shuffle the item features - permuting them so that they are randomly assigned to each item. I then trained and cross validated LightFM, and found the change in MRR. I repeated that 100 times, and drew a histogram of the results, shown in blue below. The x axis is percent change from CF. The red line is the result with the actual item assignments.
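For concreteness, the shuffling experiment can be set up roughly as below. This is a sketch: `interactions`, `test`, and `item_features` stand in for the matrices built from the goodbooks data, and the hyperparameters are arbitrary.

```python
# Permutation test: reassign feature rows to random items, retrain,
# and compare MRR against the model trained on the real assignments.
import numpy as np
from lightfm import LightFM
from lightfm.evaluation import reciprocal_rank

def mrr_with_features(feat):
    model = LightFM(loss="warp", random_state=0)
    model.fit(interactions, item_features=feat, epochs=10)
    return reciprocal_rank(model, test, item_features=feat).mean()

baseline = mrr_with_features(item_features)

rng = np.random.default_rng(0)
shuffled_mrrs = []
for _ in range(100):
    perm = rng.permutation(item_features.shape[0])
    shuffled_mrrs.append(mrr_with_features(item_features[perm]))  # rows randomly reassigned

print(baseline, np.mean(shuffled_mrrs))
```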
What this tells me is that not only do the item features actually reduce performance, but the actual (non-shuffled) item features are on average no better at predicting than randomly shuffled ones. This seems really bad.
I also tried an example where I included the item ids as item features to add the identity matrix back in. Performance was still worse than pure CF (no features), but did improve slightly.
It seems like I am not alone in this; here are two other examples of people seeing worse performance when adding item features:
Anyone know what is going on here?