Request - Option to remove items from recommendations

budbuddy commented 6 months ago

Hello,

I'm working on a recommendation system for music events. These items have a hard expiry date (it's no longer relevant to recommend an event to users after it has passed), so my training set is full of items that I don't want my algorithm to recommend. At the same time, I don't think I want to remove these items from my database entirely, as I'm guessing they're still useful for modeling user interactions and embeddings.

At the moment I haven't found any way to filter out "expired" items from recommendations. The closest thing I've seen is the filtering of "user_consumed" items. I'd be willing to try a PR that adds an attribute to the DataInfo class, which would be read by the base model class and finally used in the recommend_user functions. But I'm not a great programmer and I'm not sure whether it would be relevant to the project. Let me know if it sounds interesting; otherwise I'll just hack something together for myself.

massquantity commented 6 months ago

Hi, thanks for your suggestion. If I were to implement this, I would probably pass the expired items as an argument to the recommend_user function and filter them out.
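For the first approach, the filtering step itself is small once every item has a score. Here's a minimal NumPy sketch of it, not LibRecommender's actual code: `top_n_excluding` is a hypothetical helper, and it assumes the scores sit in a 1-D array indexed by item inner id, with the expired items given as inner ids.

```python
import numpy as np


def top_n_excluding(scores, exclude_ids, n_rec):
    """Return the inner ids of the n_rec highest-scoring items, skipping excluded ones.

    Hypothetical helper. Assumes `scores` is a 1-D array indexed by item inner id
    and `exclude_ids` holds the inner ids of items that must never be recommended.
    """
    masked = np.asarray(scores, dtype=float).copy()
    exclude = np.fromiter(exclude_ids, dtype=int)
    if exclude.size:
        masked[exclude] = -np.inf  # excluded items sink to the bottom of the ranking
    # argsort on the negated scores gives a descending order
    order = np.argsort(-masked, kind="stable")
    # keep only finite scores, which drops the excluded (-inf) items
    valid = order[np.isfinite(masked[order])]
    return valid[:n_rec]


scores = np.array([0.9, 0.1, 0.8, 0.4, 0.7])
print(top_n_excluding(scores, exclude_ids=[0, 2], n_rec=2))  # [4 3]
```

Setting the excluded scores to `-inf` lets a single argsort handle the whole ranking without rebuilding the item list for each user.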

Another way is to merge the expired items into the user_consumed items directly during recommendation, just list + list. If you are interested in this approach, I can tell you how.

budbuddy commented 6 months ago

Yes, I'm very interested in hearing about the second approach. It sounds quick and easy.

massquantity commented 6 months ago

First of all, there are two kinds of ids in this library, i.e. original ids and inner ids. I will assume the expired items you've got are represented as original ids. user_consumed is a dict with a user inner id as the key and a list of item inner ids as the value, so all we need to do is merge the two lists.

```python
from libreco.data import DatasetPure

train_data, data_info = DatasetPure.build_trainset(...)
model.fit(...)  # `model` is a LibRecommender model constructed with data_info (omitted here)

users = ["a", "b", "c"]
expired_items = ["i1", "i2", "i3"]
# convert original item ids to inner ids, skipping items never seen in training
expired_inner_ids = [data_info.item2id[i] for i in expired_items if i in data_info.item2id]
recs = []
for u in users:
    u_inner_id = data_info.user2id[u]
    original_consumed = model.user_consumed[u_inner_id].copy()  # keep original consumed items
    # no duplicate is allowed in consumed, so merge the two lists with a set union
    model.user_consumed[u_inner_id] = list(set(original_consumed) | set(expired_inner_ids))
    try:
        rec = model.recommend_user(u, n_rec=10, filter_consumed=True)
        recs.append(rec)
    finally:
        model.user_consumed[u_inner_id] = original_consumed  # recover original consumed items

print(recs)
```

budbuddy commented 6 months ago

Yep, that worked exactly as intended.

Thanks a lot for the quick reply and taking the time to write it out for me, I greatly appreciate it!

budbuddy commented 5 months ago

I just wanted to come back here and document some findings after more experimentation on this subject. The method described above does work, but it can be costly in computation time if the list of expired items is very long.

Take an extreme example: a dataset with 60k items where only 100 items are not expired. In that case it is much faster to run model.predict only on the (user, item) pairs for the non-expired items and rank them by predicted score, like so:

```python
recs = {}
non_expired_items = ["i1", "i2", "i3"]  # original ids of items that are still recommendable
for user in valid_user_ids:  # original ids of the users to recommend for
    user_list = [user] * len(non_expired_items)

    # Compute the prediction for each user/item pair
    scores = model.predict(user_list, non_expired_items)

    # Sort items by predicted score, highest first
    sorted_pairs = sorted(zip(scores, non_expired_items), key=lambda pair: pair[0], reverse=True)
    recs[user] = [item for _, item in sorted_pairs]
```
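If the model scores items with a dot product of user and item embeddings, the same idea can be pushed further: score every user against only the non-expired items in one matrix product instead of one predict call per user. This is a minimal NumPy sketch under that assumption; `rank_candidates`, `user_embeds` and `item_embeds` are hypothetical names, not LibRecommender attributes, and models that aren't dot-product based would still need model.predict.

```python
import numpy as np


def rank_candidates(user_embeds, item_embeds, candidate_ids, n_rec):
    """Rank only the candidate (non-expired) items for every user at once.

    Assumes a dot-product model: score(u, i) = user_embeds[u] @ item_embeds[i].
    user_embeds: (n_users, d) array; item_embeds: (n_items, d) array.
    candidate_ids: item inner ids that are still recommendable.
    Returns an (n_users, min(n_rec, len(candidate_ids))) array of item inner ids.
    """
    candidate_ids = np.asarray(candidate_ids)
    # Score only the candidates: (n_users, n_candidates) instead of (n_users, n_items)
    scores = user_embeds @ item_embeds[candidate_ids].T
    # Sort each row in descending order of score and keep the top n_rec columns
    order = np.argsort(-scores, axis=1, kind="stable")[:, :n_rec]
    # Map column positions back to item inner ids
    return candidate_ids[order]


user_embeds = np.array([[1.0, 0.0], [0.0, 1.0]])
item_embeds = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.1, 0.3]])
print(rank_candidates(user_embeds, item_embeds, candidate_ids=[1, 2, 3], n_rec=2))
# [[2 1]
#  [1 2]]
```

The cost scales with the number of candidates rather than the catalogue size, which is what makes the "only 100 items left" case cheap.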

With these two methods, the question of filtering out certain items from recommendations during inference is pretty much solved. The only remaining issue, in my eyes, is model evaluation: there's no way to tell the evaluate function that you want to evaluate on a filtered item list. This can be a problem if you want to measure a model's degradation over time. Let's say you have an updated test set with new items every few weeks. The evaluate function still measures predictions on old items in your dataset that are irrelevant to new users, so the metrics keep decreasing naturally, and it's hard to tell whether the model is performing worse or even improving.

massquantity commented 5 months ago

> This can be a problem if you want to measure a model's degradation over time. Let's say you have an updated test set with new items every few weeks. The evaluate function still measures predictions on old items in your dataset that are irrelevant to new users, so the metrics keep decreasing naturally, and it's hard to tell whether the model is performing worse or even improving.

This feature is too customized to build into the library, so you'll have to write the evaluation logic yourself using the predict method.
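Following that suggestion, custom evaluation can rank only the items that were recommendable during the test period, so expired items neither crowd the top-k nor count as misses. Here's a minimal pure-Python sketch of recall@k under that setup. `recall_at_k_filtered` and `score_fn` are hypothetical; `score_fn` is assumed to behave like the `model.predict(user_list, item_list)` call used earlier in the thread, and recall@k is just one possible metric.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Set


def recall_at_k_filtered(
    score_fn: Callable[[List[str], List[str]], Sequence[float]],
    test_items: Dict[str, Set[str]],
    candidate_items: Iterable[str],
    k: int = 10,
) -> float:
    """Mean recall@k where each user's ranking only contains candidate items.

    score_fn: stand-in for something like model.predict(users, items) (assumption).
    test_items: user id -> set of items the user interacted with in the test period.
    candidate_items: items that were recommendable (e.g. non-expired) in that period.
    """
    candidates = list(candidate_items)
    candidate_set = set(candidates)
    recalls = []
    for user, relevant in test_items.items():
        # only count test interactions with items the model was allowed to recommend
        relevant = relevant & candidate_set
        if not relevant:
            continue  # nothing recoverable for this user, so skip them
        scores = score_fn([user] * len(candidates), candidates)
        ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
        top_k = {item for _, item in ranked[:k]}
        recalls.append(len(top_k & relevant) / len(relevant))
    return sum(recalls) / len(recalls) if recalls else 0.0


def toy_score_fn(users, items):
    # toy scorer: a fixed score per item, identical for every user
    fixed = {"e1": 0.9, "e2": 0.5, "e3": 0.1}
    return [fixed[i] for i in items]


test = {"a": {"e2", "old_item"}, "b": {"e3"}}
print(recall_at_k_filtered(toy_score_fn, test, ["e1", "e2", "e3"], k=2))  # 0.5
```

Dropping test interactions on non-candidate items is a design choice: it keeps the metric comparable across test periods with different catalogues, at the cost of ignoring interactions the model could never have recommended anyway.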

massquantity / LibRecommender

Request - Option to remove items from recommendations #469