lyst / lightfm

A Python implementation of LightFM, a hybrid recommendation algorithm.
Apache License 2.0

Add new user/item ids or features #667

Open PaulSteffen-betclic opened 1 year ago

PaulSteffen-betclic commented 1 year ago

Hello,

Thanks for this nice work; it is very efficient and designed for real-world use cases.

Concerning the cold-start issue, the documentation says to call the fit_partial method of the lightfm.data.Dataset class and to "resize your LightFM model to be able to use the new features". What does "resize your LightFM model to be able to use the new features" actually mean?

First, I train the model:

from lightfm import LightFM
from lightfm.data import Dataset
from lightfm.evaluation import auc_score, precision_at_k, recall_at_k, reciprocal_rank

dataset = Dataset(user_identity_features=False, item_identity_features=True)
dataset.fit(users=train_users_df.index.unique(), 
            items=train_items_df.index, 
            item_features=train_tag_labels)

train_item_features = dataset.build_item_features(train_item_features_)
train_interactions, train_weights = dataset.build_interactions(train_users_df["MatchId"].items())

recommender = LightFM(loss='warp')

recommender = recommender.fit(interactions=train_interactions,
                              item_features=train_item_features,
                              sample_weight=train_weights,
                              epochs=NUM_EPOCHS,
                              num_threads=NUM_THREADS)

Afterwards, I would like to use this model to predict for new items with new features that were not seen during this first fit. I partially fit my Dataset again without any issue ...

dataset.fit_partial(users=test_users_df.index.unique(), 
                    items=test_items_df.index, 
                    item_features=test_tag_labels)

test_item_features_ = build_item_features(test_items_df)
test_item_features = dataset.build_item_features(test_item_features_)

... but I get an error when predicting with the model

recommender.predict_rank(test_interactions=next_items,
                         train_interactions=past_items,
                         item_features=test_item_features)

The error is: "ValueError: The item feature matrix specifies more features than there are estimated feature embeddings: 3623 vs 4985", where 3623 is the number of item features seen during the fit and 4985 is the number of features after adding new items with new features.

So, is there a way to "resize the model", as suggested in the documentation?

Thanks

marcosvliras commented 1 year ago

@PaulSteffen-betclic You could edit your test_item_features manually (adding the new features) before calling the predict method.
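
One possible reading of "edit manually" is to drop the feature columns the fitted model has never seen, so that the matrix again has as many columns as the model has feature embeddings (3623 in the error above). The sketch below is only an illustration under assumptions: trim_to_known_features is a hypothetical helper, not part of LightFM, and it assumes that Dataset.fit_partial appends new features after the existing ones, so the first n_known columns still line up with the trained embeddings. New items described only by unseen features end up with all-zero rows.

```python
import numpy as np
from scipy.sparse import csr_matrix


def trim_to_known_features(features, n_known, renormalize=True):
    """Keep only the first `n_known` feature columns of a sparse matrix.

    Hypothetical helper (not a LightFM API). Assumes columns beyond
    `n_known` are features added after the model was fitted.
    """
    # Column slicing on a CSR matrix returns a new CSR matrix.
    trimmed = csr_matrix(features, dtype=np.float32)[:, :n_known]

    if renormalize and trimmed.nnz:
        # Rescale each row so its remaining weights sum to 1 again.
        row_sums = np.asarray(trimmed.sum(axis=1)).ravel()
        row_sums[row_sums == 0] = 1.0  # all-zero rows stay all-zero
        # One divisor per stored entry, repeated by the entries per row.
        per_entry = np.repeat(row_sums, np.diff(trimmed.indptr))
        trimmed.data = (trimmed.data / per_entry).astype(np.float32)

    return trimmed
```

With the names from the question, the call might look like trim_to_known_features(test_item_features, recommender.item_embeddings.shape[0]). The number of rows (items) is left unchanged, so whether the result is accepted still depends on the interaction matrices having a matching number of items.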

csr_matrix and vstack from scipy.sparse can help you handle this sparse matrix.
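
As for "resizing" the model itself, LightFM does not appear to expose a public method for it, so the following is only a rough sketch under assumptions. It assumes the fitted model stores its item-side parameters in attributes named item_embeddings, item_biases and matching *_gradients / *_momentum accumulators, and that fresh rows can be initialised the same way a new model would be (small random embeddings, zero biases, accumulators starting at 1 for the adagrad schedule). These are internals and may differ between versions; resize_item_features is a hypothetical helper.

```python
import numpy as np


def resize_item_features(model, n_item_features, seed=None):
    """Append rows for new item features to a fitted model, in place.

    Hypothetical helper: relies on attribute names that are assumed,
    not guaranteed, to match the installed LightFM version.
    """
    n_old, n_components = model.item_embeddings.shape
    n_new = n_item_features - n_old
    if n_new <= 0:
        return model  # nothing to add

    rng = np.random.RandomState(seed)
    # Assumed starting value of the optimiser accumulators.
    schedule = getattr(model, "learning_schedule", "adagrad")
    acc = 1.0 if schedule == "adagrad" else 0.0

    new_rows = {
        "item_embeddings": (rng.rand(n_new, n_components) - 0.5) / n_components,
        "item_embedding_gradients": np.full((n_new, n_components), acc),
        "item_embedding_momentum": np.zeros((n_new, n_components)),
        "item_biases": np.zeros(n_new),
        "item_bias_gradients": np.full(n_new, acc),
        "item_bias_momentum": np.zeros(n_new),
    }
    for name, rows in new_rows.items():
        old = getattr(model, name)
        # Keep the existing dtype and stack the new rows underneath.
        setattr(model, name, np.concatenate([old, rows.astype(old.dtype)], axis=0))
    return model
```

The appended rows are untrained, so the new features contribute essentially noise to predictions until the model has been trained further on interactions that involve them. If user features also grow, the user-side arrays would presumably need the same treatment.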