lyst / lightfm

A Python implementation of LightFM, a hybrid recommendation algorithm.

Item and User Normalization #697

Open wahabaftab opened 11 months ago

wahabaftab commented 11 months ago

I trained my LightFM model on user and item features along with interactions. I noticed some things that don't make sense to me, so I'm hoping someone can help me understand. I am using the following code for splitting, training, and evaluation:

import numpy as np
from lightfm.cross_validation import random_train_test_split
from lightfm.evaluation import auc_score

# Split interactions and sample weights with the same random state so they stay aligned
train, test = random_train_test_split(interactions, test_percentage=0.2,
                                      random_state=np.random.RandomState(5))
train_weights, test_weights = random_train_test_split(weights, test_percentage=0.2,
                                                      random_state=np.random.RandomState(5))

# Model training (the LightFM model itself is constructed earlier; hyperparameters not shown)
model.fit(train,
          user_features=user_features,
          item_features=item_features,
          sample_weight=train_weights,
          epochs=10)

# Evaluate the model on the test set using AUC
auc = auc_score(model,
                test,
                user_features=user_features,
                item_features=item_features).mean()

The thing I need to understand is the effect of user and item normalization on the evaluation. The features are built as follows:

user_features = dataset.build_user_features(User_df['features'].tolist(), normalize=True)

item_features = dataset.build_item_features(Product_df['features'].tolist(), normalize=True)
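
For context, my understanding of normalize=True is that it rescales each row of the feature matrix so the weights sum to 1. Here is a minimal sketch with a toy user and made-up feature names (not my real data) that shows the difference:

from lightfm.data import Dataset

# Toy example: one user with three hypothetical features (not my real data)
dataset_toy = Dataset()
dataset_toy.fit(users=["u1"], items=["i1"], user_features=["f1", "f2", "f3"])

features = [("u1", {"f1": 1.0, "f2": 1.0, "f3": 2.0})]
raw = dataset_toy.build_user_features(features, normalize=False)
norm = dataset_toy.build_user_features(features, normalize=True)

print(raw.toarray())   # row contains the per-user identity feature plus the raw weights
print(norm.toarray())  # same row rescaled so the weights sum to 1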

Things which are confusing:

I'd like to know whether normalization can have this much of an effect and whether the scenarios above make sense. Additionally, an AUC of 99% seems too good to be true.
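
For reference, the comparison I'm asking about is essentially the following sketch, where only the normalize flag differs (LightFM(loss='warp') and the other hyperparameters here are placeholders, not my exact model):

from lightfm import LightFM
from lightfm.evaluation import auc_score

# Train and evaluate twice with identical settings, toggling only normalization
for normalize in (True, False):
    uf = dataset.build_user_features(User_df['features'].tolist(), normalize=normalize)
    itf = dataset.build_item_features(Product_df['features'].tolist(), normalize=normalize)
    m = LightFM(loss='warp', random_state=5)  # placeholder hyperparameters
    m.fit(train, user_features=uf, item_features=itf,
          sample_weight=train_weights, epochs=10)
    print(normalize, auc_score(m, test, user_features=uf, item_features=itf).mean())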