lyst / lightfm

A Python implementation of LightFM, a hybrid recommendation algorithm.
Apache License 2.0

TypeError: unhashable type: 'list' #575

Closed: joy5075 closed this issue 3 years ago

joy5075 commented 3 years ago

Hello, I'm using wine data to build a wine recommendation system.

However, the features I use for item metadata have multiple values. For example, the value of the grapes feature is [Shiraz/Syrah, Grenache, Mourvedre], and the value of the taste feature is [citrus, lemon, lime, lemon zest, grapefruit]. When I build the item features, I get the error TypeError: unhashable type: 'list'.
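The error message itself comes from plain Python: a list cannot be used as a dictionary key. Presumably the fitted Dataset keeps a "feature name -> column index" mapping, and iterating over a pandas Series row such as `train_item_meta.iloc[i,1:]` yields the cell values, i.e. the whole lists, which are then looked up as if each were a single feature name. The sketch below reproduces that failure with a hypothetical stand-in mapping; it is not LightFM's actual internal code.

```python
# Hypothetical stand-in for a "feature name -> column index" mapping;
# LightFM's real internals may differ in detail.
feature_mapping = {"Shiraz/Syrah": 0, "Grenache": 1, "Mourvedre": 2}


def feature_index(feature):
    """Return the column index of one feature name (the name must be hashable)."""
    return feature_mapping[feature]


ok = feature_index("Grenache")  # a single string works as a key

try:
    # Passing a whole list as one "feature name" cannot work: lists are unhashable.
    feature_index(["Shiraz/Syrah", "Grenache", "Mourvedre"])
    error = None
except TypeError as exc:
    error = str(exc)

print(ok, error)  # 1 unhashable type: 'list'
```

So the fix is not in the model but in the input: each multi-valued cell has to be split into individual, hashable feature names.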

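import numpy as np
from lightfm import LightFM
from lightfm.data import Dataset

# train dataset
# Note: cells in 'taste', 'grapes' (and presumably 'food') hold Python lists,
# e.g. grapes = ['Shiraz/Syrah', 'Grenache', 'Mourvedre'].
train_item_meta = train[['wine_id', 'taste', 'grapes', 'food']].drop_duplicates(['wine_id']).reset_index(drop=True)
train_item_feature_source = [(train_item_meta.iloc[i,0], train_item_meta.iloc[i,1:]) for i in train_item_meta.index]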

train_user_meta = user_cluster_train[['userID', 'user_type']].drop_duplicates(['userID']).reset_index(drop=True)
train_user_meta = train_user_meta.fillna(0)
train_user_feature_source = [(train_user_meta.iloc[i,0], train_user_meta.iloc[i,1:].values) for i in train_user_meta.index]

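trainset = Dataset()
# train_item_token (defined elsewhere, not shown) holds the item feature names passed to fit()
trainset.fit(users=train.userID.unique(), items=train.wine_id.unique(), item_features=train_item_token, user_features=train_user_meta[train_user_meta.columns[1:]].values.flatten())
# Iterating over each Series row yields whole lists, which are then treated as single feature names
train_item_features = trainset.build_item_features(train_item_feature_source)  # TypeError: unhashable type: 'list'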
train_user_features = trainset.build_user_features(train_user_feature_source)

# test dataset
test_item_meta = test[['wine_id', 'taste', 'grapes', 'food']].drop_duplicates(['wine_id']).reset_index(drop=True)
test_item_feature_source = [(test_item_meta.iloc[i,0], test_item_meta.iloc[i,1:]) for i in test_item_meta.index]

test_user_meta = user_cluster_test[['userID', 'user_type']].drop_duplicates(['userID']).reset_index(drop=True)
test_user_meta = test_user_meta.fillna(test_user_meta.mean())
test_user_feature_source = [(test_user_meta.iloc[i,0], test_user_meta.iloc[i,1:].values) for i in test_user_meta.index]

testset = Dataset()
testset.fit(users=test.userID.unique(), items=test.wine_id.unique(), item_features=test_item_token, user_features=test_user_meta[test_user_meta.columns[1:]].values.flatten())
test_item_features = testset.build_item_features(test_item_feature_source)
test_user_features = testset.build_user_features(test_user_feature_source)

# interactions
(train_interactions, train_weights) = trainset.build_interactions(train[['userID','wine_id','like']].values)
(test_interactions, test_weights) = testset.build_interactions(test[['userID','wine_id','like']].values)
train_interactions, test_interactions = train_interactions.tocsr().tocoo(), test_interactions.tocsr().tocoo()
# train_weights = train_interactions.multiply(train_weights).tocoo()

# model
model = LightFM(loss='warp', random_state=np.random.RandomState(SEEDNO), learning_rate=LEARNING_RATE, no_components=NO_COMPONENTS)
model.fit(interactions=train_interactions, item_features=train_item_features, user_features=train_user_features, epochs=NO_EPOCHS) # sample_weight=train_weights, 

How can I use features that have multiple values (in list format)? I'd appreciate your help!

SimonCW commented 3 years ago

The build_item_features() method takes an iterable of the form (item id, [list of feature names]) or (item id, {feature name: feature weight}). It may also help to work through the example notebook to see the format in which feature values are provided: https://github.com/lyst/lightfm/blob/master/examples/stackexchange/hybrid_crossvalidated.ipynb
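For list-valued metadata like the wine data, one way to get into that form is to flatten every list into individual string tokens. The sketch below is a hypothetical helper, not part of LightFM: it assumes each item row is a dict (for example from pandas `train_item_meta.to_dict("records")`), prefixes each token with its column name, and returns both the (item id, [list of feature names]) pairs and the de-duplicated vocabulary. `to_weighted` shows the alternative (item id, {feature name: feature weight}) form, with weights split evenly within each column; that weighting scheme is an illustrative choice, not a LightFM convention.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


def to_feature_tokens(row: Dict[str, object], columns: Sequence[str]) -> List[str]:
    """Flatten one item's metadata into hashable 'column:value' feature names."""
    tokens: List[str] = []
    for col in columns:
        value = row.get(col)
        # Skip missing cells (None, or NaN as pandas often produces).
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        # A list contributes one token per element; a scalar contributes one token.
        values = value if isinstance(value, (list, tuple)) else [value]
        tokens.extend(f"{col}:{v}" for v in values)
    return tokens


def build_feature_source(
    items: Iterable[Dict[str, object]], id_col: str, columns: Sequence[str]
) -> Tuple[List[Tuple[Hashable, List[str]]], List[str]]:
    """Return (item id, [feature names]) pairs and the de-duplicated vocabulary."""
    source: List[Tuple[Hashable, List[str]]] = []
    vocab: Dict[str, None] = {}  # dict keeps first-seen order
    for item in items:
        tokens = to_feature_tokens(item, columns)
        source.append((item[id_col], tokens))
        for token in tokens:
            vocab.setdefault(token, None)
    return source, list(vocab)


def to_weighted(tokens: List[str]) -> Dict[str, float]:
    """Alternative {feature name: weight} form: weights within each column sum to 1."""
    per_column = Counter(token.split(":", 1)[0] for token in tokens)
    return {t: 1.0 / per_column[t.split(":", 1)[0]] for t in tokens}


# Toy rows shaped like the wine metadata (values are illustrative only).
wines = [
    {"wine_id": 1, "grapes": ["Shiraz/Syrah", "Grenache", "Mourvedre"], "taste": ["citrus", "lemon"]},
    {"wine_id": 2, "grapes": ["Grenache"], "taste": None},
]
source, vocab = build_feature_source(wines, "wine_id", ["grapes", "taste"])
weighted = to_weighted(source[0][1])

print(source)
print(vocab)
print(weighted)
```

Under these assumptions, `vocab` would be passed as `item_features` to `Dataset.fit(...)` and `source` (or `(item_id, to_weighted(tokens))` pairs) to `build_item_features(...)`. The column prefix keeps, say, a taste of "lemon" distinct from a food called "lemon". Note that `build_item_features` also has a `normalize` option (on by default) that rescales each row's weights, so the dict weights act as relative weights.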

I kindly suggest that you provide a minimal working code example that highlights your specific problem. People often won't take the time to look at large dumps of code.

SimonCW commented 3 years ago

I tried to refactor the docs a bit to clarify this. You can have a look and comment here: https://github.com/lyst/lightfm/pull/574

Closing this. Feel free to reopen if your issue isn't solved.