nishal999 opened this issue 6 years ago
Your model was trained with user and item features, so the evaluation functions need the same arguments. Try:

```python
train_precision = precision_at_k(model, train, k=10, user_features=uf, item_features=itemf).mean()
```
Ahh yes. Now I get it. It's solved now. Thank you so much!!!
I am trying to validate my model on train and test data. For example, I want to include only 11 months of interactions as my training set and test on user ids for the 12th month. The problem is that I cannot have different dimensionalities for train and test, as mentioned by @maciejkula here. I have lost direction as to how I should proceed.
There is no problem doing so. You build one dataset to which you fit all your users, user features, items, and item features by calling `fit` and `fit_partial`. Similarly, you call `build_user_features` and `build_item_features` to build the feature matrices for all users and items. Next you call `build_interactions` twice: once with the interactions of the first group (months 1-11) to get the train interaction matrix, and a second time with the interactions of the second group (month 12) to get the test matrix.
Again, @DoronGi's answer is exactly correct.
Thank you @maciejkula and @DoronGi
I have this problem, but without using features, e.g.

```python
model.item_biases = 0
model.fit(X_train, num_threads=6)

train_precision = precision_at_k(model, X_train, k=10).mean()
test_precision = precision_at_k(model, X_test, k=10).mean()
```
I get the same error at the `test_precision` call about having an incorrect number of user features. The dimensions of my train and test matrices are as follows:

```
<4290x40744 sparse matrix of type '<class 'numpy.float32'>'
    with 73414 stored elements in Compressed Sparse Row format>
<1430x40744 sparse matrix of type '<class 'numpy.float32'>'
    with 26586 stored elements in Compressed Sparse Row format>
```
@ctivanovich I think you should keep the matrix sizes the same and just fill in zero rows for the users you want to exclude; that way it's much less confusing. This library (or maybe it's an industry standard) assumes the row number equals the user id.
The way I do that: I convert the interactions to a LIL matrix, set the interactions I want to exclude to 0, then convert back to COO and train/test.
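A minimal sketch of that masking approach using only scipy (random data stands in for real interactions; the split fractions are arbitrary):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

# Stand-in interaction matrix: 100 users x 50 items, binary.
interactions = sp.random(100, 50, density=0.1, format="csr", random_state=0)
interactions.data[:] = 1.0

# Users (rows) to hold out for testing.
test_users = rng.choice(100, size=20, replace=False)
train_mask = np.ones(100, dtype=bool)
train_mask[test_users] = False

# LIL supports efficient row assignment; setting rows to 0 drops the entries.
train = interactions.tolil(copy=True)
train[test_users, :] = 0                    # remove held-out users from train
train = train.tocoo()

test = interactions.tolil(copy=True)
test[np.flatnonzero(train_mask), :] = 0     # keep only held-out users in test
test = test.tocoo()

# Both matrices keep the full (n_users, n_items) shape.
assert train.shape == test.shape == interactions.shape
```

Because both matrices share the original shape, the evaluation functions accept either one without a dimension mismatch.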
I have a question: why do we need k to compute precision, while auc_score does not require it? Thank you
@tracthuc This isn't really the place for a question like that; you should ask on e.g. StackExchange. But in a nutshell, AUC simply doesn't need a cutoff: it measures the quality of the whole ranking, whereas precision@k only looks at the top k recommendations. I highly recommend this video series: https://www.youtube.com/watch?v=4jRBRDbJemM.
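To make the distinction concrete, here is a toy, self-contained sketch (not LightFM's implementation) of both metrics computed on a single ranked list of recommendations, where 1 marks a relevant item:

```python
def toy_precision_at_k(ranked_labels, k):
    # Fraction of the top-k recommendations that are relevant: needs a k.
    return sum(ranked_labels[:k]) / k

def toy_auc(ranked_labels):
    # Probability that a random positive is ranked above a random negative:
    # uses the whole ranking, so no k is involved.
    pos = [i for i, label in enumerate(ranked_labels) if label == 1]
    neg = [i for i, label in enumerate(ranked_labels) if label == 0]
    wins = sum(1 for p in pos for n in neg if p < n)
    return wins / (len(pos) * len(neg))

# Items sorted by predicted score, best first.
labels = [1, 0, 1, 0, 0]
```

Here `toy_precision_at_k(labels, 2)` gives 0.5 while `toy_precision_at_k(labels, 3)` gives 2/3, so the answer depends on k; `toy_auc(labels)` is a single number (5/6) for the whole ranking.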
I have a sparse matrix (train/test data) of shape (1407580, 235061), which means there are around 330bn combinations of user_id and item_id. This causes precision_at_k and the other metrics to take far too long to compute. I am thinking of calculating precision at k only for a small sample of the data by writing the code myself. Will this be good enough for model validation?
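Evaluating on a random sample of users is a common way to get a cheap estimate of precision@k. Below is one possible sketch, not LightFM code: it takes any scoring function (a stand-in for `model.predict`) and a test matrix, samples users that have test positives, and averages precision@k over the sample:

```python
import numpy as np
import scipy.sparse as sp

def sampled_precision_at_k(predict_fn, test, k=10, n_samples=1000, seed=0):
    """Estimate precision@k on a random subset of test users.

    predict_fn(user_id, item_ids) -> scores; stand-in for model.predict.
    Assumes k < number of items.
    """
    rng = np.random.default_rng(seed)
    test = test.tocsr()
    # Only users with at least one test positive are informative.
    users_with_positives = np.flatnonzero(np.diff(test.indptr))
    sample = rng.choice(
        users_with_positives,
        size=min(n_samples, len(users_with_positives)),
        replace=False,
    )
    item_ids = np.arange(test.shape[1])
    precisions = []
    for u in sample:
        scores = predict_fn(u, item_ids)
        # Indices of the k highest-scoring items (unordered within the top k).
        top_k = np.argpartition(-scores, k)[:k]
        positives = set(test.indices[test.indptr[u]:test.indptr[u + 1]])
        precisions.append(len(positives.intersection(top_k)) / k)
    return float(np.mean(precisions))
```

With a fitted LightFM model you would pass something like `predict_fn=lambda u, ids: model.predict(u, ids, user_features=uf, item_features=itemf)`. The sample mean is an unbiased estimate of the full-population precision@k, and a few thousand users is usually enough for a stable comparison between models.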
> There is no problem doing so. You build one dataset to which you fit all your users, user features, items, and item features by calling `fit` and `fit_partial`. Similarly, you call `build_user_features` and `build_item_features` to build the feature matrices for all users and items. Next you call `build_interactions` twice: once with the interactions of the first group (months 1-11) to get the train interaction matrix, and a second time with the interactions of the second group (month 12) to get the test matrix.
Wouldn't it be better to split the metadata from the beginning as well, as we normally do for other ML problems?
I am building a recommendation model for user-article dataset where each interaction is represented by 1.
```python
model = LightFM(loss='warp', item_alpha=ITEM_ALPHA, user_alpha=USER_ALPHA,
                no_components=NUM_COMPONENTS, learning_rate=LEARNING_RATE,
                learning_schedule=LEARNING_SCHEDULE)
model = model.fit(train, item_features=itemf, user_features=uf,
                  epochs=NUM_EPOCHS, num_threads=NUM_THREADS)

print("train shape: ", train.shape)
print("test shape: ", test.shape)
```

```
train shape:  (25900, 790)
test shape:  (25900, 790)
```
My prediction call looks like this:

```python
predictions = model.predict(user_id, pid_array,
                            user_features=uf, item_features=itemf,
                            num_threads=4)
```

where pid_array contains the indices of the items.

```python
train_precision = precision_at_k(model, train, k=10).mean()
```
I am trying to compute the precision, and subsequently want the AUC score as well, but I get this error:
```
Traceback (most recent call last):
  File "new_light_fm.py", line 366, in
    train_precision = precision_at_k(model, train, k=10).mean()
  File "/home/nt/anaconda3/lib/python3.6/site-packages/lightfm/evaluation.py", line 69, in precision_at_k
    check_intersections=check_intersections,
  File "/home/nt/anaconda3/lib/python3.6/site-packages/lightfm/lightfm.py", line 807, in predict_rank
    raise ValueError('Incorrect number of features in item_features')
ValueError: Incorrect number of features in item_features
```