Hello! I believe the problem is the order of operations in these lines:

# get top recommendations
top_items_ids = data['train_cols'][np.argsort(-scores)]
# exclude known positives from recommendations
top_items_ids = np.array(list(set(top_items_ids) - set(known_positives_ids)))

Converting top_items_ids into a set with set(top_items_ids) completely destroys the rank ordering of the items. I propose using a list comprehension to exclude the known items in known_positives_ids from top_items_ids instead, like this:

top_items_ids = [item_id for item_id in top_items_ids if item_id not in known_positives_ids]
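For completeness, here is a sketch of how that order-preserving fix fits into a full top-N recommendation helper. The data['train'] / data['train_cols'] structure simply mirrors the variable names used in this thread and is an assumption, not LightFM API:

```python
import numpy as np

def recommend_top_n(model, data, user_id, n=10):
    # Sketch only: 'model' is a fitted LightFM model, data['train'] the training
    # interaction matrix and data['train_cols'] the item labels, as in this thread.
    n_items = len(data['train_cols'])

    # score every item for this user and sort item labels by descending score
    scores = model.predict(user_id, np.arange(n_items))
    top_items_ids = data['train_cols'][np.argsort(-scores)]

    # known positives as a plain Python set (not a pd.Series),
    # so membership is checked against the values
    known_positives_ids = set(data['train_cols'][data['train'].tocsr()[user_id].indices])

    # order-preserving filter: walk the ranked list and skip known items
    top_items_ids = [item_id for item_id in top_items_ids
                     if item_id not in known_positives_ids]
    return top_items_ids[:n]
```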
Oh, of course. Thanks for pointing that out.
Follow-up question, if anyone still follows this thread:
Why does measuring performance with the predict-method solution given above yield the same results as the predict_rank method without setting the train_interactions argument?
The way I understand it, using the train_interactions argument in the precision_at_k function should be equivalent to the "exclude known positives from recommendations" step, but apparently it is not.
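For anyone comparing, the two evaluations in question boil down to these calls, shown here as a minimal sketch (model, train and test are placeholders for a fitted LightFM model and the train/test interaction matrices):

```python
from lightfm.evaluation import precision_at_k

# rank against all items, nothing excluded
p_at_10_plain = precision_at_k(model, test, k=10).mean()

# rank while excluding each user's training interactions
p_at_10_excluded = precision_at_k(model, test, train_interactions=train, k=10).mean()
```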
Btw, I made the following change to the code and now it works as I expected:

known_positives_ids = data['train_cols'][data['train'].tocsr()[i].indices]
known_positives_ids = list(known_positives_ids)

In the original code, the list-comprehension filter was not working because known_positives_ids was a pd.Series rather than a list.
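That makes sense: the in operator applied to a pandas Series tests membership against the index labels, not the values, so the item_id not in known_positives_ids filter was not doing what it looks like. A quick illustration:

```python
import pandas as pd

s = pd.Series(['item_a', 'item_b', 'item_c'])

print('item_a' in s)         # False -- membership checks the index (0, 1, 2)
print(0 in s)                # True
print('item_a' in s.values)  # True
print('item_a' in list(s))   # True -- which is why casting to a list fixes the filter
```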
Is precision@10 : 0.004322766792029142 good enough for the model? I'm actually working on another model where the best precision is about 0.012, so what's your opinion?
Hi, I tried to replicate the precision@k score produced by the precision_at_k method using the predict method. The precision_at_k method is based on predict_rank, but since I have many items to rank for each user, the predict method is more suitable and faster. Whether one uses predict_rank or predict should not change the precision@k score, yet I was unable to replicate the score I get from precision_at_k (based on predict_rank) with the predict method. In fact, the evaluation scores from the predict method are always worse than the scores produced by the precision_at_k method included in the package. Why is that?

Why this is important: the predict method is more suitable in cases where many items need to be ranked, which is my use case as well. Also, I want to calculate NDCG for evaluation, and if I can replicate the precision@k score with predict, I know the post-processing of the predictions is set up correctly and I can just change the metric.

Below is an example using open source data. For simplicity, I'm using only a fraction of the data and a basic model without features, and known positives are not removed (the train_interactions argument is not passed to precision_at_k).
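A minimal sketch of such a setup, assuming the MovieLens data bundled with LightFM; the exact data subset and hyperparameters here are illustrative, so this sketch will not reproduce the precise scores quoted below:

```python
from lightfm import LightFM
from lightfm.datasets import fetch_movielens
from lightfm.evaluation import precision_at_k

data = fetch_movielens(min_rating=4.0)
train, test = data['train'], data['test']

# basic model, no user/item features
model = LightFM(loss='warp')
model.fit(train, epochs=10, num_threads=2)

# known positives are NOT removed: train_interactions is left unset
print('precision@10:', precision_at_k(model, test, k=10).mean())
```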
This gives precision@10 : 0.004322766792029142.

Under the hood, precision_at_k uses the predict_rank method, which generates precision@k like this:
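Roughly, paraphrasing the lightfm.evaluation implementation (predict_rank returns, for each test interaction, the 0-based rank of that item among all items for that user):

```python
import numpy as np

k = 10

# sparse matrix with the same shape as test: rank of each test item for its user
ranks = model.predict_rank(test, train_interactions=None, num_threads=2)

# a test item counts as a hit if it ranks inside the top k
ranks.data = np.less(ranks.data, k, ranks.data)

# per-user precision@k, keeping only users with at least one test interaction
precision = np.squeeze(np.array(ranks.sum(axis=1))) / k
precision = precision[test.tocsr().getnnz(axis=1) > 0]
print('precision@10:', precision.mean())
```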
Just to demonstrate that this gives precision@10 : 0.004322766792029142.
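And a sketch of the predict-based evaluation being compared against: score all items for each user, take the top k by score, and count how many fall in that user's test positives. With the post-processing issues discussed earlier in the thread fixed, this computation should match the precision_at_k result; the lower figure quoted below came from the original post-processing:

```python
import numpy as np

k = 10
test_csr = test.tocsr()
n_users, n_items = test_csr.shape

precisions = []
for user_id in range(n_users):
    test_positives = test_csr[user_id].indices
    if len(test_positives) == 0:
        continue  # skip users with no test interactions, as precision_at_k does

    # score all items for this user and take the k highest-scoring ones
    scores = model.predict(user_id, np.arange(n_items))
    top_k = np.argsort(-scores)[:k]

    precisions.append(len(np.intersect1d(top_k, test_positives)) / k)

print('precision@10:', np.mean(precisions))
```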
Which gives 0.0005763688760806917.
So, in summary, predict_rank gives a precision@k score of 0.004322766792029142, while the predict method gives a precision@k score of 0.0005763688760806917.