newlei opened 4 years ago
Agreed. A simple example of why this is not the right thing to do: with the current version, a single hit at rank 1 (`[1,0,0]`) gets a perfect score, while one hit at rank 1 plus one hit at rank 3 (`[1,0,1]`) scores less than perfect. This is clearly wrong, since the latter should be strictly better.
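To make the example concrete, here is a minimal sketch. It assumes the current version normalizes by the DCG of `sorted(r, reverse=True)` (the re-sorted predicted list, not the user's full test set), with binary relevance and the common `method=0` discount. The function names are illustrative and may not match the repository's code exactly.

```python
import numpy as np


def dcg_at_k(r, k, method=0):
    """DCG of the first k relevance scores in r."""
    r = np.asarray(r, dtype=float)[:k]
    if r.size == 0:
        return 0.0
    if method == 0:
        # rank 1 is undiscounted, rank i >= 2 is divided by log2(i)
        return float(r[0] + np.sum(r[1:] / np.log2(np.arange(2, r.size + 1))))
    if method == 1:
        # rank i is divided by log2(i + 1)
        return float(np.sum(r / np.log2(np.arange(2, r.size + 2))))
    raise ValueError("method must be 0 or 1")


def ndcg_sorted_r(r, k, method=0):
    """Assumed current behaviour: IDCG is computed from r itself."""
    dcg_max = dcg_at_k(sorted(r, reverse=True), k, method)
    if dcg_max == 0:
        return 0.0
    return dcg_at_k(r, k, method) / dcg_max


print(ndcg_sorted_r([1, 0, 0], 3))  # 1.0
print(ndcg_sorted_r([1, 0, 1], 3))  # about 0.815
```

Because the "ideal" list is built from the hits the model happened to retrieve, a list with fewer hits can normalize to a higher score.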
The correct version would be something like `dcg_max = dcg_at_k([1 for i in range(k)], k, method)`, i.e. we compare against the best achievable score. Unless, of course, the user's test set can be smaller than `k`, in which case it's `dcg_max = dcg_at_k([int(i<len(user_test)) for i in range(k)], k, method)`.
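A minimal sketch of the proposed fix, assuming binary relevance. `n_relevant` is a hypothetical parameter name standing in for `len(user_test)`; the ideal list puts `min(k, n_relevant)` hits at the top. The DCG helper is repeated so the block runs on its own.

```python
import numpy as np


def dcg_at_k(r, k, method=0):
    """DCG of the first k relevance scores in r."""
    r = np.asarray(r, dtype=float)[:k]
    if r.size == 0:
        return 0.0
    if method == 0:
        return float(r[0] + np.sum(r[1:] / np.log2(np.arange(2, r.size + 1))))
    if method == 1:
        return float(np.sum(r / np.log2(np.arange(2, r.size + 2))))
    raise ValueError("method must be 0 or 1")


def ndcg_at_k_ideal(r, k, n_relevant, method=0):
    """NDCG@k normalized by the best achievable DCG for this user.

    n_relevant: number of relevant items in the user's test set
    (hypothetical name for len(user_test)).
    """
    # ideal ranking: every available hit at the top, zeros after
    ideal = [int(i < n_relevant) for i in range(k)]
    dcg_max = dcg_at_k(ideal, k, method)
    if dcg_max == 0:
        return 0.0
    return dcg_at_k(r, k, method) / dcg_max


# a user with 2 relevant test items, k = 3
print(ndcg_at_k_ideal([1, 0, 0], 3, n_relevant=2))  # 0.5
print(ndcg_at_k_ideal([1, 0, 1], 3, n_relevant=2))  # about 0.815
```

Here `dcg_max` depends only on `k` and the size of the user's test set, not on the predicted list `r`, so `[1,0,1]` now scores strictly higher than `[1,0,0]`. When `n_relevant >= k` the ideal list is all ones, which matches the first formula above.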
Hi, from Wikipedia and related papers, I find that NDCG = DCG/IDCG. The maximum possible DCG is called the ideal DCG (IDCG); in other words, IDCG is the DCG of the best ranking function on a dataset. So for a specific user in the test data, the best ranking is unique, and the IDCG does not change. In your code, however, the IDCG does change: the calculation of `dcg_max` (IDCG) depends on the parameter `r`, but `r` is not the best ranking and changes according to the model's predictions, and