Open Ethan00Si opened 1 year ago
Let me add an English version.
There is an error in the way NDCG is calculated in the open-source code of this paper (https://github.com/wuchao-li/RecForest/blob/main/notebooks/gowalla/gowalla.ipynb). The error makes the reported NDCG higher than the value obtained with the correct calculation. The buggy code is shown below:
The code that calculates the IDCG, underlined in red, should compute the ideal discounted cumulative gain, which is the maximum possible value of DCG@N. The IDCG should therefore be calculated from the ground-truth items, called `setgt` in your code. Instead, you use the intersection of `setgt` and `re`.
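To make the difference concrete, here is a minimal sketch of both calculations with binary relevance. It is not the notebook's code: the function name `ndcg_at_n`, its parameters, and the `buggy` flag are hypothetical, and the correct branch assumes the usual convention that the ideal ranking places min(|ground truth|, N) relevant items at the top.

```python
import math


def ndcg_at_n(recommended, ground_truth, n, buggy=False):
    """NDCG@N with binary relevance (illustrative sketch, hypothetical names).

    recommended:  ranked list of item ids, best first
    ground_truth: iterable of relevant item ids
    buggy=True:   normalize by the intersection-based IDCG described above
    """
    gt = set(ground_truth)
    top_n = recommended[:n]

    # DCG: a hit at 0-based rank i contributes 1 / log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2) for i, item in enumerate(top_n) if item in gt)

    if buggy:
        # Only counts the relevant items the model actually retrieved.
        ideal_hits = len(gt & set(top_n))
    else:
        # Assumed convention: the ideal list holds min(|gt|, N) relevant items.
        ideal_hits = min(len(gt), n)

    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy data: one of three relevant items is retrieved, at rank 1.
recommended = [10, 20, 30, 40]
ground_truth = {10, 98, 99}
buggy_score = ndcg_at_n(recommended, ground_truth, n=4, buggy=True)
correct_score = ndcg_at_n(recommended, ground_truth, n=4)
```

On this toy input the intersection-based version returns a perfect score even though two of the three relevant items are missing, while the ground-truth-based version penalizes the misses.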
You can check https://github.com/THUDM/ComiRec/issues/6 to verify this.
The official definition of NDCG is given in the following figure:
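For reference, a commonly used binary-relevance form of the definition can be written as follows. This is a sketch under stated assumptions, not a transcription of the figure: the symbols $G$ (the set of ground-truth items), $\mathrm{rel}_i$ (1 if the item at rank $i$ is in $G$, else 0), and the cutoff $\min(|G|, N)$ are assumed notation.

```latex
\mathrm{DCG@}N = \sum_{i=1}^{N} \frac{\mathrm{rel}_i}{\log_2(i+1)},
\qquad
\mathrm{IDCG@}N = \sum_{i=1}^{\min(|G|,\,N)} \frac{1}{\log_2(i+1)},
\qquad
\mathrm{NDCG@}N = \frac{\mathrm{DCG@}N}{\mathrm{IDCG@}N}
```

The key point is that the upper limit of the IDCG sum depends only on the ground truth and N, never on how many relevant items the model retrieved.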
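The open-source code of this paper (https://github.com/wuchao-li/RecForest/blob/main/notebooks/gowalla/gowalla.ipynb) calculates NDCG incorrectly, which makes the metric higher than its actual value. The error is shown in the figure below:
When computing the NDCG normalization constant Z, that is, the IDCG, the ground truth should be used rather than the intersection of the model's predictions and the ground truth. The code in the first figure uses the intersection of the model's predictions and the ground truth. This is a common mistake when calculating NDCG; the following link can be used to verify my description: https://github.com/THUDM/ComiRec/issues/6
The definition of NDCG is as follows: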