wuchao-li / RecForest

Bug about NDCG #2

Open Ethan00Si opened 1 year ago

Ethan00Si commented 1 year ago

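The open-source code for this paper, https://github.com/wuchao-li/RecForest/blob/main/notebooks/gowalla/gowalla.ipynb, computes NDCG incorrectly, which makes the metric higher than its true value. The error is shown in the figure below: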

image

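The NDCG normalization constant Z, that is, the IDCG, should be computed from the ground truth, not from the intersection of the model's predictions and the ground truth. The code in the first figure uses that intersection. This is a common mistake when computing NDCG; you can check the following link to verify my description: https://github.com/THUDM/ComiRec/issues/6

NDCG is defined as follows: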

image

Ethan00Si commented 1 year ago

Let me add an English version.

There is an error in how NDCG is calculated in the open-source code of this paper (https://github.com/wuchao-li/RecForest/blob/main/notebooks/gowalla/gowalla.ipynb). The error produces an NDCG value higher than the correctly calculated one. The buggy code is shown below:

The code that calculates the IDCG, underlined in red, should compute the ideal discounted cumulative gain, which is the maximum possible value of DCG@N. The IDCG should therefore be calculated from the ground-truth items, called setgt in your code. Instead, you use the intersection of setgt and re. You can check https://github.com/THUDM/ComiRec/issues/6 to verify this.
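To make the difference concrete, here is a minimal sketch of both computations. It is not the notebook's code: the function names, the binary relevance, the log2 discount, and the cap of the ideal list at N are assumptions made for illustration. `recommended` plays the role of re and `ground_truth` the role of setgt.

```python
import math


def dcg(hits):
    """DCG of a ranked list of 0/1 relevance flags (rank 1 comes first)."""
    return sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))


def ndcg_at_n(recommended, ground_truth, n):
    """NDCG@N with the IDCG taken from the ground truth (assumed convention)."""
    hits = [1 if item in ground_truth else 0 for item in recommended[:n]]
    # Ideal ranking: every ground-truth item sits at the top, capped at N.
    idcg = dcg([1] * min(len(ground_truth), n))
    return dcg(hits) / idcg if idcg > 0 else 0.0


def ndcg_at_n_inflated(recommended, ground_truth, n):
    """The variant described in this issue: IDCG built from the hits only."""
    hits = [1 if item in ground_truth else 0 for item in recommended[:n]]
    # Only the intersection of predictions and ground truth is counted here,
    # so the denominator shrinks whenever the model misses an item.
    idcg = dcg([1] * sum(hits))
    return dcg(hits) / idcg if idcg > 0 else 0.0


# Hypothetical example: 2 of 3 ground-truth items are retrieved.
recommended = ["a", "x", "b"]
ground_truth = {"a", "b", "c"}
correct = ndcg_at_n(recommended, ground_truth, 3)
inflated = ndcg_at_n_inflated(recommended, ground_truth, 3)
```

In this toy example `correct` is about 0.704 and `inflated` is about 0.920. The intersection-based denominator can never exceed the ground-truth one, so the inflated score is always at least as high as the correct one.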

The official definition of NDCG is given in the following figure:
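
For reference, the binary-relevance form of this definition is commonly written as below. The notation is assumed here rather than copied from the figure: $r_i$ is the item recommended at rank $i$, and $G$ is the set of ground-truth items (setgt in the code).

$$
\mathrm{DCG@}N = \sum_{i=1}^{N} \frac{\mathbb{1}[r_i \in G]}{\log_2(i+1)}, \qquad
\mathrm{IDCG@}N = \sum_{i=1}^{\min(|G|,\,N)} \frac{1}{\log_2(i+1)}, \qquad
\mathrm{NDCG@}N = \frac{\mathrm{DCG@}N}{\mathrm{IDCG@}N}
$$

The upper limit of the IDCG sum depends only on $G$ and $N$, never on which items the model actually retrieved.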