summmeer / session-based-news-recommendation

Source code of the paper "Positive, Negative and Neutral: Modeling Implicit Feedback in Session-based News Recommendation", which was accepted at SIGIR 2022.

Question about computing the ILD and unexp metrics #4

Closed nibyig closed 1 year ago

nibyig commented 2 years ago

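Hello Professor! We have some questions about the ILD and unexp metrics. The ILD20 and unexp20 values we compute are both 0, and the computation is very slow: the loops are very large and take a long time. We have two questions. First, there seems to be a problem with how ILD20 and unexp20 are computed. Second, your team seems to have computed these metrics very quickly in the actual experiments; is there a trick to it? According to the definition in the paper, d(a,b) in ILD and unexp is 1 when a and b do not belong to the same topic or category, and 0 otherwise. From the source code: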

def getILD(self, recList):
    score = 0
    n = len(recList)
    for i in range(0, n):
        for j in range(0, n):
            if self.reverse_item[recList[i]] in self.category_id and self.reverse_item[recList[j]] in self.category_id:
                if j!=i and self.category_id[self.reverse_item[recList[i]]]!=self.category_id[self.reverse_item[recList[j]]]:
                    score += 1
    return score/(n*(n-1))

def getUnexp(self, inSeq, recList):
    score = 0
    n = len(recList)
    if n==0:
        return 0
    for i in range(0, n):
        for ini in inSeq:
            if self.reverse_item[recList[i]] in self.category_id and self.reverse_item[ini-1] in self.category_id:
                if self.category_id[self.reverse_item[recList[i]]]!=self.category_id[self.reverse_item[ini-1]]:
                    score +=1
    return score/(n*len(inSeq))

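The code goes through an intermediate dictionary, self.reverse_item, to decide whether items i and j belong to the same category. However, self.reverse_item does not hold anything about an item's category; it is just a plain index, so the ILD20 and unexp20 we compute are both 0. We tried modifying the code. It seems the self.reverse_item layer needs to be removed, i.e. changed to: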

def getILD(self, recList):
    score = 0
    n = len(recList)
    for i in range(0, n):
        for j in range(0, n):
            if j!=i and self.category_id[recList[i]]!=self.category_id[recList[j]]:
                score += 1
    return score/(n*(n-1))

def getUnexp(self, inSeq, recList):
    score = 0
    n = len(recList)
    if n==0:
        return 0
    for i in range(0, n):
        for ini in inSeq:
            if self.category_id[recList[i]]!=self.category_id[ini-1]:
                score +=1
    return score/(n*len(inSeq))

We would like to confirm with you whether this change is correct and, if it is not, how the code should be modified.
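
For reference, the two functions quoted above can be read as the following formulas. This is only a transcription of the posted code, not the paper's own notation: R (the recommendation list) and S (the input session) are symbols introduced here for illustration, and d(a,b) is the category indicator described in the question.

```latex
\mathrm{ILD}(R) = \frac{1}{|R|\,(|R|-1)} \sum_{a \in R} \sum_{\substack{b \in R \\ b \neq a}} d(a,b),
\qquad
\mathrm{unexp}(R, S) = \frac{1}{|R|\,|S|} \sum_{a \in R} \sum_{b \in S} d(a,b),
\qquad
d(a,b) =
\begin{cases}
1 & \text{if } a \text{ and } b \text{ have different categories},\\
0 & \text{otherwise}.
\end{cases}
```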

summmeer commented 2 years ago

You're right. For the Globo dataset, category_id is very large, so this check costs a lot of time. When I run on Globo, I remove this line: if self.reverse_item[recList[i]] in self.category_id. But the line is still needed for the other datasets. As for reverse_item, it maps between the original id and the index in the embedding matrix, so it's better to keep it.
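
A possible way to cut the loop cost, sketched here under assumptions and not taken from the repository: resolve each item's category once, then count differing pairs from per-category counts instead of comparing every pair. The function names and the convention of passing None for an item without a known category are hypothetical. The normalization mirrors the quoted getILD and getUnexp, where pairs with a missing category add nothing to the score but still count in the denominator.

```python
from collections import Counter


def ild_from_categories(rec_categories):
    """ILD over a list of category labels; None marks an unknown category."""
    n = len(rec_categories)
    if n < 2:
        return 0.0  # guard: the quoted getILD would divide by zero here
    known = [c for c in rec_categories if c is not None]
    m = len(known)
    counts = Counter(known)
    # Ordered pairs (i, j), i != j, that share a category.
    same = sum(k * (k - 1) for k in counts.values())
    differing = m * (m - 1) - same
    return differing / (n * (n - 1))


def unexp_from_categories(session_categories, rec_categories):
    """Unexpectedness of recommendations relative to the input session."""
    n, h = len(rec_categories), len(session_categories)
    if n == 0 or h == 0:
        return 0.0
    session_counts = Counter(c for c in session_categories if c is not None)
    h_known = sum(session_counts.values())
    # For each recommended item, count session items with a different category.
    differing = sum(
        h_known - session_counts[c] for c in rec_categories if c is not None
    )
    return differing / (n * h)
```

With this shape the caller performs the reverse_item and category_id lookups once per item, and each metric runs in time linear in the list length rather than quadratic.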

nibyig commented 2 years ago

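OK, understood, thank you for the answer! By the way, how do we obtain the HR metric for this model? It does not seem to be in the source code.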

summmeer commented 2 years ago

In our session-based scenario, mathematically, Recall@k==HR@k.
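
A small sketch of why the two metrics coincide, assuming each evaluation step has exactly one ground-truth next item (the usual session-based setup; the function names below are illustrative and not from the repository). With a single relevant item, recall is either 0/1 or 1/1, which is exactly the hit indicator.

```python
def hit_at_k(ranked_items, target, k):
    """1.0 if the single ground-truth item appears in the top k, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the relevant items that appear in the top k."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = len(relevant.intersection(ranked_items[:k]))
    return hits / len(relevant)


ranking = [7, 3, 9, 1, 4]
# One relevant item per prediction: the two metrics agree for every k.
for k in range(1, 6):
    assert recall_at_k(ranking, [9], k) == hit_at_k(ranking, 9, k)
```

Averaging either function over all test predictions then gives the same number, so the reported Recall@k can be read as HR@k under this assumption.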