pykt-team / pykt-toolkit

pyKT: A Python Library to Benchmark Deep Learning based Knowledge Tracing Models
https://pykt.org
MIT License

AKT Information Leakage #182

Closed smallz2001 closed 2 months ago

smallz2001 commented 3 months ago

Hello, and thank you for your contributions to KT research. While investigating why the AKT model achieves such high performance, I found what looks like an information leakage issue. When I used only the encoder part of the transformer within AKT, performance was inflated. Specifically, I made the following change:

The original code

        flag_first = True
        for block in self.blocks_2:
            if flag_first:  # peek current question
                # apply_pos=False: no FFN; the first layer is self-attention only,
                # corresponding to x̂_t
                x = block(mask=1, query=x, key=x,
                          values=x, apply_pos=False, pdiff=pid_embed_data)
                flag_first = False
            else:  # don't peek current response
                # apply_pos=True: adds FFN + residual + layer norm; the non-first layers
                # attend over the questions at steps 0..t-1, corresponding to the
                # Knowledge Retriever in the figure
                x = block(mask=0, query=x, key=x, values=y, apply_pos=True, pdiff=pid_embed_data)
                # mask=0: the current response is not visible. In the Knowledge Retriever
                # the value is all zeros, so the first question carries only question
                # information and no question-answer (qa) information
                # print(x[0,0,:])
                flag_first = True
        return x

was replaced with:

return y

With this change the model still shows inflated performance. This suggests that the masking is not working, and that the AKT model has access to future information.
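A perturbation test is one way to check a claim like this: change the inputs the model should not be able to see at step t, and confirm that the output at step t is unchanged. The sketch below is a toy, pure-Python attention over scalar features, not pyKT's implementation. The names `allowed`, `masked_attention` and `leaks_at` are hypothetical. It assumes the mask convention described in the code comments above: `mask=1` may look at the current position, `mask=0` only at earlier ones.

```python
import math

def allowed(i, j, mask):
    """True if position i may attend to position j.
    Assumed convention: mask=1 allows j <= i, mask=0 allows only j < i."""
    return j < i + mask

def masked_attention(queries, keys, values, mask):
    """Toy single-head attention over scalar features with a causal mask.
    A position with nothing visible outputs 0.0."""
    out = []
    for i, q in enumerate(queries):
        visible = [j for j in range(len(keys)) if allowed(i, j, mask)]
        if not visible:
            out.append(0.0)
            continue
        scores = [q * keys[j] for j in visible]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(sum(w * values[j] for w, j in zip(weights, visible)) / total)
    return out

def leaks_at(model, x, y, t, delta=5.0):
    """Perturb responses at steps >= t and questions at steps > t,
    then report whether the output at step t changed."""
    base = model(x, y)[t]
    x2 = [v + delta if k > t else v for k, v in enumerate(x)]
    y2 = [v + delta if k >= t else v for k, v in enumerate(y)]
    return abs(model(x2, y2)[t] - base) > 1e-9

# x: question features, y: question-response features (toy values)
x = [0.2, -0.4, 0.9, 0.1]
y = [1.0, 0.0, 1.0, 0.0]

def strict(x, y):   # values from strictly earlier steps only
    return masked_attention(x, x, y, mask=0)

def peeking(x, y):  # values may include the current step
    return masked_attention(x, x, y, mask=1)

print([leaks_at(strict, x, y, t) for t in range(len(x))])
print([leaks_at(peeking, x, y, t) for t in range(len(x))])
```

In this sketch, any path that reads response values with `mask=1` fails the test at every step, because the output at step t then depends on the response at step t. The same check could be run against a real model by perturbing its input tensors, assuming dropout is disabled.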

smallz2001 commented 3 months ago

Hi @sonyawong, could you please help me?

LeavesLi1015 commented 2 months ago

I'm confused by the same problem. I found that the performance of DKT, DKVMN, AKT, SAKT, and SAINT on other datasets is generally 5-10% higher than in the original papers. I suspect it may be information leakage.

LeavesLi1015 commented 2 months ago

Oh, I found the answer here: https://github.com/pykt-team/pykt-toolkit/issues/144. It says that when using a KC-level model, the metrics whose names start with "window{metrics}late" should be taken as the valid ones, because of information leakage.
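To illustrate why KC-level evaluation can leak, here is a toy sketch. It rests on an assumption the comment above does not spell out: that a question tagged with several KCs is expanded into one row per KC, each carrying the same response. The names `expand_to_kc_level`, `leaked_rows` and `late_mean` are hypothetical and are not pyKT's API; see the linked issue for how the toolkit actually computes its metrics.

```python
from collections import defaultdict

def expand_to_kc_level(interactions):
    """Expand (question_id, [kc ids], response) into one row per KC.
    Each row is (interaction_index, question_id, kc_id, response)."""
    rows = []
    for idx, (qid, kcs, resp) in enumerate(interactions):
        for kc in kcs:
            rows.append((idx, qid, kc, resp))
    return rows

def leaked_rows(rows):
    """Indices of rows whose label already appeared in the previous row,
    because both rows come from the same original interaction."""
    return [i for i in range(1, len(rows)) if rows[i][0] == rows[i - 1][0]]

def late_mean(rows, kc_preds):
    """One simple late fusion (an assumed variant): average the KC-level
    predictions that belong to the same original interaction."""
    grouped = defaultdict(list)
    for row, pred in zip(rows, kc_preds):
        grouped[row[0]].append(pred)
    return {idx: sum(ps) / len(ps) for idx, ps in grouped.items()}

# Toy data: q1 is tagged with two KCs, q2 with one.
interactions = [("q1", ["a", "b"], 1), ("q2", ["c"], 0)]
rows = expand_to_kc_level(interactions)
print(leaked_rows(rows))                   # rows that could copy the previous label
print(late_mean(rows, [0.5, 0.75, 0.25]))  # one prediction per question
```

Under this assumption, a model scored row by row can predict the second KC of `q1` by copying the response it just saw for the first KC. Fusing predictions back to one score per question removes that shortcut from the metric.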