pykt-team / pykt-toolkit

pyKT: A Python Library to Benchmark Deep Learning based Knowledge Tracing Models
https://pykt.org
MIT License

target response issue in AKT model #44

Closed skewondr closed 2 years ago

skewondr commented 2 years ago

Hello, I want to ask your opinion on the AKT model, specifically on why that model performs best in your carefully designed framework (https://arxiv.org/abs/2206.11460).

[Image: AKT model architecture figure from the paper]

The image above is the architecture figure of the AKT model as presented in the paper.

    qa_embed_diff_data = self.qa_embed_diff(target)  # f_(ct,rt) or h_rt: (qt, rt) difference vector
    if self.separate_qa:
        qa_embed_data = qa_embed_data + pid_embed_data * \
            qa_embed_diff_data  # uq * f_(ct,rt) + e_(ct,rt)
    else:
        qa_embed_data = qa_embed_data + pid_embed_data * \
            (qa_embed_diff_data + q_embed_diff_data)  # + uq * (h_rt + d_ct)  (q-response emb diff + question emb diff)

The code above is your implementation in pykt/models/akt.py.

I think you followed the approach described by the paper's authors. My point is that I think the AKT model has the best performance because it has a chance to see the target answers through the "f(c_t, r_t) variation vector" (in the paper), which is "qa_embed_diff_data" (in your code).

As a result, in my opinion, AKT has the best performance because of this already-known-target (label leakage) issue.
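
To make the concern concrete, here is a minimal toy sketch, not pyKT's exact code; the index scheme `c_t + n_question * r_t`, the variable names, and the sizes are my assumptions. It only illustrates how an interaction embedding can carry the response r_t:

    import torch
    import torch.nn as nn

    # Toy sketch only: how an interaction embedding can encode the response r_t.
    # The index scheme (c_t + n_question * r_t) and all sizes are assumptions.
    n_question, emb_size = 100, 64
    qa_embed = nn.Embedding(2 * n_question + 1, emb_size)  # e_(c_t, r_t): interaction embedding

    q = torch.tensor([[3, 7, 12]])   # concept ids, shape (batch=1, seq=3)
    r = torch.tensor([[1, 0, 1]])    # responses (1 = correct, 0 = incorrect)

    qa = q + n_question * r          # interaction id fuses (c_t, r_t)
    y = qa_embed(qa)                 # values later fed to the knowledge encoder
    # y[:, t, :] already contains r_t, so the attention mask must keep a query
    # at step t from reading the value at step t, otherwise the target label leaks.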

To resolve the issue, I suggest modifying the `Architecture.forward` function as in the following code:

        else:  # don't peek at the current response
            pad_zero = torch.zeros(batch_size, 1, x.size(-1)).to(self.device)
            q = x
            k = torch.cat([pad_zero, x[:, :-1, :]], dim=1)
            v = torch.cat([pad_zero, y[:, :-1, :]], dim=1)
            x = block(mask=0, query=q, key=k, values=v, apply_pos=True)  # True: + FFN + residual + layernorm; non-first layers attend over q at steps 0..t-1, corresponding to the Knowledge Retriever in the figure
            # mask=0: cannot see the current r
            flag_first = True

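For reference, a tiny check of what the shift does (toy tensors, not the real model; the shapes and names are placeholders): after padding and slicing, the value seen at query position t comes from step t-1, so the current response never appears among the values.

    import torch

    # Toy check of the proposed shift: the value at query position t is y_{t-1},
    # so step t's own response is never among the values the query can attend to.
    batch_size, seqlen, d = 1, 4, 2
    x = torch.arange(seqlen, dtype=torch.float).view(1, seqlen, 1).expand(batch_size, seqlen, d)  # "question" stream
    y = x + 100                                                                                   # "response" stream
    pad_zero = torch.zeros(batch_size, 1, d)
    k = torch.cat([pad_zero, x[:, :-1, :]], dim=1)
    v = torch.cat([pad_zero, y[:, :-1, :]], dim=1)
    print(v[0, :, 0])  # tensor([  0., 100., 101., 102.]) -> position t holds y_{t-1}
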
thank you for your attention :)

sonyawong commented 2 years ago

Hi~ thank you for your interest in our work. Actually, AKT has masked the current exercise via the following code (lines 211-217 in pykt/models/akt.py):

        nopeek_mask = np.triu(
            np.ones((1, 1, seqlen, seqlen)), k=mask).astype('uint8')
        src_mask = (torch.from_numpy(nopeek_mask) == 0).to(device)
        if mask == 0:  # If 0, zero-padding is needed.
            # Calls block.masked_attn_head.forward() method
            query2 = self.masked_attn_head(
                query, key, values, mask=src_mask, zero_pad=True, pdiff=pdiff)
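
For clarity, a minimal check of the mask semantics (toy seqlen; this only reproduces the two mask lines above, not the full model): with mask == 0, np.triu(..., k=0) marks the diagonal and everything above it, so src_mask is True only strictly below the diagonal, i.e. a query at step t can attend to steps 0..t-1 but never to the current response (the first step attends to nothing, which is why zero_pad is applied).

    import numpy as np
    import torch

    # Toy check: reproduce the mask construction for a short sequence.
    seqlen, mask = 4, 0
    nopeek_mask = np.triu(
        np.ones((1, 1, seqlen, seqlen)), k=mask).astype('uint8')
    src_mask = (torch.from_numpy(nopeek_mask) == 0)
    print(src_mask[0, 0].int())
    # tensor([[0, 0, 0, 0],
    #         [1, 0, 0, 0],
    #         [1, 1, 0, 0],
    #         [1, 1, 1, 0]], dtype=torch.int32)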

In our view, the Rasch model-based embedding method and the monotonic attention are what contribute to AKT's significant performance (as the authors discussed in Section 4.2, Tables 4 & 5). If you have further inquiries about our repo, please feel free to contact us.

skewondr commented 2 years ago

Thank you for the response. It really helped me a lot :)

Tong198-Hu commented 1 year ago

It's a good question, I have the same doubt as you.