Why is the attention_mask parameter not set during training?

yangjianxin1 / GPT2-chitchat

GPT2 for Chinese chitchat (implements the MMI idea from DialoGPT)


Closed: Choitsugun closed this issue 1 year ago

Choitsugun commented 2 years ago

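The training code calls model.forward(input_ids=input_ids) without setting the attention_mask parameter. Why? If it is not set, the PAD positions added during data preprocessing also receive attention scores. Is that acceptable? Please explain.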
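For reference, here is a minimal sketch of how a mask could be derived from an already-padded batch. It is not the repository's code: `PAD_ID`, `build_mask_and_labels`, and the token ids are hypothetical names and values chosen for illustration, and the sketch assumes the pad id never occurs as a real token.

```python
PAD_ID = 0  # hypothetical pad token id; use the tokenizer's real pad id


def build_mask_and_labels(batch, pad_id=PAD_ID):
    """Return (attention_mask, labels) for a batch of already-padded id lists."""
    # 1 marks a real token, 0 marks padding that attention should skip.
    attention_mask = [[0 if tok == pad_id else 1 for tok in seq] for seq in batch]
    # -100 is the target index that PyTorch's CrossEntropyLoss ignores by default,
    # so pad positions do not contribute to the loss.
    labels = [[-100 if tok == pad_id else tok for tok in seq] for seq in batch]
    return attention_mask, labels


# Two illustrative sequences; the second is right-padded to the same length.
batch = [[101, 7, 8, 9, 102], [101, 7, 102, PAD_ID, PAD_ID]]
attention_mask, labels = build_mask_and_labels(batch)
print(attention_mask)
print(labels)
```

If the model is a Hugging Face `GPT2LMHeadModel`, these lists would be converted to tensors and passed as `model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)`.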

Luciferder commented 1 year ago

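This is probably not a good idea. I tested it: if a model trained this way is used for batched generation, the number of pad tokens affects the generated output, which makes generation unstable.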
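The effect can be illustrated with a toy, pure-Python attention step. This is only a sketch, not GPT-2 itself: the scores and values are made-up numbers, and it assumes left padding (common for batched generation), where real tokens can see pad positions. Without a mask, the output for the same real tokens shifts as the pad count changes; with a mask it stays the same.

```python
import math


def attend(scores, values, mask=None):
    """Softmax-weighted average of values; positions with mask 0 are excluded."""
    if mask is None:
        mask = [1] * len(scores)
    # Masked positions get -inf so exp() gives them exactly zero weight.
    masked = [s if m == 1 else float("-inf") for s, m in zip(scores, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return sum(e / total * v for e, v in zip(exps, values))


# Made-up attention scores and values for two real tokens and for a pad token.
REAL_SCORES, REAL_VALUES = [2.0, 1.0], [1.0, 3.0]
PAD_SCORE, PAD_VALUE = 0.5, 10.0


def output(n_pad, use_mask):
    """Attention output for the real tokens preceded by n_pad pad positions."""
    scores = [PAD_SCORE] * n_pad + REAL_SCORES
    values = [PAD_VALUE] * n_pad + REAL_VALUES
    mask = [0] * n_pad + [1] * len(REAL_SCORES) if use_mask else None
    return attend(scores, values, mask)


unmasked = [output(n, use_mask=False) for n in (0, 1, 3)]
masked = [output(n, use_mask=True) for n in (0, 1, 3)]
print("without mask:", unmasked)  # changes with the number of pads
print("with mask:   ", masked)    # independent of the number of pads
```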