Closed wujohns closed 1 year ago
This should be fine, because the code uses Hugging Face's GPT2LMHeadModel, which shifts the labels internally.
https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2LMHeadModel
Note that the labels are shifted inside the model, i.e. you can set labels = input_ids. All labels set to -100 are ignored (masked), the loss is only computed for labels in [0, ..., config.vocab_size]
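To make the quoted behaviour concrete, here is a minimal, framework-free sketch of a loss that does the shift itself. It is not the transformers source code; the function name `shifted_lm_loss` and the plain-list inputs are assumptions made only for illustration. It shows why passing labels equal to the input ids is enough: the function drops the last logits row and the first label, so position t is scored against token t+1, and labels equal to -100 are skipped.

```python
import math

IGNORE_INDEX = -100  # label value that is skipped, as described in the quoted docs


def shifted_lm_loss(logits, labels):
    """Mean next-token cross-entropy with the shift done inside the function.

    logits: per-position score lists, shape [seq_len][vocab_size]
    labels: token ids, length seq_len (can simply be the input ids)
    """
    # Position t predicts token t+1: drop the last logits row and the first label.
    shift_logits = logits[:-1]
    shift_labels = labels[1:]

    total, count = 0.0, 0
    for scores, target in zip(shift_logits, shift_labels):
        if target == IGNORE_INDEX:
            continue  # masked position contributes nothing to the loss
        # Numerically stable negative log-softmax of the target class.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]
        count += 1
    # Average over the positions that were actually scored.
    return total / count if count else 0.0


# Hypothetical toy input: 3 tokens, vocabulary of size 3, uniform scores.
input_ids = [2, 0, 1]
logits = [[0.0, 0.0, 0.0] for _ in input_ids]
loss = shifted_lm_loss(logits, labels=input_ids)  # no manual shift by the caller
```

If the caller also shifted the labels before passing them in, the targets would be offset by two positions instead of one, which is why the shift should happen in exactly one place.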
Thanks. I later read the same documentation and found there was no problem; I just forgot to close the issue at the time.
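In train.py:
Taken together, the two points above make the training loss calculation incorrect, and the gradients are affected by the same mechanism, so they end up heavily biased.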