Open shivamag125 opened 4 months ago
While computing the loss at L136, shouldn't the logits and targets be rolled (shifted by one position) to account for next-token prediction?
Similar to https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L1092
Edit: I see that you already took care of this when preparing the targets, so no extra shift is needed in the loss.
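For anyone else who lands here with the same question, here is a minimal sketch of why the two approaches should give the same loss. It is plain Python, not this repo's code: `toy_logits` is a hypothetical stand-in for a causal model (the scores at position i depend only on tokens up to i), and `cross_entropy` is a hand-written helper. Option A shifts inside the loss, which is what I understand the linked Hugging Face code to do; option B shifts while preparing the data, which is what I understand this repo to do.

```python
import math

def cross_entropy(logits, targets):
    """Mean cross-entropy; logits is a list of per-position score lists."""
    total = 0.0
    for scores, target in zip(logits, targets):
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]
    return total / len(targets)

def toy_logits(inputs, vocab_size=6):
    """Hypothetical causal 'model': scores at position i use only inputs[:i + 1]."""
    out = []
    for pos in range(len(inputs)):
        prefix = sum(inputs[: pos + 1])
        out.append([((prefix + v) % 5) / 2.0 for v in range(vocab_size)])
    return out

tokens = [3, 1, 4, 1, 5]  # toy sequence, token ids below vocab_size

# Option A: feed the full sequence, then shift inside the loss.
logits_a = toy_logits(tokens)
loss_a = cross_entropy(logits_a[:-1], tokens[1:])

# Option B: shift while preparing the data, so the loss needs no shift.
inputs_b, targets_b = tokens[:-1], tokens[1:]
logits_b = toy_logits(inputs_b)
loss_b = cross_entropy(logits_b, targets_b)

print(loss_a, loss_b)
```

In both cases the prediction at position i is scored against token i + 1. The thing to avoid is doing both shifts at once, which would train the model to predict the token two positions ahead.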