Closed · Tai-Mai closed this issue 5 months ago
@Tai-Mai, shouldn't the code be `model_output.logits.squeeze()[-1]`? The forward method predicts the next token from all the preceding ones. To replicate the `.generate()` method's output, you'd likely need to loop, feeding the original tokens plus each newly predicted token back in. Do let me know if I'm wrong. Also, for generic model training, Hugging Face provides a `Trainer` class.
@bhuvanmdev Thanks for the reply! Yes, you're right. I was confused because I got multiple tokens (`nobody, I are you? I`) and thought that meant the forward function was already implemented in a way that would take care of the auto-regressive generation for me, but of course that's not the case. Here's the forum post that helped me understand it better as well. I'll close the issue. Thanks again.
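For anyone landing here later: a tiny sketch of why `model_output.logits.squeeze()[-1]` gives the next-token distribution, using a toy tensor in place of real model output (shapes assumed for GPT-2):

```python
import torch

# Toy stand-in for model_output.logits: shape (batch, seq_len, vocab_size)
logits = torch.randn(1, 6, 50257)     # e.g. a 6-token prompt, GPT-2's vocab size

# Earlier positions are predictions for tokens already in the prompt;
# only the last position is the distribution over the *next* token.
last = logits.squeeze()[-1]           # shape: (50257,)
next_token_id = last.argmax().item()  # greedy choice for the next token
```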
System Info
Who can help?
@ArthurZucker @younesbelkada @gante
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Please run the following script:
I typically get output similar to this:

However, `.generate()` also often outputs (grammatical) gibberish or just slips into German for no reason. I've seen the following words many times: Hinweis, Unterscheidung, nobody. If I enable padding, the output from the `.generate()` function has nothing to do with my prompt "Hello, how are you?".

Expected behavior
I expected the forward function to give me the same output as the `.generate()` function. The reason I want to use the forward function is that I have to train my model in a custom PyTorch training loop and, as far as I understand, that's not possible with `.generate()`. I've been trying to troubleshoot this for two weeks and I'm getting really desperate. Any kind of help would be very much appreciated.