rmokady / CLIP_prefix_caption

Simple image captioning model
MIT License

Final token index when training a model #42

Closed robertodessi closed 2 years ago

robertodessi commented 2 years ago

Hi, thanks a lot for making the code available, it's a great resource to use!

I was wondering why the index when computing the loss from the output of gpt is shifted by 1 on the left: https://github.com/rmokady/CLIP_prefix_caption/blob/main/train.py#L315

Shouldn't it be logits = outputs.logits[:, dataset.prefix_length :]?

Thanks!

Twilighter9527 commented 2 years ago

When I follow your change and use "logits = outputs.logits[:, dataset.prefix_length :]", the loss becomes very low very quickly and the inference captions become "\n\n\n\n.......". Did you run into this? Or should it be "logits = outputs.logits[:, dataset.prefix_length - 1 : -1]"? Thank you.

baaaad commented 2 years ago

GPT-2 does next-token prediction: the token at time step t is predicted from the logits at time step t-1. The final token does not need to predict a next token, so it is correct to use logits = outputs.logits[:, dataset.prefix_length - 1 : -1]
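The off-by-one alignment can be illustrated with a small sketch (toy positions only, not the repository's actual training code; prefix_length and the slice mirror train.py):

```python
prefix_length = 4   # number of prefix embeddings fed to GPT-2
caption_len = 6     # number of caption tokens following the prefix
total = prefix_length + caption_len

# Toy "logits": one scalar per input position, value = position index.
logits = list(range(total))

# GPT-2 is autoregressive: the logit at position t predicts the token at
# position t+1. So the caption tokens at positions
# [prefix_length, total) are predicted by the logits at positions
# [prefix_length - 1, total - 1) -- hence the slice
# logits[prefix_length - 1 : -1].
pred = logits[prefix_length - 1 : -1]
targets = list(range(prefix_length, total))

# Every predicting position sits exactly one step before its target.
assert len(pred) == caption_len
assert all(p == t - 1 for p, t in zip(pred, targets))
```

With the unshifted slice logits[prefix_length:], each logit would be compared against the token at its own position instead of the next one, which matches the degenerate "\n\n\n..." captions reported above.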

Twilighter9527 commented 2 years ago

You can see the source.

robertodessi commented 2 years ago

True, makes sense thanks!