Closed sijeh closed 1 year ago
Thanks for your interest! This is because the OPT models and tokenizer do not add the `[EOS]` token by default. Hence, the last token is `[RET]` during retrieval training, which is why it's `caption_len - 1`. So the comment there is a bit misleading, which I apologize for!
I see, thanks for your kind reply.
https://github.com/kohjingyu/fromage/blob/2652cc647339aec32d6ef8be7cbf51e7d9fc341f/fromage/models.py#L184
Hello kohjingyu, thanks for your great work!
I'm a bit confused about the variable `last_embedding_idx` in `models.py`. The input caption seems like `[..., [RET], [EOS]]`, therefore `caption_len - 1` refers to the index of `[EOS]`, and thus `last_hidden_state[i, last_embedding_idx[i], :]` indicates the output hidden state of the `[EOS]` token, which is used to retrieve images. Is there anything wrong? Please point out if I misunderstood the intent.
Best regards.