harshil-shah opened 3 weeks ago
cc @qubvel maybe? :)
Hi all,
I’ve reviewed the issue, and it seems that the mismatch between the vocab size of the MllamaProcessor and the lm_head in the model is causing the IndexError. When the resize_token_embeddings() method is called, it does not appear to resize the lm_head weight matrix, which is leading to the error during training.
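As a toy sketch of the failure mode (the sizes below are hypothetical stand-ins, not the actual model config values): when the tokenizer knows one more token than the lm_head emits logits for, a label with that extra id produces an out-of-range lookup during the loss computation.

```python
# Toy illustration of the vocab-size mismatch; sizes are hypothetical stand-ins.
lm_head_vocab_size = 8      # output dimension of the model's lm_head
tokenizer_vocab_size = 9    # tokenizer also knows one extra special token

logits_row = [0.0] * lm_head_vocab_size   # stand-in for one row of logits
label = tokenizer_vocab_size - 1          # id of the extra token

error_seen = False
try:
    logits_row[label]                     # out-of-range lookup, like the IndexError above
except IndexError:
    error_seen = True
```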
One way to address this would be to have resize_token_embeddings() also resize the lm_head layer so that it aligns with the new vocabulary size. If anyone has any suggestions or additional details that could help, feel free to share them.
Looking forward to your feedback.
Hi @phionex2, thanks for opening the issue! Please see this discussion regarding the mismatch and how to enable fine-tuning of the model.
TL;DR: the main idea is that the image token is not intended to be trained and should be masked:
image_token_id = processor.tokenizer.convert_tokens_to_ids(processor.image_token)
labels[labels == image_token_id] = -100
Hi @qubvel , thanks for the heads-up!
I went through the discussion, and it seems the main issue stems from the image token not being intended for training. Based on the suggestion in the thread, masking the image token in the labels seems like the right approach.
image_token_id = processor.tokenizer.convert_tokens_to_ids(processor.image_token)
labels[labels == image_token_id] = -100
This effectively prevents the image token from contributing to the loss calculation during training.
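As a minimal, framework-free sketch of what that masking does (plain Python lists instead of a torch tensor; the token id below is a made-up placeholder, while the real one comes from processor.tokenizer.convert_tokens_to_ids as shown above):

```python
IGNORE_INDEX = -100   # ids set to -100 are skipped by PyTorch's cross-entropy loss
image_token_id = 42   # placeholder id; in practice, look it up via the processor

labels = [7, 42, 13, 42, 99]
masked = [IGNORE_INDEX if t == image_token_id else t for t in labels]
# masked == [7, -100, 13, -100, 99]
```

Every position holding the image token is replaced with -100, so it contributes nothing to the loss, while all other labels pass through unchanged.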
Indeed! I think we can close this? It is "expected" from the way the model was designed that there are mismatches between the lm head and the embedding, unfortunately 😢
System Info

transformers version: 4.45.1

Who can help?

@ArthurZucker

Information

Tasks: examples folder (such as GLUE/SQuAD, ...)

Reproduction
Hi,

It seems there is a mismatch between the vocab size in the MllamaProcessor and the size of the lm_head weight matrix. Trying to call resize_token_embeddings doesn't fix this. This means that it is not possible to do training.

Minimal example:

This outputs:

And then errors with:
Expected behavior
The vocab size of the processor and model should match.