Open ruian1 opened 6 days ago
Hey @ruian1 !
Yes, unfortunately the model has different shapes for input vs output embeddings (see https://github.com/huggingface/transformers/issues/33819), which causes repetition penalty to fail because we try to gather logits for all input ids. A workaround is to resize the output embeddings and add one extra token for the lm_head. Please take a look at how we resized input/output embedding shapes for llava here: https://github.com/huggingface/transformers/blob/94b50c5678047954f7790936439caff93957e638/src/transformers/models/llava_next_video/convert_llava_next_video_weights_to_hf.py#L214-L237
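If you want to see the mismatch for yourself, here is a minimal sketch (the model id is the one from the issue below; the shapes are the ones reported later in this thread):

```python
import torch
from transformers import MllamaForConditionalGeneration

model = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Input embeddings have extra rows (special `image` token + padding),
# while the output head still has the original text vocab size:
print(model.language_model.model.embed_tokens.weight.shape)  # torch.Size([128264, 4096])
print(model.language_model.lm_head.weight.shape)             # torch.Size([128256, 4096])
```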
@ArthurZucker wondering if we can resize output embeddings for the model on the hub, as it fails not only for training with labels but for any operation we want to do with logits and input ids? Btw, aren't those weights supposed to be tied? Sorry, I fell out of the loop when shipping the model.
They are not tied! That's what caused the whole issue 😢 We can have a revision for this, but changing the weights now will unfortunately break things. Tho documenting this with a snippet on how to circumvent it is super welcome!
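A minimal sketch of how to verify that, reusing the `model` from the snippet above: tied embeddings would be a single shared `Parameter`, so an identity check is enough.

```python
# False for this model: the input and output embeddings are separate tensors
# (they don't even have the same shape, as printed above).
print(
    model.language_model.lm_head.weight
    is model.language_model.model.embed_tokens.weight
)
```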
sad 🥲 I'll add a small "Note" in the docs then to raise awareness about the issue and how to overcome it
Thanks for the discussion! I have some follow-up questions.
```
Cell In[15], line 1
----> 1 model.language_model.model.embed_tokens.weight.data[vocab_size:] = torch.stack(
      2     tuple(
      3         (dist.sample() for _ in range(model.language_model.model.embed_tokens.weight.data[vocab_size:].shape[0]))
      4     ),
      5     dim=0,
      6 )
      8 model.language_model.lm_head.weight.data[vocab_size:] = torch.stack(
      9     tuple((dist.sample() for _ in range(model.language_model.lm_head.weight.data[vocab_size:].shape[0]))),
     10     dim=0,
     11 )

RuntimeError: stack expects a non-empty TensorList
```
3. I did a quick check and found that `len(processor.tokenizer)` is 128257 while `vocab_size` in config.json is 128256, so I set `num_tokens` to 128257:
```
print(model.language_model.model.embed_tokens.weight.data.shape)
print(model.language_model.lm_head.weight.data.shape)

vocab_size = 128256
num_tokens = vocab_size + 1
model.resize_token_embeddings(num_tokens)

print(model.language_model.model.embed_tokens.weight.data.shape)
print(model.language_model.lm_head.weight.data.shape)
```

shows that

```
torch.Size([128264, 4096])
torch.Size([128256, 4096])
torch.Size([128320, 4096])
torch.Size([128256, 4096])
```
Is it normal that `model.language_model.lm_head.weight.data.shape` does not resize?
4. Then I got the same error from running this block:
```
model.language_model.lm_head.weight.data[vocab_size:] = torch.stack(
    tuple((dist.sample() for _ in range(model.language_model.lm_head.weight.data[vocab_size:].shape[0]))),
    dim=0,
)
```
The error is:
```
RuntimeError                              Traceback (most recent call last)
Cell In[12], line 8
      1 model.language_model.model.embed_tokens.weight.data[vocab_size:] = torch.stack(
      2     tuple(
      3         (dist.sample() for _ in range(model.language_model.model.embed_tokens.weight.data[vocab_size:].shape[0]))
      4     ),
      5     dim=0,
      6 )
----> 8 model.language_model.lm_head.weight.data[vocab_size:] = torch.stack(
      9     tuple((dist.sample() for _ in range(model.language_model.lm_head.weight.data[vocab_size:].shape[0]))),
     10     dim=0,
     11 )

RuntimeError: stack expects a non-empty TensorList
```
It would be nice if you could help check what's wrong with my modification. Thanks for the help!
@ruian1 here is what I got to make it work; I'll add it to the docs soon:
```python
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "mv11/11"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
image = Image.open(requests.get(url, stream=True).raw)

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt"
).to(model.device)

# output = model.generate(**inputs, max_new_tokens=30)  # this one works
# print(processor.decode(output[0]))

from transformers import GenerationConfig

# Config taken from https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/blob/main/generation_config.json,
# plus `repetition_penalty`, which is what triggers the bug.
meta_config = {
    "bos_token_id": 128000,
    "do_sample": True,
    "eos_token_id": [128001, 128008, 128009],
    "pad_token_id": 128004,
    "temperature": 0.1,
    "top_p": 0.9,
    "transformers_version": "4.45.0.dev0",
    "max_new_tokens": 256,
    "repetition_penalty": 1.2,
}
generation_config = GenerationConfig(**meta_config)

# Fit a multivariate normal to the existing output embeddings so the new row
# is sampled from the same distribution instead of being random noise.
pre_expansion_embeddings = model.language_model.lm_head.weight.data
mu = torch.mean(pre_expansion_embeddings, dim=0).float()
n = pre_expansion_embeddings.size()[0]
sigma = ((pre_expansion_embeddings - mu).T @ (pre_expansion_embeddings - mu)) / n
dist = torch.distributions.multivariate_normal.MultivariateNormal(mu, covariance_matrix=1e-5 * sigma)

# Append one extra row to lm_head so its vocab covers the special `image` token.
num_new_tokens = 1  # 1 for special `image` token
lm_head_weights = model.language_model.lm_head.weight
new_token_embedding = torch.stack(
    tuple(dist.sample() for _ in range(num_new_tokens)), dim=0
).to(device=lm_head_weights.device, dtype=lm_head_weights.dtype)
lm_head_weights.data = torch.cat([lm_head_weights.data, new_token_embedding], dim=0)
lm_head_weights.num_embeddings = lm_head_weights.data.shape[0]

output = model.generate(**inputs, generation_config=generation_config)
print(processor.decode(output[0]))
```
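As for the `RuntimeError: stack expects a non-empty TensorList` you hit: `resize_token_embeddings` left `lm_head` at 128256 rows (as your prints show), so `lm_head.weight.data[vocab_size:]` with `vocab_size = 128256` is an empty slice and the generator passed to `torch.stack` yields nothing. A small sketch of that failure mode:

```python
import torch

vocab_size = 128256
lm_head = torch.empty(128256, 4096)  # lm_head was never resized, as shown above
tail = lm_head[vocab_size:]
print(tail.shape[0])  # 0 -> range(0) -> the generator produces no tensors
torch.stack(tuple(torch.randn(4096) for _ in range(tail.shape[0])), dim=0)
# RuntimeError: stack expects a non-empty TensorList
```

After the `torch.cat` in the snippet above, `model.language_model.lm_head.weight.shape[0]` is 128257, matching `len(processor.tokenizer)`, so repetition penalty can gather a logit for every input id, including the `image` token.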
System Info
Who can help?
@zucchini-nlp
I ran into an error when adding the `repetition_penalty` parameter to the generation_config, using the example in https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct. I added a generation_config at the bottom of the example; the `meta_config` is taken from https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct/blob/main/generation_config.json. This results in an error.
Reproduction
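A minimal sketch of the failing call, assuming `model`, `processor`, and `inputs` are set up exactly as in the model card example (the full script is in the reply above):

```python
from transformers import GenerationConfig

# repetition_penalty makes generate() gather logits at every input id, including
# the `image` token id, which is out of range for the untied 128256-row lm_head.
generation_config = GenerationConfig(repetition_penalty=1.2, max_new_tokens=30)
output = model.generate(**inputs, generation_config=generation_config)  # fails without the lm_head fix
```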
Expected behavior
Expect generation to run smoothly with `repetition_penalty`.