Open ayylemao opened 6 days ago
Thanks for reporting the issue! Can you supply a complete example?
Thank you for getting back to me. Here is a complete script:
import json
import torch
from PIL import Image
from pydantic import BaseModel
from transformers import MllamaForConditionalGeneration, AutoProcessor
from typing import List
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.transformers import build_transformers_prefix_allowed_tokens_fn
class Brand(BaseModel):
    brands: List[str]
model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
)
processor = AutoProcessor.from_pretrained(model_id)
schema = Brand.model_json_schema()
parser = JsonSchemaParser(schema)
prefix_func = build_transformers_prefix_allowed_tokens_fn(processor.tokenizer, parser)
user = '''Tell me what brands you can see on the provided screenshot, format it in json with the following format: '''
image_path = 'x.png'
image = Image.open(image_path)
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": user + json.dumps(schema)}
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, return_tensors="pt").to('cuda:0')
start_generation = inputs['input_ids'].shape[1]
output = model.generate(**inputs, max_new_tokens=512, prefix_allowed_tokens_fn=prefix_func)
result = processor.batch_decode(output[:, start_generation:], skip_special_tokens=True)[0]
print(result)
Any luck so far pinpointing the issue?
There's something weird with the tokenizer that the model uses:
Even though the vocab size is 128000, the special added token indices exceed that range.
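One way to check this is to compare `vocab_size` against the ids of the added tokens. The sketch below works on plain values so it runs without downloading the model. The helper name `added_tokens_outside_vocab` and the token names and ids in the example are made up for illustration and are not the real Llama 3.2 vocabulary.

```python
from typing import Dict, List, Tuple


def added_tokens_outside_vocab(
    vocab_size: int, added_vocab: Dict[str, int]
) -> List[Tuple[str, int]]:
    """Return (token, id) pairs whose id is >= vocab_size, sorted by id."""
    outside = [(tok, idx) for tok, idx in added_vocab.items() if idx >= vocab_size]
    return sorted(outside, key=lambda pair: pair[1])


# Hypothetical values for illustration only.
example_added = {"<special_a>": 127999, "<special_b>": 128000, "<special_c>": 128005}
print(added_tokens_outside_vocab(128000, example_added))
```

With a real tokenizer, the two inputs would presumably be `processor.tokenizer.vocab_size` and `processor.tokenizer.get_added_vocab()`.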
I added a printout of the number of tokens allowed at each timestep, along with their min and max values:
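A printout like this can be produced by wrapping the prefix function. This is a minimal sketch, not the code actually used here: `with_token_stats` is a hypothetical helper, and it only assumes the `(batch_id, input_ids) -> list of token ids` callable shape that `prefix_allowed_tokens_fn` expects. It counts calls rather than timesteps, which is the same thing only for a single sequence without beam search.

```python
from typing import Callable, List


def with_token_stats(prefix_fn: Callable, log: Callable[[str], None] = print) -> Callable:
    """Wrap a prefix function and log how many ids it allows, plus min and max."""
    call = 0

    def wrapped(batch_id, input_ids) -> List[int]:
        nonlocal call
        allowed = list(prefix_fn(batch_id, input_ids))
        if allowed:
            log(f"call {call}: {len(allowed)} tokens allowed, "
                f"min={min(allowed)}, max={max(allowed)}")
        else:
            log(f"call {call}: no tokens allowed")
        call += 1
        return allowed

    return wrapped
```

The wrapped function would then be passed to `generate` in place of `prefix_func`, for example `prefix_allowed_tokens_fn=with_token_stats(prefix_func)`.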
In the first step where a special token that exceeds the range is allowed, the error is immediately triggered. From what I understand, the vocab_size is supposed to include the special tokens.
My guess is that somewhere down the line, to support the prefix function, the transformers engine allocates a buffer of size tokenizer.vocab_size. When the prefix function returns a list of tokens whose largest index exceeds that size (as in the last timestep) and the allowed-logit list is applied, the out-of-bounds error is thrown.
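The suspected failure mode can be reproduced in miniature. The sketch below is a toy stand-in in plain Python, not the transformers implementation: `mask_logits` is a hypothetical helper that keeps the scores at the allowed ids and sets everything else to minus infinity. An allowed id that is not smaller than the width of the score buffer raises an `IndexError` here, and the analogous indexing on a GPU tensor would be expected to surface as a CUDA error instead.

```python
import math
from typing import List


def mask_logits(logits: List[float], allowed_ids: List[int]) -> List[float]:
    """Keep logits at allowed_ids and set every other position to -inf."""
    masked = [-math.inf] * len(logits)
    for token_id in allowed_ids:
        # Raises IndexError when token_id >= len(logits).
        masked[token_id] = logits[token_id]
    return masked


scores = [0.1, 0.2, 0.3, 0.4]      # buffer with 4 slots, valid ids are 0..3
print(mask_logits(scores, [1, 3]))  # every id is in range

try:
    mask_logits(scores, [1, 4])     # id 4 is one past the end of the buffer
except IndexError as exc:
    print("out of bounds:", exc)
```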
This looks to be a bug in transformers lib or in the tokenizer of this specific model.
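Until the root cause is confirmed, one possible stopgap is to filter the ids returned by the prefix function before they reach `generate`. This is only a sketch under the assumption that the guess above is right: `clamp_prefix_fn` is a hypothetical helper, and `num_logits` must be the width of the model's output scores, which has not been established in this thread. It hides the symptom rather than fixing the mismatch, and if filtering leaves no allowed tokens, generation may fail in a different way.

```python
from typing import Callable, List


def clamp_prefix_fn(prefix_fn: Callable, num_logits: int) -> Callable:
    """Wrap a prefix function so it never returns ids outside [0, num_logits)."""

    def wrapped(batch_id, input_ids) -> List[int]:
        allowed = prefix_fn(batch_id, input_ids)
        # Drop any id that would index past the end of the score buffer.
        return [t for t in allowed if 0 <= t < num_logits]

    return wrapped
```

The result would be passed as `prefix_allowed_tokens_fn=clamp_prefix_fn(prefix_func, num_logits)`, with `num_logits` read from the model rather than from the tokenizer.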
Thank you for your answer. Since I'm trying to raise the issue to transformers or Llama3.2 maintainers, I'm trying to pinpoint the problem. For consistency, I looked at the vocab of Meta-Llama-3.1-8B-Instruct which looks exactly the same as the one for 3.2-11B but here the prefix_function works perfectly.
The code used for the minimal example:
import json
import torch
from pydantic import BaseModel
from typing import List
from lmformatenforcer import JsonSchemaParser
from lmformatenforcer.integrations.transformers import build_transformers_prefix_allowed_tokens_fn
import transformers
class Brand(BaseModel):
    brands: List[str]
model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="cuda:0",
)
schema = Brand.model_json_schema()
parser = JsonSchemaParser(schema)
prefix_func = build_transformers_prefix_allowed_tokens_fn(pipeline.tokenizer, parser)
user = '''Tell me what brands are provided in this list: ["Microsoft", "Apple", "Intel"]'''
messages = [
    {"role": "user", "content": [
        {"type": "text", "text": user + json.dumps(schema)}
    ]}
]
result = pipeline(
    messages,
    prefix_allowed_tokens_fn=prefix_func,
    max_new_tokens=256,
)
And if we look at the tokenizer, it looks consistent with 3.2: it also has a vocab_size of 128000, with added special tokens exceeding that range:
This does not look like your bug, but can you give me some more context so that I can create a good issue for transformers/llama3.2?
When using the library together with the newly released Llama3.2-11B-Instruct, we get a CUDA error.
This leads to the following error:
Is there a fix for this? I get that the model is quite new, but I've never had problems with other newly released models on Hugging Face. Vision models like Idefics3-Llama-8B also worked with lm-format-enforcer without any problems.