Open xiaxin1998 opened 1 week ago
Hey, I tried running the code you gave and got the same error. It seems the pipeline's internal implementation hasn't been updated to handle the new output format when using beam search. You can also try the code below and tell me if it works for you:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm

# Load the model and tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Ensure the model is on the correct device (CPU or GPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# GPT-2 has no pad token, so reuse EOS. Pad on the left, because a
# decoder-only model continues generating from the last position.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"

# Dummy dataset class
class TextDataset(Dataset):
    def __init__(self, texts, tokenizer, max_length=64):
        self.texts = texts
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # Tokenize the text; each tensor has shape (1, max_length)
        inputs = self.tokenizer(
            self.texts[idx],
            truncation=True,
            padding='max_length',
            max_length=self.max_length,
            return_tensors='pt'
        )
        return inputs

# Define the 15 dummy texts
dummy_texts = [
"The quick brown fox jumps over the lazy dog.",
"Artificial intelligence is transforming the future of work.",
"The Eiffel Tower is one of the most famous landmarks in the world.",
"Space exploration has always fascinated humanity.",
"In the jungle, the mighty lion roars at dawn.",
"The sun sets beautifully over the ocean horizon.",
"Reading books can improve your vocabulary and critical thinking.",
"Coffee is enjoyed by millions of people worldwide every morning.",
"The internet has revolutionized how we communicate and share information.",
"A healthy diet and regular exercise are key to maintaining good health.",
"The Great Wall of China is visible from space.",
"The invention of the wheel was a pivotal moment in human history.",
"Learning new languages can open doors to different cultures.",
"Mount Everest is the highest peak on Earth.",
"The human brain is capable of extraordinary feats of memory and reasoning."
]

# Initialize the dataset and dataloader
text_dataset = TextDataset(dummy_texts, tokenizer)
dataloader = DataLoader(text_dataset, batch_size=4, shuffle=True)

# Process the batches
for batch in tqdm(dataloader):
    # Move batch tensors to the correct device
    input_ids = batch['input_ids'].squeeze(1).to(device)
    attention_mask = batch['attention_mask'].squeeze(1).to(device)

    # Generate text
    with torch.no_grad():
        outputs = model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,
            max_new_tokens=30,
            num_beams=4,
            num_return_sequences=1,
            output_scores=True,
            return_dict_in_generate=True
        )

    # Extract the generated sequences and their scores
    generated_sequences = outputs.sequences
    prediction_scores = outputs.sequences_scores

    # Decode the input and generated sequences
    input_texts = tokenizer.batch_decode(input_ids, skip_special_tokens=True)
    generated_texts = tokenizer.batch_decode(generated_sequences, skip_special_tokens=True)

    # Print results
    for i in range(len(input_texts)):
        print(f"Input: {input_texts[i]}")
        print(f"Generated: {generated_texts[i]}")
        print(f"Score: {prediction_scores[i].item()}")
        print()
Do share if this works for what you are trying to do
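For reading the `Score:` values printed above: as far as I understand the beam-search documentation, `sequences_scores` is the summed log-probability of the generated tokens divided by the sequence length raised to `length_penalty`, so the values are negative and closer to zero is better. Here is a minimal pure-Python sketch of that arithmetic. `beam_score` and the example log-probabilities are made up for illustration; this is not the library's code, and the exact length it divides by may differ between versions.

```python
import math


def beam_score(token_logprobs, length_penalty=1.0):
    """Length-normalized sum of token log-probabilities (illustrative only)."""
    total = sum(token_logprobs)
    return total / (len(token_logprobs) ** length_penalty)


# Hypothetical log-probabilities of three generated tokens
logprobs = [math.log(0.5), math.log(0.25), math.log(0.5)]
score = beam_score(logprobs)

print(f"score: {score:.4f}")
# With length_penalty=1.0, exp(score) is the geometric mean of the token probabilities
print(f"geometric mean token probability: {math.exp(score):.4f}")
```

With `length_penalty=0.0` the divisor becomes 1, so the score is just the raw sum of log-probabilities.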
Thanks. I switched to calling model.generate directly instead of using the pipeline for evaluation, and it works in my setup now. I would still expect the pipeline's internal implementation to handle this case.
Can anyone tell me how to get scores in the output, and why I get this error when using the pipeline with output_scores=True, return_dict_in_generate=True? Thanks.
To answer your question: the error comes from the fact that when you set return_dict_in_generate=True, the underlying model returns a structured output object such as "GenerateDecoderOnlyOutput" ("GenerateBeamDecoderOnlyOutput" with beam search, as in your traceback) instead of a tensor, and the current version of the text-generation pipeline doesn't handle that properly.
# What the model does (simplified)
def generate(some_args, return_dict_in_generate):
    # generation code...
    if return_dict_in_generate:
        return GenerateDecoderOnlyOutput(
            sequences=input_ids,
            scores=scores,
            logits=raw_logits,
            attentions=decoder_attentions,
            hidden_states=decoder_hidden_states,
            past_key_values=model_kwargs.get("past_key_values"),
        )
    else:
        return input_ids

What the pipeline tries to do (simplified):

class TextGenerationPipeline(Pipeline):
    def _forward(self, model_inputs, **generate_kwargs):
        # some code
        generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
        out_b = generated_sequence.shape[0]  # <====== where your code crashes, because this object has no "shape"
        # rest of the code, which also needs to be changed
Basically, the pipeline assumes the return value is a torch/tf tensor and tries to reshape it, which fails. I'll try to propose a PR to fix it, along with a workaround, tomorrow.
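The mismatch described above can be handled with a duck-typed check: take `.sequences` when the return value has one, and fall back to the value itself otherwise. Here is a minimal sketch using plain-Python stand-ins so that it runs without `transformers`. `FakeGenerateOutput` and `extract_sequences` are hypothetical names for illustration, not library API, and nested lists stand in for tensors.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class FakeGenerateOutput:
    """Stand-in for the structured object returned when return_dict_in_generate=True."""
    sequences: List[List[int]]
    sequences_scores: Optional[List[float]] = None


def extract_sequences(generated: Any) -> Any:
    """Return the token ids whether `generated` is a bare tensor-like value or an output object."""
    return getattr(generated, "sequences", generated)


plain = [[1, 2, 3]]  # stand-in for the tensor generate() returns by default
structured = FakeGenerateOutput(sequences=[[1, 2, 3]], sequences_scores=[-0.7])

print(extract_sequences(plain))
print(extract_sequences(structured))
print(hasattr(structured, "shape"))  # the attribute the pipeline tries to read
```

In real code, an explicit `isinstance` check against the library's output classes is stricter than `getattr`, at the cost of depending on those class names.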
I looked into it and wrote a custom pipeline to try out a solution, which you can use if needed:
from typing import Union, Dict, Any, Tuple, List

from transformers import AutoModelForCausalLM, pipeline
from transformers.pipelines import PIPELINE_REGISTRY, TextGenerationPipeline
# Use the library's own ReturnType and Chat (defined in
# transformers/pipelines/text_generation.py as of v4.45) so that the comparisons
# and isinstance checks in postprocess match the objects the base pipeline passes in.
from transformers.pipelines.text_generation import Chat, ReturnType
from transformers.generation.utils import GenerateOutput
from transformers.utils import is_tf_available, is_torch_available

if is_torch_available():
    import torch
if is_tf_available():
    import tensorflow as tf

class CustomTextGenerationPipeline(TextGenerationPipeline):
    def _forward(self, model_inputs, **generate_kwargs):
        input_ids = model_inputs["input_ids"]
        attention_mask = model_inputs.get("attention_mask", None)
        # Allow empty prompts
        if input_ids.shape[1] == 0:
            input_ids = None
            attention_mask = None
            in_b = 1
        else:
            in_b = input_ids.shape[0]
        prompt_text = model_inputs.pop("prompt_text")

        # If there is a prefix, we may need to adjust the generation length. Do so without permanently modifying
        # generate_kwargs, as some of the parameterization may come from the initialization of the pipeline.
        prefix_length = generate_kwargs.pop("prefix_length", 0)
        if prefix_length > 0:
            has_max_new_tokens = "max_new_tokens" in generate_kwargs or (
                "generation_config" in generate_kwargs
                and generate_kwargs["generation_config"].max_new_tokens is not None
            )
            if not has_max_new_tokens:
                generate_kwargs["max_length"] = generate_kwargs.get("max_length") or self.generation_config.max_length
                generate_kwargs["max_length"] += prefix_length
            has_min_new_tokens = "min_new_tokens" in generate_kwargs or (
                "generation_config" in generate_kwargs
                and generate_kwargs["generation_config"].min_new_tokens is not None
            )
            if not has_min_new_tokens and "min_length" in generate_kwargs:
                generate_kwargs["min_length"] += prefix_length

        # User-defined `generation_config` passed to the pipeline call take precedence
        if "generation_config" not in generate_kwargs:
            generate_kwargs["generation_config"] = self.generation_config

        generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
        output_dict = self._reshape_output(generated_output=generated_sequence, in_b=in_b)
        output_dict["input_ids"] = input_ids
        output_dict["prompt_text"] = prompt_text
        return output_dict

    def _reshape_output(
        self,
        generated_output: Union[GenerateOutput, torch.Tensor],
        in_b: int,
    ) -> Dict[str, Any]:
        """
        Reshapes the generated output (sequences, scores, and other relevant outputs) based on batch size (in_b).

        Args:
            generated_output (Union[GenerateOutput, torch.Tensor]): The output from the model's generate method.
            in_b (int): The batch size of the input.

        Returns:
            Dict[str, Any]: A dictionary containing the reshaped output (sequences, scores, etc.).
        """
        output_dict = {}
        if isinstance(generated_output, torch.Tensor):
            # If it's a tensor directly, reshape it.
            output_dict["generated_sequence"] = self._reshape_sequence_tensor(
                generated_tensor_output=generated_output, in_b=in_b
            )
        elif isinstance(generated_output, GenerateOutput):
            # For the case of GenerateOutput, reshape the sequences and optionally scores
            output_dict["generated_sequence"] = self._reshape_sequence_tensor(
                generated_tensor_output=generated_output.sequences, in_b=in_b
            )
            if generated_output.scores:
                output_dict["scores"] = self._reshape_scores(scores=generated_output.scores, in_b=in_b)
            if hasattr(generated_output, "sequences_scores") and generated_output.sequences_scores is not None:
                output_dict["generated_sequences_scores"] = self._reshape_sequence_tensor(
                    generated_tensor_output=generated_output.sequences_scores, in_b=in_b
                )
        else:
            raise ValueError(f"{type(generated_output)} format not currently handled by text generation pipeline")
        return output_dict

    def _reshape_sequence_tensor(self, generated_tensor_output: torch.Tensor, in_b: int) -> torch.Tensor:
        """
        Reshapes a tensor to match the batch size (in_b) and handle both PyTorch and TensorFlow frameworks.

        Args:
            generated_tensor_output (torch.Tensor): The tensor to reshape.
            in_b (int): The input batch size.

        Returns:
            torch.Tensor: The reshaped tensor.
        """
        out_b = generated_tensor_output.shape[0]
        if self.framework == "pt":
            return generated_tensor_output.reshape(in_b, out_b // in_b, *generated_tensor_output.shape[1:])
        elif self.framework == "tf":
            return tf.reshape(generated_tensor_output, (in_b, out_b // in_b, *generated_tensor_output.shape[1:]))
        else:
            raise ValueError(f"Framework '{self.framework}' is not supported for reshaping.")

    def _reshape_scores(self, scores: Tuple[torch.Tensor], in_b: int) -> Tuple[torch.Tensor, ...]:
        """
        Reshapes the score tensors at each beam search step to separate output for each batch element.

        Args:
            scores (Tuple[torch.Tensor]): Tuple of tensors representing the scores at each beam search step.
            in_b (int): The input batch size.

        Returns:
            Tuple[torch.Tensor, ...]: Tuple of reshaped tensors, with each tensor having shape [batch_size, num_beams, vocab_size].
        """
        reshaped_scores: List[torch.Tensor] = []
        for score_tensor in scores:
            # Dynamically infer num_beams from the first dimension (batch_size * num_beams)
            out_b = score_tensor.shape[0]
            num_beams = out_b // in_b  # Calculate num_beams dynamically
            if self.framework == "pt":
                reshaped_scores.append(score_tensor.reshape(in_b, num_beams, *score_tensor.shape[1:]))
            elif self.framework == "tf":
                reshaped_scores.append(tf.reshape(score_tensor, (in_b, num_beams, *score_tensor.shape[1:])))
            else:
                raise ValueError(f"Framework '{self.framework}' is not supported for reshaping.")
        return tuple(reshaped_scores)

    def postprocess(
        self,
        model_outputs,
        return_type=ReturnType.FULL_TEXT,
        clean_up_tokenization_spaces=True,
        continue_final_message=None,
    ):
        generated_sequence = model_outputs["generated_sequence"][0]
        input_ids = model_outputs["input_ids"]
        prompt_text = model_outputs["prompt_text"]
        generated_sequence = generated_sequence.numpy().tolist()
        records = []
        for sequence_idx, sequence in enumerate(generated_sequence):
            if return_type == ReturnType.TENSORS:
                record = {"generated_token_ids": sequence}
            elif return_type in {ReturnType.NEW_TEXT, ReturnType.FULL_TEXT}:
                # Decode text
                text = self.tokenizer.decode(
                    sequence,
                    skip_special_tokens=True,
                    clean_up_tokenization_spaces=clean_up_tokenization_spaces,
                )

                # Remove PADDING prompt of the sequence if XLNet or Transfo-XL model is used
                if input_ids is None:
                    prompt_length = 0
                else:
                    prompt_length = len(
                        self.tokenizer.decode(
                            input_ids[0],
                            skip_special_tokens=True,
                            clean_up_tokenization_spaces=clean_up_tokenization_spaces,
                        )
                    )

                all_text = text[prompt_length:]
                if return_type == ReturnType.FULL_TEXT:
                    if isinstance(prompt_text, str):
                        all_text = prompt_text + all_text
                    elif isinstance(prompt_text, Chat):
                        if continue_final_message is None:
                            # If the user passes a chat ending in an assistant message, we treat it as a prefill by
                            # default because very few models support multiple separate, consecutive assistant messages
                            continue_final_message = prompt_text.messages[-1]["role"] == "assistant"
                        if continue_final_message:
                            # With assistant prefill, concat onto the end of the last message
                            all_text = list(prompt_text.messages)[:-1] + [
                                {
                                    "role": prompt_text.messages[-1]["role"],
                                    "content": prompt_text.messages[-1]["content"] + all_text,
                                }
                            ]
                        else:
                            # When we're not starting from a prefill, the output is a new assistant message
                            all_text = list(prompt_text.messages) + [{"role": "assistant", "content": all_text}]

                record = {"generated_text": all_text}
            if "scores" in model_outputs:
                scores = []
                for scores_step in model_outputs["scores"]:
                    scores.append(scores_step[0][sequence_idx])
                record["scores"] = tuple(scores)
            if "generated_sequences_scores" in model_outputs:
                record["generated_sequences_scores"] = model_outputs["generated_sequences_scores"][0][sequence_idx]
            records.append(record)
        return records

PIPELINE_REGISTRY.register_pipeline(
"text-generation-with-score",
pipeline_class=CustomTextGenerationPipeline,
pt_model=AutoModelForCausalLM,
)
To use it, put the code above somewhere in a script or notebook cell; you can then use it like a normal pipeline:
pipe = pipeline(
"text-generation-with-score",
model=model,
tokenizer=tokenizer,
torch_dtype=torch.float16,
device_map="auto",
)
However, I don't know whether this is an intended use of TextGenerationPipeline. If it is, the code could be implemented in the library for that purpose; let me know if it's worth it.
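To illustrate the reshaping done in `_reshape_sequence_tensor`: `generate` returns `out_b` rows for `in_b` prompts, with the rows belonging to each prompt stored consecutively, and the reshape regroups them as `(in_b, out_b // in_b, ...)`. Below is a pure-Python sketch of the same grouping on nested lists. `group_by_prompt` is a hypothetical helper, and it assumes `out_b` is an exact multiple of `in_b`.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def group_by_prompt(rows: Sequence[T], in_b: int) -> List[List[T]]:
    """Split `out_b` consecutive rows into `in_b` groups of `out_b // in_b` rows each."""
    out_b = len(rows)
    if in_b <= 0 or out_b % in_b != 0:
        raise ValueError(f"cannot split {out_b} rows into {in_b} equal groups")
    per_prompt = out_b // in_b
    return [list(rows[i * per_prompt:(i + 1) * per_prompt]) for i in range(in_b)]


# 2 prompts with 3 returned sequences each -> 6 rows of token ids
rows = [[10, 11], [10, 12], [10, 13], [20, 21], [20, 22], [20, 23]]
grouped = group_by_prompt(rows, in_b=2)

print(len(grouped), len(grouped[0]))  # 2 3
print(grouped[1])  # the three candidate sequences for the second prompt
```

The same indexing explains `postprocess`: `model_outputs["generated_sequence"][0]` selects the group for the first prompt, and `sequence_idx` walks over the sequences returned for it.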
System Info
transformers version: 4.45.1
Who can help?
@ArthurZucker @Rocketknight1
Reproduction
pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B",
    torch_dtype=torch.float16,
    device_map="auto",
)
pipe.tokenizer.pad_token_id = 0
for batch in tqdm(dataloader):
    # Decode input IDs into text for the pipeline
Expected behavior
The expected behavior is a dict that contains the generated sequences and scores. Instead, I got this error:
    predictions_with_scores = pipe(input_texts, max_new_tokens=30, num_beams=k, num_return_sequences=1, output_scores=True, return_dict_in_generate=True)
  File "/opt/anaconda3/envs/openllama/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 272, in __call__
    return super().__call__(text_inputs, **kwargs)
  File "/opt/anaconda3/envs/openllama/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1249, in __call__
    outputs = list(final_iterator)
  File "/opt/anaconda3/envs/openllama/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/opt/anaconda3/envs/openllama/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "/opt/anaconda3/envs/openllama/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1175, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/opt/anaconda3/envs/openllama/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 371, in _forward
    out_b = generated_sequence.shape[0]
AttributeError: 'GenerateBeamDecoderOnlyOutput' object has no attribute 'shape'
Can anyone tell me how to get scores in the output, and why I get this error when using the pipeline with output_scores=True, return_dict_in_generate=True? Thanks.