huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Fails getting scores when using pipeline #33785

Open xiaxin1998 opened 1 week ago

xiaxin1998 commented 1 week ago

System Info

Who can help?

@ArthurZucker @Rocketknight1

Information

Tasks

Reproduction

pipe = pipeline("text-generation", model="meta-llama/Llama-3.2-1B", torch_dtype=torch.float16, device_map="auto")
pipe.tokenizer.pad_token_id = 0

for batch in tqdm(dataloader):
    # Decode input IDs into text for the pipeline
    input_texts = pipe.tokenizer.batch_decode(batch["input_ids"], skip_special_tokens=True)
    predictions_with_scores = pipe(
        input_texts, max_new_tokens=30, num_beams=generate_num, num_return_sequences=1,
        output_scores=True, return_dict_in_generate=True,
    )

    generated_sents = [pred for pred in predictions_with_scores['sequences']]
    prediction_scores = predictions_with_scores['sequences_scores']  # These are the scores returned from the pipeline

    # Decode the gold (ground truth) sentences
    gold_sents = pipe.tokenizer.batch_decode(batch['label'], skip_special_tokens=True)

Expected behavior

The expected behavior is a dict that contains the generated sequences and their scores. But I got this error:

predictions_with_scores = pipe(input_texts, max_new_tokens=30, num_beams=k, num_return_sequences=1, output_scores=True, return_dict_in_generate=True)
  File "/opt/anaconda3/envs/openllama/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 272, in __call__
    return super().__call__(text_inputs, **kwargs)
  File "/opt/anaconda3/envs/openllama/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1249, in __call__
    outputs = list(final_iterator)
  File "/opt/anaconda3/envs/openllama/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/opt/anaconda3/envs/openllama/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "/opt/anaconda3/envs/openllama/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1175, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/opt/anaconda3/envs/openllama/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 371, in _forward
    out_b = generated_sequence.shape[0]
AttributeError: 'GenerateBeamDecoderOnlyOutput' object has no attribute 'shape'

Can anyone tell me how to get the scores in the output, and why I get this error when using the pipeline with output_scores=True, return_dict_in_generate=True? Thanks.

dame-cell commented 1 week ago

Hey, I tried running the code you gave and got the same error. It seems the pipeline's internal implementation hasn't been updated to handle the new output format when using beam search. In the meantime, you can try this code out (it calls model.generate directly instead of going through the pipeline) and tell me if it works for you:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm

# Load the model and tokenizer
model_name = "gpt2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Ensure the model is on the correct device (CPU or GPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Ensure tokenizer handles padding correctly
tokenizer.pad_token = tokenizer.eos_token

# Dummy dataset class
class TextDataset(Dataset):
    def __init__(self, texts, tokenizer, max_length=64):
        self.texts = texts
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # Tokenize the text
        inputs = self.tokenizer(
            self.texts[idx], 
            truncation=True, 
            padding='max_length', 
            max_length=self.max_length, 
            return_tensors='pt'
        )
        return inputs

# Define the 15 dummy texts
dummy_texts = [
    "The quick brown fox jumps over the lazy dog.",
    "Artificial intelligence is transforming the future of work.",
    "The Eiffel Tower is one of the most famous landmarks in the world.",
    "Space exploration has always fascinated humanity.",
    "In the jungle, the mighty lion roars at dawn.",
    "The sun sets beautifully over the ocean horizon.",
    "Reading books can improve your vocabulary and critical thinking.",
    "Coffee is enjoyed by millions of people worldwide every morning.",
    "The internet has revolutionized how we communicate and share information.",
    "A healthy diet and regular exercise are key to maintaining good health.",
    "The Great Wall of China is visible from space.",
    "The invention of the wheel was a pivotal moment in human history.",
    "Learning new languages can open doors to different cultures.",
    "Mount Everest is the highest peak on Earth.",
    "The human brain is capable of extraordinary feats of memory and reasoning."
]

# Initialize the dataset and dataloader
text_dataset = TextDataset(dummy_texts, tokenizer)
dataloader = DataLoader(text_dataset, batch_size=4, shuffle=True)

# Process the batches
for batch in tqdm(dataloader):
    # Move batch tensors to the correct device
    input_ids = batch['input_ids'].squeeze(1).to(device)
    attention_mask = batch['attention_mask'].squeeze(1).to(device)

    # Generate text
    with torch.no_grad():
        outputs = model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,
            max_new_tokens=30,
            num_beams=4,
            num_return_sequences=1,
            output_scores=True,
            return_dict_in_generate=True
        )

    # Extract the generated sequences and their scores
    generated_sequences = outputs.sequences
    prediction_scores = outputs.sequences_scores

    # Decode the input and generated sequences
    input_texts = tokenizer.batch_decode(input_ids, skip_special_tokens=True)
    generated_texts = tokenizer.batch_decode(generated_sequences, skip_special_tokens=True)

    # Print results
    for i in range(len(input_texts)):
        print(f"Input: {input_texts[i]}")
        print(f"Generated: {generated_texts[i]}")
        print(f"Score: {prediction_scores[i].item()}")
        print()
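
If you also need per-token scores rather than just the aggregate sequences_scores, compute_transition_scores can be applied to the same output. A sketch, assuming a transformers version that has this method and reusing the outputs object from the loop above:

# Optional: per-token transition scores for the beam search output above
transition_scores = model.compute_transition_scores(
    outputs.sequences, outputs.scores, beam_indices=outputs.beam_indices, normalize_logits=False
)
# Summing the per-token scores (and applying the length penalty) approximately
# reconstructs outputs.sequences_scores
print(transition_scores.sum(dim=1))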

Do share if this works for what you are trying to do.

xiaxin1998 commented 1 week ago

Thanks! I switched to model.generate instead of the pipeline for evaluation, and it works now with my model. I would still expect the pipeline's internal implementation to handle this case, though.

Warra07 commented 6 days ago

> Can anyone tell me how to get the scores in the output, and why I get this error when using the pipeline with output_scores=True, return_dict_in_generate=True? Thanks.

To answer your question: the error comes from the fact that when you set return_dict_in_generate=True, the underlying model returns a GenerateDecoderOnlyOutput (a GenerateBeamDecoderOnlyOutput when beam search is used), and the current version of the text_generation pipeline doesn't handle that properly.

# What the model tries to do
def generate(some_args, return_dict_in_generate):
    # ... generation code ...
    if return_dict_in_generate:
        return GenerateDecoderOnlyOutput(
            sequences=input_ids,
            scores=scores,
            logits=raw_logits,
            attentions=decoder_attentions,
            hidden_states=decoder_hidden_states,
            past_key_values=model_kwargs.get("past_key_values"),
        )
    else:
        return input_ids

What the pipeline tries to do:

class TextGenerationPipeline(Pipeline):
    def _forward(self, model_inputs, **generate_kwargs):
        # some code
        generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
        out_b = generated_sequence.shape[0]  # <====== where your code crashes, because this object has no "shape"
        # rest of the code that also needs to be changed

Basically, the pipeline tries to reshape this object on the assumption that it is a torch/tf tensor, and it fails. I'll try to propose a PR to fix it, and a workaround, tomorrow.
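
For reference, the core of a fix is just to unwrap the output dataclass before touching .shape. A minimal sketch (the helper name is mine, not something that exists in the pipeline):

import torch
from transformers.utils import ModelOutput

def unwrap_generated_sequence(generated):
    # Hypothetical helper: return the token-id tensor whether generate() returned
    # a plain tensor or a ModelOutput dataclass (return_dict_in_generate=True)
    if isinstance(generated, ModelOutput):
        return generated.sequences
    if isinstance(generated, torch.Tensor):
        return generated
    raise TypeError(f"Unexpected generate() output type: {type(generated)}")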

Warra07 commented 5 days ago

I looked into it and made a custom pipeline to try out a solution, which you can use if needed:


from typing import Union, Dict, Any, Tuple, List

from transformers import AutoModelForCausalLM
from transformers.generation.utils import GenerateOutput
from transformers.pipelines import PIPELINE_REGISTRY, TextGenerationPipeline
from transformers.utils import ModelOutput, is_tf_available, is_torch_available
import enum

if is_torch_available():
    import torch

if is_tf_available():
    import tensorflow as tf

class ReturnType(enum.Enum):
    TENSORS = 0
    NEW_TEXT = 1
    FULL_TEXT = 2

class Chat:
    """This class is intended to just be used internally in this pipeline and not exposed to users. We convert chats
    to this format because the rest of the pipeline code tends to assume that lists of messages are
    actually a batch of samples rather than messages in the same conversation."""

    def __init__(self, messages: Dict):
        for message in messages:
            if not ("role" in message and "content" in message):
                raise ValueError("When passing chat dicts as input, each dict must have a 'role' and 'content' key.")
        self.messages = messages

class CustomTextGenerationPipeline(TextGenerationPipeline):
    def _forward(self, model_inputs, **generate_kwargs):
        input_ids = model_inputs["input_ids"]
        attention_mask = model_inputs.get("attention_mask", None)
        # Allow empty prompts
        if input_ids.shape[1] == 0:
            input_ids = None
            attention_mask = None
            in_b = 1
        else:
            in_b = input_ids.shape[0]
        prompt_text = model_inputs.pop("prompt_text")

        # If there is a prefix, we may need to adjust the generation length. Do so without permanently modifying
        # generate_kwargs, as some of the parameterization may come from the initialization of the pipeline.
        prefix_length = generate_kwargs.pop("prefix_length", 0)
        if prefix_length > 0:
            has_max_new_tokens = "max_new_tokens" in generate_kwargs or (
                "generation_config" in generate_kwargs
                and generate_kwargs["generation_config"].max_new_tokens is not None
            )
            if not has_max_new_tokens:
                generate_kwargs["max_length"] = generate_kwargs.get("max_length") or self.generation_config.max_length
                generate_kwargs["max_length"] += prefix_length
            has_min_new_tokens = "min_new_tokens" in generate_kwargs or (
                "generation_config" in generate_kwargs
                and generate_kwargs["generation_config"].min_new_tokens is not None
            )
            if not has_min_new_tokens and "min_length" in generate_kwargs:
                generate_kwargs["min_length"] += prefix_length

        # User-defined `generation_config` passed to the pipeline call take precedence
        if "generation_config" not in generate_kwargs:
            generate_kwargs["generation_config"] = self.generation_config

        generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
        output_dict = self._reshape_output(generated_output=generated_sequence, in_b=in_b)

        output_dict["input_ids"] = input_ids
        output_dict["prompt_text"] = prompt_text

        return output_dict

    def _reshape_output(self,
                        generated_output: Union[GenerateOutput, torch.Tensor],
                        in_b: int) -> Dict[str, Any]:
        """
        Reshapes the generated output (sequences, scores, and other relevant outputs) based on batch size (in_b).

        Args:
            generated_output (Union[GenerateOutput, torch.Tensor]): The output from the model's generate method.
            in_b (int): The batch size of the input.

        Returns:
            Dict[str, Any]: A dictionary containing the reshaped output (sequences, scores, etc.).
        """
        output_dict = {}

        if isinstance(generated_output, torch.Tensor):
            # If it's a tensor directly, reshape it.
            output_dict["generated_sequence"] = self._reshape_sequence_tensor(generated_tensor_output=generated_output,
                                                                              in_b=in_b)

        elif isinstance(generated_output, ModelOutput):
            # GenerateOutput is a typing.Union, so it can't be used with isinstance();
            # all generate() output dataclasses subclass ModelOutput. Reshape the
            # sequences and optionally the scores.
            output_dict["generated_sequence"] = self._reshape_sequence_tensor(generated_tensor_output=generated_output.sequences,
                                                                              in_b=in_b)

            if generated_output.scores:
                output_dict["scores"] = self._reshape_scores(scores=generated_output.scores, in_b=in_b)

            if hasattr(generated_output, "sequences_scores") and generated_output.sequences_scores is not None:
                output_dict["generated_sequences_scores"] = self._reshape_sequence_tensor(
                    generated_tensor_output=generated_output.sequences_scores, in_b=in_b)

        else:
            raise ValueError(f"{type(generated_output)} format not currently handled by text generation pipeline")

        return output_dict

    def _reshape_sequence_tensor(self, generated_tensor_output: torch.Tensor, in_b: int) -> torch.Tensor:
        """
        Reshapes a tensor to match the batch size (in_b) and handle both PyTorch and TensorFlow frameworks.

        Args:
            generated_tensor_output (torch.Tensor): The tensor to reshape.
            in_b (int): The input batch size.

        Returns:
            torch.Tensor: The reshaped tensor.
        """
        out_b = generated_tensor_output.shape[0]
        if self.framework == "pt":
            return generated_tensor_output.reshape(in_b, out_b // in_b, *generated_tensor_output.shape[1:])
        elif self.framework == "tf":
            return tf.reshape(generated_tensor_output, (in_b, out_b // in_b, *generated_tensor_output.shape[1:]))
        else:
            raise ValueError(f"Framework '{self.framework}' is not supported for reshaping.")

    def _reshape_scores(self, scores: Tuple[torch.Tensor], in_b: int) -> Tuple[torch.Tensor, ...]:
        """
        Reshapes the score tensors at each beam search step to separate output for each batch element.

        Args:
            scores (Tuple[torch.Tensor]): Tuple of tensors representing the scores at each beam search step.
            in_b (int): The input batch size.

        Returns:
            Tuple[torch.Tensor, ...]: Tuple of reshaped tensors, with each tensor having shape [batch_size, num_beams, vocab_size].
        """
        reshaped_scores: List[torch.Tensor] = []
        for score_tensor in scores:
            # Dynamically infer num_beams from the first dimension (batch_size * num_beams)
            out_b = score_tensor.shape[0]
            num_beams = out_b // in_b  # Calculate num_beams dynamically

            if self.framework == "pt":
                reshaped_scores.append(score_tensor.reshape(in_b, num_beams, *score_tensor.shape[1:]))
            elif self.framework == "tf":
                reshaped_scores.append(tf.reshape(score_tensor, (in_b, num_beams, *score_tensor.shape[1:])))
            else:
                raise ValueError(f"Framework '{self.framework}' is not supported for reshaping.")

        return tuple(reshaped_scores)

    def postprocess(
        self,
        model_outputs,
        return_type=ReturnType.FULL_TEXT,
        clean_up_tokenization_spaces=True,
        continue_final_message=None,
    ):
        generated_sequence = model_outputs["generated_sequence"][0]
        input_ids = model_outputs["input_ids"]
        prompt_text = model_outputs["prompt_text"]
        generated_sequence = generated_sequence.numpy().tolist()
        records = []
        for sequence_idx, sequence in enumerate(generated_sequence):
            if return_type == ReturnType.TENSORS:
                record = {"generated_token_ids": sequence}
            elif return_type in {ReturnType.NEW_TEXT, ReturnType.FULL_TEXT}:
                # Decode text
                text = self.tokenizer.decode(
                    sequence,
                    skip_special_tokens=True,
                    clean_up_tokenization_spaces=clean_up_tokenization_spaces,
                )

                # Remove PADDING prompt of the sequence if XLNet or Transfo-XL model is used
                if input_ids is None:
                    prompt_length = 0
                else:
                    prompt_length = len(
                        self.tokenizer.decode(
                            input_ids[0],
                            skip_special_tokens=True,
                            clean_up_tokenization_spaces=clean_up_tokenization_spaces,
                        )
                    )

                all_text = text[prompt_length:]
                if return_type == ReturnType.FULL_TEXT:
                    if isinstance(prompt_text, str):
                        all_text = prompt_text + all_text
                    elif isinstance(prompt_text, Chat):
                        if continue_final_message is None:
                            # If the user passes a chat ending in an assistant message, we treat it as a prefill by
                            # default because very few models support multiple separate, consecutive assistant messages
                            continue_final_message = prompt_text.messages[-1]["role"] == "assistant"
                        if continue_final_message:
                            # With assistant prefill, concat onto the end of the last message
                            all_text = list(prompt_text.messages)[:-1] + [
                                {
                                    "role": prompt_text.messages[-1]["role"],
                                    "content": prompt_text.messages[-1]["content"] + all_text,
                                }
                            ]
                        else:
                            # When we're not starting from a prefill, the output is a new assistant message
                            all_text = list(prompt_text.messages) + [{"role": "assistant", "content": all_text}]
                record = {"generated_text": all_text}
                if "scores" in model_outputs:
                    scores = []
                    for scores_step in model_outputs["scores"]:
                        scores.append(scores_step[0][sequence_idx])
                    record["scores"] = tuple(scores)

                if "generated_sequences_scores" in model_outputs:
                    record["generated_sequences_scores"] = model_outputs["generated_sequences_scores"][0][sequence_idx]

            records.append(record)

        return records

PIPELINE_REGISTRY.register_pipeline(
    "text-generation-with-score",
    pipeline_class=CustomTextGenerationPipeline,
    pt_model=AutoModelForCausalLM,
)

To use it, simply put it somewhere in a script or a cell, and then you can use it like a normal pipeline:

pipe = pipeline(
    "text-generation-with-score",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)
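
A call with output_scores=True and return_dict_in_generate=True then exposes the extra keys added in postprocess above; for example (a sketch, using beam search so that sequences_scores exists):

results = pipe(
    "The quick brown fox",
    max_new_tokens=30,
    num_beams=4,
    num_return_sequences=1,
    output_scores=True,
    return_dict_in_generate=True,
)
for record in results:
    print(record["generated_text"])
    # Aggregate beam score of the returned sequence (beam search only)
    print(record["generated_sequences_scores"].item())
    # One [vocab_size] tensor of scores per generated step
    print(len(record["scores"]))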

However, I don't know if this is the intended use of the TextGenerationPipeline; if it is, the code could be implemented upstream for that purpose. Let me know if it is worth it.