huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Request for Iterative Generation in Pipeline (e.g., LLaMA model) #33949

Open qsunyuan opened 1 month ago

qsunyuan commented 1 month ago

Feature request

I would like to ask if there is a way to perform iterative generation (n times) within the pipeline, specifically for LLMs such as LLaMA. If this feature is not available, are there any plans to implement it in the future?

Example:

import torch
import transformers

llama_client = transformers.pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Generate once
outputs = llama_client(
    messages,
    max_new_tokens=max_tokens,
)

# Generate n times
outputs = llama_client(
    messages,
    max_new_tokens=max_tokens,
    n=n,
)

Similar to the OpenAI GPT API:

response = client.chat.completions.create(
    model=model,
    messages=messages,
    max_tokens=max_tokens,
    temperature=temperature,
    n=n,
)

I am aware that iterative generation can be done with a for loop, but I am wondering if there is a more efficient or optimized way to generate multiple completions (n times) within the pipeline.

https://community.openai.com/t/how-does-n-parameter-work-in-chat-completions/288725

Motivation

Build a bridge between the OpenAI-style LLM API and the Transformers pipeline.

Your contribution

Request

ArthurZucker commented 1 month ago

Sounds interesting, let's see if this is requested by the community! We usually check activity here 🚀 cc @Rocketknight1

Rocketknight1 commented 1 month ago

Hmmn, a simple solution would be to replicate the input n times:

output = pipeline([input_chat] * n)

However, the text generation pipeline only handles a single input at a time, so this is basically the same as using a for loop. We'd need to refactor the pipeline significantly to make it efficient, although you can do it efficiently with lower-level generate() calls, I think!
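For reference, the lower-level route looks something like the sketch below: generate() accepts a num_return_sequences argument that batches the n samples through a single call rather than looping. This is a minimal sketch, not the thread's actual code; it uses the small open model sshleifer/tiny-gpt2 so it runs anywhere, but the same arguments apply to meta-llama/Llama-3.1-8B-Instruct.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small stand-in model for illustration; swap in the LLaMA checkpoint as needed.
model_name = "sshleifer/tiny-gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello, my name is", return_tensors="pt")

# num_return_sequences=3 expands the batch internally, so the three samples
# share each decoding step instead of requiring three generate() calls.
# Sampling must be enabled: greedy search can only return one sequence.
outputs = model.generate(
    **inputs,
    do_sample=True,
    max_new_tokens=10,
    num_return_sequences=3,
    pad_token_id=tokenizer.eos_token_id,
)
print(outputs.shape[0])  # 3 generated sequences
```

The first dimension of the returned tensor is batch_size * num_return_sequences, so the n samples come back stacked and can be decoded with tokenizer.batch_decode(outputs).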

August-murr commented 1 month ago

I am so utterly confused right now. Isn't the solution just

pipeline([inputs], num_return_sequences=n)

or am I missing something?
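To make the num_return_sequences suggestion concrete, here is a minimal sketch of passing it through the pipeline. It uses the small open model sshleifer/tiny-gpt2 for illustration rather than the LLaMA checkpoint from the thread; for a single prompt the pipeline returns a list of n dicts, each with a "generated_text" key.

```python
from transformers import pipeline

# Small stand-in model; swap in meta-llama/Llama-3.1-8B-Instruct as needed.
generator = pipeline("text-generation", model="sshleifer/tiny-gpt2")

# num_return_sequences is forwarded to generate(); sampling must be on,
# since greedy decoding can only return a single sequence.
results = generator(
    "Hello, my name is",
    do_sample=True,
    max_new_tokens=10,
    num_return_sequences=3,
)
print(len(results))  # 3 completions for the one prompt
```

This covers the "n samples for one prompt" case; whether the samples are generated in one batched forward pass or sequentially still depends on the pipeline's internal batching, which is the efficiency question raised above.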