huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Request for Iterative Generation in Pipeline (e.g., LLaMA model) #33949

Open qsunyuan opened 4 weeks ago

qsunyuan commented 4 weeks ago

Feature request

I would like to ask if there is a way to perform iterative generation (n times) within the pipeline, specifically for LLMs such as LLaMA. If this feature is not available, is there any plan to implement it in the future?

Example:

import torch
import transformers

pipeline = transformers.pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Generate once
outputs = pipeline(
    messages,
    max_new_tokens=max_tokens,
)

# Generate n times (proposed API)
outputs = pipeline(
    messages,
    max_new_tokens=max_tokens,
    n=n,
)

Similar to the OpenAI Chat Completions API:

response = client.chat.completions.create(
    model=model,
    messages=messages,
    max_tokens=max_tokens,
    temperature=temperature,
    n=n,
)

I am also aware that iterative generation can be done with a for loop, but I am wondering whether there is a more efficient or optimized way to generate n outputs within the pipeline.

https://community.openai.com/t/how-does-n-parameter-work-in-chat-completions/288725

Motivation

Build a bridge between LLM APIs and the transformers pipeline.

Your contribution

Request

ArthurZucker commented 3 weeks ago

Sounds interesting, let's see if this is requested by the community! We usually check activity here 🚀 cc @Rocketknight1

Rocketknight1 commented 3 weeks ago

Hmmn, a simple solution would be to replicate the input n times:

output = pipeline([input_chat] * n)

However, the text generation pipeline will only handle a single input at a time, so it's basically the same as using a for loop. We'd need to refactor the pipeline a lot to make this efficient, although you can do it efficiently with lower-level generate() calls, I think!
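A minimal sketch of that lower-level route: a single batched `generate()` call with `num_return_sequences` samples n sequences in one forward pass. `sshleifer/tiny-gpt2` is used here only as a tiny stand-in checkpoint, and the prompt is an arbitrary example; in practice you would load `meta-llama/Llama-3.1-8B-Instruct` the same way.

```python
# One batched generate() call that returns n sampled sequences,
# instead of looping the pipeline n times.
# "sshleifer/tiny-gpt2" is a tiny stand-in for a real LLaMA checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sshleifer/tiny-gpt2"  # stand-in checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

n = 3
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=16,
    do_sample=True,             # sampling is what makes the n outputs differ
    num_return_sequences=n,     # n sequences from a single call
    pad_token_id=tokenizer.eos_token_id,  # silences the padding warning
)
texts = [tokenizer.decode(seq, skip_special_tokens=True) for seq in outputs]
print(len(texts))  # n completions
```

With greedy decoding (`do_sample=False`) all n sequences would be identical, so sampling (or beam search) is required for this to be useful.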

August-murr commented 3 weeks ago

I am so utterly confused right now. Isn't the solution just

pipeline([inputs], num_return_sequences=n)

or am I missing something?