vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Feature]: An instruction/chat method for the offline LLM class #3718

Closed: simon-mo closed this issue 1 month ago

simon-mo commented 6 months ago

🚀 The feature, motivation and pitch

We currently do not apply a chat template for the offline LLM class. It might be useful to provide an interface similar to the Hugging Face chat pipeline so that instruction-tuned capabilities can be used.

https://huggingface.co/docs/transformers/en/chat_templating#is-there-an-automated-pipeline-for-chat
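For illustration, a rough sketch of such a helper layered on top of the existing generate() path might look like the following. The chat() wrapper here is hypothetical and not part of the current vLLM API; it simply applies the tokenizer's chat template before calling generate().

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

def chat(llm: LLM, tokenizer, messages, sampling_params: SamplingParams):
    # Hypothetical helper: render the conversation with the model's chat
    # template, leaving the assistant turn open, then reuse generate().
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    return llm.generate([prompt], sampling_params)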

Alternatives

No response

Additional context

No response

Fedoration commented 6 months ago

Hello @simon-mo, is there any update on this issue?

simon-mo commented 6 months ago

There has been no work on this issue yet. Contributions are welcome!

Fedoration commented 6 months ago

Sure. In the meantime, I am using something like this:

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# model_path is the local path or Hugging Face ID of the model
llm = LLM(model=model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

messages = [
    {"role": "system", "content": "Some text..."},
    {"role": "user", "content": "Some user text"},
]

sampling_params = SamplingParams(temperature=0.01, top_p=0.8, max_tokens=128)

# Render the chat template to token ids and pass them directly to generate()
prompt_token_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="np"
).tolist()
outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)
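The generated text can then be read from the returned RequestOutput objects, for example:

# Each RequestOutput holds the completions generated for one prompt
for output in outputs:
    print(output.outputs[0].text)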

Is this correct?

nunjunj commented 5 months ago

I will be working on this!

yilunzhao commented 4 months ago

Thank you for the contribution! I was wondering if there have been any updates on this feature?