vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Usage]: how do I pass in the JSON content-type for Mistral 7B using offline inference? #7030

Closed RoopeHakulinen closed 2 months ago

RoopeHakulinen commented 2 months ago

Your current environment

The output of `python collect_env.py`

How would you like to use vllm

I would like to use JSON mode for Mistral 7B while doing offline inference via the `generate` method, as shown below. Is that possible somehow? Just asking for JSON in the prompt doesn't reliably produce JSON output. If this is not possible, is the only solution to use something like Outlines? I would love some details on that, if so.

from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-v0.3")
prompt = "Please name the biggest and smallest continent in JSON using the following schema: {biggest: <the biggest continent's name>, smallest: <the smallest continent>}"
temperature = 0.8  # sampling temperature
sampling_params = SamplingParams(temperature=temperature, top_p=1.0)
response = llm.generate(prompt, sampling_params)
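For reference, Outlines can drive schema-constrained generation on top of an offline vLLM engine. The sketch below assumes the Outlines 0.x API (`outlines.models.vllm` and `outlines.generate.json`); exact interfaces vary between Outlines releases, so treat this as an illustration rather than the definitive integration.

```python
# Sketch: JSON-constrained generation with Outlines wrapping an offline vLLM model.
# Assumes the Outlines 0.x API; check your installed version's docs.
from pydantic import BaseModel
import outlines


class Continents(BaseModel):
    biggest: str
    smallest: str


# outlines.models.vllm loads a vllm.LLM under the hood
model = outlines.models.vllm("mistralai/Mistral-7B-v0.3")

# Build a generator that constrains output tokens to match the schema
generator = outlines.generate.json(model, Continents)

result = generator("Name the biggest and smallest continent.")
print(result)  # e.g. Continents(biggest='Asia', smallest='Australia')
```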
DarkLight1337 commented 2 months ago

This will become available for offline LLM inference via #6878.
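For context, here is a minimal sketch of what offline guided JSON decoding looks like in later vLLM releases, where `SamplingParams` accepts a `GuidedDecodingParams` object. The exact interface introduced by #6878 may differ from this, so consult the release notes for the version you are running.

```python
# Sketch: offline guided JSON decoding in vLLM (API from later releases;
# the exact form added by #6878 may differ).
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

# JSON schema describing the expected output structure
json_schema = {
    "type": "object",
    "properties": {
        "biggest": {"type": "string"},
        "smallest": {"type": "string"},
    },
    "required": ["biggest", "smallest"],
}

llm = LLM(model="mistralai/Mistral-7B-v0.3")
sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=128,
    guided_decoding=GuidedDecodingParams(json=json_schema),
)

outputs = llm.generate(
    "Name the biggest and smallest continent as JSON.",
    sampling_params,
)
print(outputs[0].outputs[0].text)  # constrained to match json_schema
```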