matbee-eth closed this issue 1 week ago.
It looks like outlines may have updated this code, but the Docker image does not use an up-to-date outlines version.
Which version of outlines are you using?
I'm not sure; it's the vllm-openai Docker image.
Here is how I'm testing it:
```python
import requests

url = "http://localhost:8000/v1/chat/completions"
headers = {
    "Content-Type": "application/json"
}

messages: list[dict[str, str]] = [
    {"content": "You are a helpful AI assistant", "role": "system"},
    {"content": "go to google", "role": "user"},
    {"content": "Below I will present you a request. Before we begin addressing the request, please answer the following pre-survey to the best of your ability. Keep in mind that you are Ken Jennings-level with trivia, and Mensa-level with puzzles, so there should be a deep well to draw from.\n\nHere is the request:\n\ngo to google\n\nHere is the pre-survey:\n\n 1. Please list any specific facts or figures that are GIVEN in the request itself. It is possible that there are none.\n 2. Please list any facts that may need to be looked up, and WHERE SPECIFICALLY they might be found. In some cases, authoritative sources are mentioned in the request itself.\n 3. Please list any facts that may need to be derived (e.g., via logical deduction, simulation, or computation)\n 4. Please list any facts that are recalled from memory, hunches, well-reasoned guesses, etc.\n\nWhen answering this survey, keep in mind that \"facts\" will typically be specific names, dates, statistics, etc. Your answer should use headings:\n\n 1. GIVEN OR VERIFIED FACTS\n 2. FACTS TO LOOK UP\n 3. FACTS TO DERIVE\n 4. EDUCATED GUESSES\n\nDO NOT include any other headings or sections in your response. DO NOT list next steps or plans until asked to do so.\n", "role": "user"},
    {"content": "Certainly! Here is the pre-survey response based on the request \"go to google\":\n\n### 1. GIVEN OR VERIFIED FACTS\n- The request is to \"go to google.\"\n\n### 2. FACTS TO LOOK UP\n- None. The request itself does not require any additional information to look up.\n\n### 3. FACTS TO DERIVE\n- None. The request is straightforward and does not require logical deduction or computation.\n\n### 4. EDUCATED GUESSES\n- The user likely intends to perform a web search using the Google search engine.\n- The user might be looking for specific information, general browsing, or accessing Google services such as Gmail, Google Maps, or Google Drive.", "role": "assistant"},
    {"content": "Fantastic. To address this request we have assembled the following team:\n\nWebSurfer: A helpful assistant with access to a web browser. Ask them to perform web searches, open pages, and interact with content (e.g., clicking links, scrolling the viewport, etc., filling in form fields, etc.) It can also summarize the entire page, or answer questions based on the content of the page. It can also be asked to sleep and wait for pages to load, in cases where the pages seem to be taking a while to load.\nCoder: A helpful and general-purpose AI assistant that has strong language skills, Python skills, and Linux command line skills.\nExecutor: A agent for executing code\nfile_surfer: An agent that can handle local files.\n\n\nBased on the team composition, and known and unknown facts, please devise a short bullet-point plan for addressing the original request. Remember, there is no requirement to involve all team members -- a team member's particular expertise may not be needed for this task.", "role": "user"},
]

payload: dict[str, str | list[dict[str, str]] | dict[str, str] | bool] = {
    "messages": messages,
    "model": "mistralai/Pixtral-12B-2409",
    "response_format": {"type": "json_object"},
    "stream": False,
}

response = requests.post(url, json=payload, headers=headers)

# Print the HTTP status code and the JSON response body
print(response.status_code)
print(response.json())
```
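Note that `"response_format": {"type": "json_object"}` is what routes the request through guided decoding, which is where outlines gets involved. Since I don't know which outlines version the image ships, here is one quick way to check from inside the running container (a sketch, not official tooling; it assumes the package is installed under the distribution name `outlines`):

```python
# Run inside the vllm-openai container (e.g. via `docker exec`) to print
# the installed outlines version. Assumes the distribution name "outlines".
import importlib.metadata

print(importlib.metadata.version("outlines"))
```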
I just remembered that the Mistral tokenizer doesn't support outlines: https://github.com/vllm-project/vllm/issues/9359#issuecomment-2412840803
Please open an issue on their repo to request support.
Your current environment
```sh
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --ipc=host -p 8000:8000 \
  vllm/vllm-openai --model mistralai/Pixtral-12B-2409 \
  --tensor-parallel-size 2 --max-model-len 32768 \
  --max-seq-len-to-capture=32768 --kv-cache-dtype fp8 \
  --tokenizer_mode "mistral" --limit-mm-per-prompt image=4
```
🐛 Describe the bug
Using Outlines with a model that uses the Mistral tokenizer fails: the Mistral tokenizer exposes `tokenizer.eos_token_id`, not `tokenizer.eos_token`.
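A defensive lookup along these lines would sidestep the attribute mismatch. This is only a sketch, not the actual vLLM or Outlines code; the helper name is hypothetical, and it assumes a Hugging Face-style `convert_ids_to_tokens` method is available:

```python
# Hypothetical helper (not actual vLLM/Outlines code): fetch the EOS token
# string whether the tokenizer exposes `eos_token` directly or only
# `eos_token_id`.
def resolve_eos_token(tokenizer) -> str | None:
    eos_token = getattr(tokenizer, "eos_token", None)
    if eos_token is not None:
        return eos_token
    eos_id = getattr(tokenizer, "eos_token_id", None)
    if eos_id is not None:
        # Map the id back to its token string; convert_ids_to_tokens is
        # the usual Hugging Face tokenizer method for this.
        return tokenizer.convert_ids_to_tokens(eos_id)
    return None
```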