### Your current environment

The output of `python collect_env.py`:
```text
Collecting environment information...
/opt/conda/lib/python3.11/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
No module named 'vllm._version'
from vllm.version import __version__ as VLLM_VERSION
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35
Python version: 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:53:32) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-6.1.109-118.189.amzn2023.x86_64-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A10G
Nvidia driver version: 560.35.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7R32
CPU family: 23
Model: 49
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
Stepping: 0
BogoMIPS: 5599.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save rdpid
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 128 KiB (4 instances)
L1i cache: 128 KiB (4 instances)
L2 cache: 2 MiB (4 instances)
L3 cache: 16 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-7
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec rstack overflow: Mitigation; safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.4.99
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pyzmq==25.1.2
[pip3] torch==2.4.0
[pip3] torchaudio==2.2.1+cu121
[pip3] torchvision==0.19.0
[pip3] transformers==4.45.2
[pip3] triton==3.0.0
[conda] nomkl 1.0 h5ca1d4c_0 conda-forge
[conda] numpy 1.26.4 py311h64a7726_0 conda-forge
[conda] nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
[conda] nvidia-ml-py 12.560.30 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.20.5 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.4.99 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
[conda] pyzmq 25.1.2 py311h34ded2d_0 conda-forge
[conda] torch 2.4.0 pypi_0 pypi
[conda] torchaudio 2.2.1+cu121 pypi_0 pypi
[conda] torchvision 0.19.0 pypi_0 pypi
[conda] transformers 4.45.2 pypi_0 pypi
[conda] triton 3.0.0 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A (dev)
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X 0-7 0 N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
```
### Model Input Dumps
No response
### 🐛 Describe the bug
After upgrading from version 0.6.2 to 0.6.3, I started getting a validation error while generating structured output.
To reproduce:

Start the server:

```bash
vllm serve NousResearch/Meta-Llama-3-8B-Instruct --dtype auto
```

Then execute the following code. In my case, I do it from a Jupyter Notebook:
```python
#### OUTPUT DEFINITION
import json
from enum import Enum
from typing import List, Optional

from openai import OpenAI
from pydantic import BaseModel, Field


class BedType(Enum):
    Twin = "Twin"
    Double = "Double"
    Queen = "Queen"
    King = "King"


class RoomBeds(BaseModel):
    bed_type: BedType = Field(..., description="Type of the bed in the hotel room")
    quantity: int = Field(..., description="Number of beds of the given bed type within the hotel room")


class HotelRoom(BaseModel):
    """
    Represents a hotel room.
    """
    room_id: str = Field(..., description="Id of the room from the input")
    room_name: Optional[str] = Field(..., description="Freetext name of the hotel room")
    room_class: Optional[str] = Field(..., description="Room class of the hotel room.")
    bed_types: Optional[List[RoomBeds]] = Field(..., description="List of beds within the hotel room.")
    smoking_allowed: Optional[bool] = Field(..., description="Flag that indicates whether smoking is allowed or not in the hotel room. Unknown value used if it cannot be inferred from the room description")


class Hotel(BaseModel):
    """
    Represents an entry about a hotel.
    """
    hotel_rooms: List[HotelRoom] = Field(..., description="List of hotel rooms within a hotel")


#### ONLINE INFERENCE
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",
)

completion = client.beta.chat.completions.parse(
    seed=42,
    model="NousResearch/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Generate synthetic data for a fictitious hotel."},
    ],
    temperature=0.8,
    top_p=0.95,
    response_format=Hotel,
)
```
With version 0.6.2 I always got structured output in the specified format. However, after upgrading to 0.6.3 I get a validation error, as the response does not match the expected format:
```text
Cell In[10], line 1
----> 1 completion = client.beta.chat.completions.parse(
2 seed=42,
3 model= "NousResearch/Meta-Llama-3-8B-Instruct", # "NousResearch/Meta-Llama-3-8B-Instruct", #Hermes-2-Pro-Llama-3-8B-GGUF
4 messages=[
5 {"role": "system", "content": "You are a helpful assistant"},
6 {"role": "user", "content": "Generate synthetic data for a fictitious hotel." },
7 ],
8 temperature=0.8,
9 top_p=0.95,
10 response_format=Hotel
11 )
File /opt/conda/lib/python3.11/site-packages/openai/resources/beta/chat/completions.py:150, in Completions.parse(self, messages, model, response_format, frequency_penalty, function_call, functions, logit_bias, logprobs, max_completion_tokens, max_tokens, metadata, n, parallel_tool_calls, presence_penalty, seed, service_tier, stop, store, stream_options, temperature, tool_choice, tools, top_logprobs, top_p, user, extra_headers, extra_query, extra_body, timeout)
143 def parser(raw_completion: ChatCompletion) -> ParsedChatCompletion[ResponseFormatT]:
144 return _parse_chat_completion(
145 response_format=response_format,
146 chat_completion=raw_completion,
147 input_tools=tools,
148 )
--> 150 return self._post(
151 "/chat/completions",
152 body=maybe_transform(
153 {
154 "messages": messages,
155 "model": model,
156 "frequency_penalty": frequency_penalty,
157 "function_call": function_call,
158 "functions": functions,
159 "logit_bias": logit_bias,
160 "logprobs": logprobs,
161 "max_completion_tokens": max_completion_tokens,
162 "max_tokens": max_tokens,
163 "metadata": metadata,
164 "n": n,
165 "parallel_tool_calls": parallel_tool_calls,
166 "presence_penalty": presence_penalty,
167 "response_format": _type_to_response_format(response_format),
168 "seed": seed,
169 "service_tier": service_tier,
170 "stop": stop,
171 "store": store,
172 "stream": False,
173 "stream_options": stream_options,
174 "temperature": temperature,
175 "tool_choice": tool_choice,
176 "tools": tools,
177 "top_logprobs": top_logprobs,
178 "top_p": top_p,
179 "user": user,
180 },
181 completion_create_params.CompletionCreateParams,
182 ),
183 options=make_request_options(
184 extra_headers=extra_headers,
185 extra_query=extra_query,
186 extra_body=extra_body,
187 timeout=timeout,
188 post_parser=parser,
189 ),
190 # we turn the `ChatCompletion` instance into a `ParsedChatCompletion`
191 # in the `parser` function above
192 cast_to=cast(Type[ParsedChatCompletion[ResponseFormatT]], ChatCompletion),
193 stream=False,
194 )
File /opt/conda/lib/python3.11/site-packages/openai/_base_client.py:1277, in SyncAPIClient.post(self, path, cast_to, body, options, files, stream, stream_cls)
1263 def post(
1264 self,
1265 path: str,
(...)
1272 stream_cls: type[_StreamT] | None = None,
1273 ) -> ResponseT | _StreamT:
1274 opts = FinalRequestOptions.construct(
1275 method="post", url=path, json_data=body, files=to_httpx_files(files), **options
1276 )
-> 1277 return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File /opt/conda/lib/python3.11/site-packages/openai/_base_client.py:954, in SyncAPIClient.request(self, cast_to, options, remaining_retries, stream, stream_cls)
951 else:
952 retries_taken = 0
--> 954 return self._request(
955 cast_to=cast_to,
956 options=options,
957 stream=stream,
958 stream_cls=stream_cls,
959 retries_taken=retries_taken,
960 )
File /opt/conda/lib/python3.11/site-packages/openai/_base_client.py:1060, in SyncAPIClient._request(self, cast_to, options, retries_taken, stream, stream_cls)
1057 log.debug("Re-raising status error")
1058 raise self._make_status_error_from_response(err.response) from None
-> 1060 return self._process_response(
1061 cast_to=cast_to,
1062 options=options,
1063 response=response,
1064 stream=stream,
1065 stream_cls=stream_cls,
1066 retries_taken=retries_taken,
1067 )
File /opt/conda/lib/python3.11/site-packages/openai/_base_client.py:1159, in SyncAPIClient._process_response(self, cast_to, options, response, stream, stream_cls, retries_taken)
1156 if bool(response.request.headers.get(RAW_RESPONSE_HEADER)):
1157 return cast(ResponseT, api_response)
-> 1159 return api_response.parse()
File /opt/conda/lib/python3.11/site-packages/openai/_response.py:319, in APIResponse.parse(self, to)
317 parsed = self._parse(to=to)
318 if is_given(self._options.post_parser):
--> 319 parsed = self._options.post_parser(parsed)
321 if isinstance(parsed, BaseModel):
322 add_request_id(parsed, self.request_id)
File /opt/conda/lib/python3.11/site-packages/openai/resources/beta/chat/completions.py:144, in Completions.parse.<locals>.parser(raw_completion)
143 def parser(raw_completion: ChatCompletion) -> ParsedChatCompletion[ResponseFormatT]:
--> 144 return _parse_chat_completion(
145 response_format=response_format,
146 chat_completion=raw_completion,
147 input_tools=tools,
148 )
File /opt/conda/lib/python3.11/site-packages/openai/lib/_parsing/_completions.py:110, in parse_chat_completion(response_format, input_tools, chat_completion)
100 else:
101 tool_calls.append(tool_call)
103 choices.append(
104 construct_type_unchecked(
105 type_=cast(Any, ParsedChoice)[solve_response_format_t(response_format)],
106 value={
107 **choice.to_dict(),
108 "message": {
109 **message.to_dict(),
--> 110 "parsed": maybe_parse_content(
111 response_format=response_format,
112 message=message,
113 ),
114 "tool_calls": tool_calls,
115 },
116 },
117 )
118 )
120 return cast(
121 ParsedChatCompletion[ResponseFormatT],
122 construct_type_unchecked(
(...)
128 ),
129 )
File /opt/conda/lib/python3.11/site-packages/openai/lib/_parsing/_completions.py:161, in maybe_parse_content(response_format, message)
155 def maybe_parse_content(
156 *,
157 response_format: type[ResponseFormatT] | ResponseFormatParam | NotGiven,
158 message: ChatCompletionMessage | ParsedChatCompletionMessage[object],
159 ) -> ResponseFormatT | None:
160 if has_rich_response_format(response_format) and message.content is not None and not message.refusal:
--> 161 return _parse_content(response_format, message.content)
163 return None
File /opt/conda/lib/python3.11/site-packages/openai/lib/_parsing/_completions.py:221, in _parse_content(response_format, content)
219 def _parse_content(response_format: type[ResponseFormatT], content: str) -> ResponseFormatT:
220 if is_basemodel_type(response_format):
--> 221 return cast(ResponseFormatT, model_parse_json(response_format, content))
223 if is_dataclass_like_type(response_format):
224 if not PYDANTIC_V2:
File /opt/conda/lib/python3.11/site-packages/openai/_compat.py:166, in model_parse_json(model, data)
164 def model_parse_json(model: type[_ModelT], data: str | bytes) -> _ModelT:
165 if PYDANTIC_V2:
--> 166 return model.model_validate_json(data)
167 return model.parse_raw(data)
File /opt/conda/lib/python3.11/site-packages/pydantic/main.py:625, in BaseModel.model_validate_json(cls, json_data, strict, context)
623 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
624 __tracebackhide__ = True
--> 625 return cls.__pydantic_validator__.validate_json(json_data, strict=strict, context=context)
ValidationError: 1 validation error for Hotel
Invalid JSON: expected ident at line 1 column 2 [type=json_invalid, input_value='I\'d be happy to help ge... requests or questions.', input_type=str]
For further information visit https://errors.pydantic.dev/2.9/v/json_invalid
```
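As a possible workaround until this is resolved, here is a minimal sketch that requests guided decoding directly through vLLM's `guided_json` extra parameter (passed via `extra_body`) and validates client-side; this assumes the guided-decoding path itself is unaffected in 0.6.3:

```python
# Possible workaround sketch: bypass the OpenAI json_schema response_format
# and use vLLM's guided_json extra parameter instead, then validate the JSON
# client-side with the same Pydantic model. Assumes guided decoding still
# works in 0.6.3.
resp = client.chat.completions.create(
    seed=42,
    model="NousResearch/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Generate synthetic data for a fictitious hotel."},
    ],
    temperature=0.8,
    top_p=0.95,
    extra_body={"guided_json": Hotel.model_json_schema()},
)
hotel = Hotel.model_validate_json(resp.choices[0].message.content)
```

If this sketch also returns free text, the regression would seem to be in guided decoding generally rather than only in `response_format` handling.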
### Before submitting a new issue...
- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.