drikster80 closed this issue 1 month ago.
I believe I was able to find a solution to this. It is related to OpenAI-Python #1454
Not sure why it works with fastapi 0.112.2 but fails in 0.113.0
Problem line:
https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py#L286

```python
async def create_chat_completion(request: ChatCompletionRequest,
                                 raw_request: Request):
```

Confirmed fix:

```python
async def create_chat_completion(request: Annotated[dict, ChatCompletionRequest],
                                 raw_request: Request):
```
I'll make a PR on this and reference the issue. Can also add some try/catch with TypeAdapter validation, unless it's seen as unnecessary or impacts performance.
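For illustration, a minimal sketch of what that optional TypeAdapter validation could look like (the handler body and error handling here are my assumptions, not vLLM's actual code):

```python
# Sketch only: with the `Annotated[dict, ChatCompletionRequest]` signature,
# FastAPI no longer validates the body against the Pydantic model itself,
# so the handler could do it manually. Names below are illustrative.
from fastapi import HTTPException, Request
from pydantic import TypeAdapter, ValidationError

from vllm.entrypoints.openai.protocol import ChatCompletionRequest

chat_request_adapter = TypeAdapter(ChatCompletionRequest)

async def create_chat_completion(request: dict, raw_request: Request):
    try:
        validated = chat_request_adapter.validate_python(request)
    except ValidationError as e:
        # Surface a 400 instead of a 500 on malformed bodies.
        raise HTTPException(status_code=400, detail=e.errors())
    # ... hand `validated` to the serving engine as before ...
```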
Quick guide to running the latest vllm-openai container, upgrading FastAPI, and triggering the issue. It also includes instructions for quickly switching to an editable-mode install.
Pre-requisites:

Download and start the latest vLLM container:

```bash
docker run --gpus all -it --rm --network=host --ipc=host --entrypoint /bin/bash vllm/vllm-openai:latest
```

Show the current FastAPI version:

```bash
python3 -c "import fastapi; print(fastapi.__version__)"
# 0.112.2
```

Start the server with a small model:

```bash
python3 -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
```
POST to `/v1/chat/completions`:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook/opt-125m",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who won the world series in 2020?"}
    ]
  }'
```
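Equivalently, since the thread mentions OpenAI-Python #1454, the same request can be sent with the openai client (assuming `pip install openai`; the `api_key` value is arbitrary for a local server):

```python
# Same request as the curl above, via the openai Python client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="facebook/opt-125m",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
    ],
)
print(resp.choices[0].message.content)
```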
Upgrade FastAPI to 0.113.0 or higher:

```bash
pip install --upgrade fastapi==0.113.0
```

Start the OpenAI-compatible api_server:

```bash
python3 -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
```
From outside the container, attempt to POST to `/v1/chat/completions` (with FastAPI 0.113.0, this request now fails):

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook/opt-125m",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who won the world series in 2020?"}
    ]
  }'
```
Start the docker container as above.

Install the NVIDIA development packages:

```bash
apt-get update && apt-get install -y --no-install-recommends libtinfo5 libncursesw5 \
    cuda-cudart-dev-12-4=12.4.127-1 \
    cuda-command-line-tools-12-4=12.4.1-1 \
    cuda-minimal-build-12-4=12.4.1-1 \
    cuda-libraries-dev-12-4=12.4.1-1 \
    cuda-nvml-dev-12-4=12.4.127-1 \
    cuda-nvprof-12-4=12.4.127-1 \
    libnpp-dev-12-4=12.2.5.30-1 \
    libcusparse-dev-12-4=12.3.1.170-1 \
    libcublas-dev-12-4=12.4.5.8-1 \
    libnccl2=2.21.5-1+cuda12.4 \
    libnccl-dev=2.21.5-1+cuda12.4 \
    cuda-nsight-compute-12-4=12.4.1-1
```
Clone vLLM, copy the precompiled extension libraries out of the installed package, and install in editable mode:

```bash
git clone https://github.com/vllm-project/vllm.git /vllm
cd /vllm
cp /usr/local/lib/python3.10/dist-packages/vllm/*.so /vllm/vllm
VLLM_USE_PRECOMPILED=1 pip install -e .
```
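A quick way to confirm the editable install took effect (my own check, not from the original guide):

```python
# If the editable install worked, vllm should now be imported from the
# cloned /vllm tree, not from dist-packages.
import vllm
print(vllm.__file__)  # expect something like /vllm/vllm/__init__.py
```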
Run the API server:

```bash
python3 ./vllm/entrypoints/openai/api_server.py --model facebook/opt-125m
```
Example of an inference request:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook/opt-125m",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Who won the world series in 2020?"}
    ],
    "chat_template": "{% if messages[0][\"role\"] == \"system\" %}{{ messages[0][\"content\"] }}\n{% endif %}{% for message in messages[1:] %}{% if message[\"role\"] == \"user\" %}Human: {{ message[\"content\"] }}\n{% elif message[\"role\"] == \"assistant\" %}Assistant: {{ message[\"content\"] }}\n{% endif %}{% endfor %}Assistant:",
    "max_tokens": 100
  }'
```
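For reference, here is a sketch of what that `chat_template` renders to, using jinja2 directly (an assumption for illustration; vLLM actually applies the template through the tokenizer):

```python
# Render the chat_template from the request above to see the final prompt.
from jinja2 import Template

chat_template = (
    '{% if messages[0]["role"] == "system" %}{{ messages[0]["content"] }}\n{% endif %}'
    '{% for message in messages[1:] %}'
    '{% if message["role"] == "user" %}Human: {{ message["content"] }}\n'
    '{% elif message["role"] == "assistant" %}Assistant: {{ message["content"] }}\n'
    '{% endif %}{% endfor %}Assistant:'
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
]

print(Template(chat_template).render(messages=messages))
# You are a helpful assistant.
# Human: Who won the world series in 2020?
# Assistant:
```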
I resolved the issue by downgrading FastAPI to version 0.111.0:

```bash
pip install fastapi==0.111.0
```

For reference, I'm using vllm==0.6.0.
A few things I noticed: the failure happens in FastAPI's get_model_fields() (specifically in its ModelField list comprehension), and the traceback passes through Pydantic's TypedDict handling:

```text
File "/usr/local/lib/python3.10/dist-packages/pydantic/_internal/_generate_schema.py", line 837, in match_type
    return self._typed_dict_schema(obj, None)
```

This may be a red herring, but I'm wondering if there's some weirdness with Required or similar TypedDict hints.
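To make that suspicion concrete, here is a small illustration (my own construction, not from the thread) of the kind of `Required`-annotated TypedDict the openai package uses for message params; building a `TypeAdapter` for such a type exercises the same schema-generation path seen in the traceback:

```python
# Illustration only: a TypedDict with Required[...] hints, similar in shape to
# the openai package's ChatCompletionMessageParam types. TypeAdapter drives
# pydantic's _typed_dict_schema path shown in the traceback below.
from typing_extensions import Required, TypedDict
from pydantic import TypeAdapter

class ExampleMessageParam(TypedDict, total=False):
    role: Required[str]      # must be present despite total=False
    content: Required[str]
    name: str                # optional

adapter = TypeAdapter(ExampleMessageParam)
print(adapter.validate_python({"role": "user", "content": "hi"}))
```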
Anyway, smallest reproducible example:

```text
$ pip install vllm==0.6.0 fastapi==0.113.0 pydantic==2.8.2
$ python
Python 3.11.4 (main, Nov 28 2023, 16:28:36) [Clang 15.0.0 (clang-1500.0.40.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from fastapi._compat import get_model_fields
>>> from vllm.entrypoints.openai.protocol import ChatCompletionRequest
INFO 09-10 02:16:23 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
WARNING 09-10 02:16:23 _custom_ops.py:18] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
>>> get_model_fields(ChatCompletionRequest)
Traceback (most recent call last):
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 277, in _init_core_attrs
self._core_schema = _getattr_no_parents(self._type, '__pydantic_core_schema__')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 119, in _getattr_no_parents
raise AttributeError(attribute)
AttributeError: __pydantic_core_schema__
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/fastapi/_compat.py", line 283, in get_model_fields
return [
^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/fastapi/_compat.py", line 284, in <listcomp>
ModelField(field_info=field_info, name=name)
File "<string>", line 6, in __init__
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/fastapi/_compat.py", line 109, in __post_init__
self._type_adapter: TypeAdapter[Any] = TypeAdapter(
^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 264, in __init__
self._init_core_attrs(rebuild_mocks=False)
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 142, in wrapped
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 284, in _init_core_attrs
self._core_schema = _get_schema(self._type, config_wrapper, parent_depth=self._parent_depth)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/type_adapter.py", line 102, in _get_schema
schema = gen.generate_schema(type_)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 512, in generate_schema
schema = self._generate_schema_inner(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 768, in _generate_schema_inner
return self._annotated_schema(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1822, in _annotated_schema
schema = self._apply_annotations(source_type, annotations)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1890, in _apply_annotations
schema = get_inner_schema(source_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_schema_generation_shared.py", line 83, in __call__
schema = self._handler(source_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1972, in new_handler
schema = metadata_get_schema(source, get_inner_schema)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1968, in <lambda>
lambda source, handler: handler(source)
^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_schema_generation_shared.py", line 83, in __call__
schema = self._handler(source_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1972, in new_handler
schema = metadata_get_schema(source, get_inner_schema)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_std_types_schema.py", line 316, in __get_pydantic_core_schema__
items_schema = handler.generate_schema(self.item_source_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_schema_generation_shared.py", line 97, in generate_schema
return self._generate_schema.generate_schema(source_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 512, in generate_schema
schema = self._generate_schema_inner(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 789, in _generate_schema_inner
return self.match_type(obj)
^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 871, in match_type
return self._match_generic_type(obj, origin)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 895, in _match_generic_type
return self._union_schema(obj)
^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1207, in _union_schema
choices.append(self.generate_schema(arg))
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 512, in generate_schema
schema = self._generate_schema_inner(obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 789, in _generate_schema_inner
return self.match_type(obj)
^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 837, in match_type
return self._typed_dict_schema(obj, None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_generate_schema.py", line 1309, in _typed_dict_schema
for field_name, annotation in get_type_hints_infer_globalns(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/envs/fastapi-vllm-repro/lib/python3.11/site-packages/pydantic/_internal/_fields.py", line 57, in get_type_hints_infer_globalns
return get_type_hints(obj, globalns=globalns, localns=localns, include_extras=include_extras)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/.pyenv/versions/3.11.4/lib/python3.11/typing.py", line 2336, in get_type_hints
value = _eval_type(value, base_globals, base_locals)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/.pyenv/versions/3.11.4/lib/python3.11/typing.py", line 371, in _eval_type
return t._evaluate(globalns, localns, recursive_guard)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pachewise/.pyenv/versions/3.11.4/lib/python3.11/typing.py", line 877, in _evaluate
eval(self.__forward_code__, globalns, localns),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 1, in <module>
TypeError: 'pydantic_core._pydantic_core.PydanticUndefinedType' object is not subscriptable
>>>
```
Note that pydantic==2.9.0 does not have this issue:

```text
$ pip install pydantic==2.9.0
$ python
Python 3.11.4 (main, Nov 28 2023, 16:28:36) [Clang 15.0.0 (clang-1500.0.40.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from fastapi._compat import get_model_fields
>>> from vllm.entrypoints.openai.protocol import ChatCompletionRequest
INFO 09-10 02:26:12 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
WARNING 09-10 02:26:12 _custom_ops.py:18] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
>>> get_model_fields(ChatCompletionRequest)
[ModelField(field_info=FieldInfo(annotation=List[Union[ChatCompletionSystemMessageParam, ChatCompletionUserMessageParam, ChatCompletionAssistantMessageParam, ChatCompletionToolMessageParam, ChatCompletionFunctionMessageParam, CustomChatCompletionMessageParam]], required=True), name='messages', mode='validation'), ModelField(field_info=FieldInfo(annotation=str, required=True), name='model', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[float, NoneType], required=False, default=0.0), name='frequency_penalty', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[Dict[str, float], NoneType], required=False, default=None), name='logit_bias', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[bool, NoneType], required=False, default=False), name='logprobs', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=0), name='top_logprobs', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=None), name='max_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=1), name='n', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[float, NoneType], required=False, default=0.0), name='presence_penalty', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[ResponseFormat, NoneType], required=False, default=None), name='response_format', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=None, metadata=[Ge(ge=-9223372036854775808), Le(le=9223372036854775807)]), name='seed', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, List[str], NoneType], required=False, default_factory=list), name='stop', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[bool, NoneType], required=False, default=False), name='stream', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[StreamOptions, NoneType], required=False, default=None), name='stream_options', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[float, NoneType], required=False, default=0.7), name='temperature', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[float, NoneType], required=False, default=1.0), name='top_p', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[List[ChatCompletionToolsParam], NoneType], required=False, default=None), name='tools', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[Literal['none'], Literal['auto'], ChatCompletionNamedToolChoiceParam, NoneType], required=False, default='none'), name='tool_choice', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[bool, NoneType], required=False, default=False), name='parallel_tool_calls', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None), name='user', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=None), name='best_of', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False), name='use_beam_search', mode='validation'), ModelField(field_info=FieldInfo(annotation=int, required=False, default=-1), name='top_k', mode='validation'), ModelField(field_info=FieldInfo(annotation=float, required=False, default=0.0), name='min_p', mode='validation'), 
ModelField(field_info=FieldInfo(annotation=float, required=False, default=1.0), name='repetition_penalty', mode='validation'), ModelField(field_info=FieldInfo(annotation=float, required=False, default=1.0), name='length_penalty', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False), name='early_stopping', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[List[int], NoneType], required=False, default_factory=list), name='stop_token_ids', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False), name='include_stop_str_in_output', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False), name='ignore_eos', mode='validation'), ModelField(field_info=FieldInfo(annotation=int, required=False, default=0), name='min_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=True), name='skip_special_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=True), name='spaces_between_special_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[Annotated[int, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=1)])], NoneType], required=False, default=None), name='truncate_prompt_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[int, NoneType], required=False, default=None), name='prompt_logprobs', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False, description='If true, the new message will be prepended with the last message if they belong to the same role.'), name='echo', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=True, description='If true, the generation prompt will be added to the chat template. This is a parameter used by chat template in tokenizer config of the model.'), name='add_generation_prompt', mode='validation'), ModelField(field_info=FieldInfo(annotation=bool, required=False, default=False, description='If true, special tokens (e.g. BOS) will be added to the prompt on top of what is added by the chat template. For most models, the chat template takes care of adding the special tokens so this should be set to false (as is the default).'), name='add_special_tokens', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[List[Dict[str, str]], NoneType], required=False, default=None, description='A list of dicts representing documents that will be accessible to the model if it is performing RAG (retrieval-augmented generation). If the template does not support RAG, this argument will have no effect. We recommend that each document should be a dict containing "title" and "text" keys.'), name='documents', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='A Jinja template to use for this conversion. As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.'), name='chat_template', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[Dict[str, Any], NoneType], required=False, default=None, description='Additional kwargs to pass to the template renderer. 
Will be accessible by the chat template.'), name='chat_template_kwargs', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, dict, BaseModel, NoneType], required=False, default=None, description='If specified, the output will follow the JSON schema.'), name='guided_json', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='If specified, the output will follow the regex pattern.'), name='guided_regex', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[List[str], NoneType], required=False, default=None, description='If specified, the output will be exactly one of the choices.'), name='guided_choice', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='If specified, the output will follow the context free grammar.'), name='guided_grammar', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description="If specified, will override the default guided decoding backend of the server for this specific request. If set, must be either 'outlines' / 'lm-format-enforcer'"), name='guided_decoding_backend', mode='validation'), ModelField(field_info=FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='If specified, will override the default whitespace pattern for guided json decoding.'), name='guided_whitespace_pattern', mode='validation')]
>>>
```
This makes me feel like this is a pydantic issue? Or at least a confluence of factors across openai / pydantic / fastapi.
Checking @pachewise's code, I was able to reduce the error reproduction to:

```python
from typing_extensions import Annotated
from typing import List
from vllm.entrypoints.chat_utils import (
    ChatCompletionMessageParam,
)
from vllm.entrypoints.openai.protocol import ChatCompletionRequest
from pydantic import TypeAdapter

for name, field in ChatCompletionRequest.model_fields.items():
    print(name, field)
    TypeAdapter(Annotated[List[ChatCompletionMessageParam], field])
```
That doesn't use FastAPI, it's just Pydantic. And indeed, it's fixed by upgrading Pydantic to 2.9.0. :tada:

It wasn't breaking in FastAPI before because the logic prior to 0.113.0 wasn't yet using TypeAdapter in that part of the code, and it seems the previous version of Pydantic had a bug there (not sure exactly where, but it's already solved in 2.9.0).
Glad that it's resolved! Does the issue still occur in FastAPI 0.113.1 with Pydantic 2.8? If so, we may have to update either fastapi or pydantic in our dependencies to make sure that users don't install the faulty versions.
@DarkLight1337 yes, I'd recommend fastapi >= 0.114.1 (to fix a performance issue related to this part of their code) and pydantic >= 2.9.0 (to fix the actual issue we're seeing here).
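As a convenience, an environment can be checked against those minimums with something like this (my sketch, not part of vLLM):

```python
# Verify installed fastapi/pydantic meet the recommended minimum versions.
from importlib.metadata import version
from packaging.version import Version

assert Version(version("fastapi")) >= Version("0.114.1"), "upgrade fastapi"
assert Version(version("pydantic")) >= Version("2.9.0"), "upgrade pydantic"
print("fastapi", version("fastapi"), "and pydantic", version("pydantic"), "look OK")
```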
Unfortunately the fastapi bump has broken Ray 2.9 compatibility.
```text
$ pip install vllm==0.6.1.post2 'ray[serve]==2.9.3'
... snip...
The conflict is caused by:
    vllm 0.6.1.post2 depends on fastapi>=0.114.1; python_version >= "3.9"
    ray[serve] 2.9.3 depends on fastapi<=0.108.0; extra == "serve"
```
I've prepped a fix for the Ray 2.9 regression introduced in a different PR, but it won't really help unless we address the fastapi pin here as well.
Can we lower the fastapi pinned version, since it wasn't actually the cause of the issue, so we maintain the Ray 2.9 compatibility?
> Can we lower the fastapi pinned version, since it wasn't actually the cause of the issue, so we maintain the Ray 2.9 compatibility?
On it!
Your current environment

The output of `python collect_env.py`:

```text
Collecting environment information...
WARNING 09-05 21:11:49 cuda.py:22] You are using a deprecated `pynvml` package. Please install `nvidia-ml-py` instead, and make sure to uninstall `pynvml`. When both of them are installed, `pynvml` will take precedence and cause errors. See https://pypi.org/project/pynvml for more information.
WARNING 09-05 21:11:49 _custom_ops.py:18] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
/vllm/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash: No module named 'vllm.commit_id'
  from vllm.version import __version__ as VLLM_VERSION
PyTorch version: 2.4.0a0+3bcc3cddb5.nv24.07
Is debug build: False
CUDA used to build PyTorch: 12.5
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.30.0
Libc version: glibc-2.35

Python version: 3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.5.0-1024-nvidia-64k-aarch64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.5.82
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GH200 480GB
Nvidia driver version: 560.35.03
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_adv.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_cnn.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_engines_precompiled.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_graph.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_heuristic.so.9.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_ops.so.9.2.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                       aarch64
CPU op-mode(s):                     64-bit
Byte Order:                         Little Endian
CPU(s):                             72
On-line CPU(s) list:                0-71
Vendor ID:                          ARM
Model name:                         Neoverse-V2
Model:                              0
Thread(s) per core:                 1
Core(s) per socket:                 72
Socket(s):                          1
Stepping:                           r0p0
Frequency boost:                    disabled
CPU max MHz:                        3492.0000
CPU min MHz:                        81.0000
BogoMIPS:                           2000.00
Flags:                              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti
L1d cache:                          4.5 MiB (72 instances)
L1i cache:                          4.5 MiB (72 instances)
L2 cache:                           72 MiB (72 instances)
L3 cache:                           114 MiB (1 instance)
NUMA node(s):                       9
NUMA node0 CPU(s):                  0-71
NUMA node1 CPU(s):
NUMA node2 CPU(s):
NUMA node3 CPU(s):
NUMA node4 CPU(s):
NUMA node5 CPU(s):
NUMA node6 CPU(s):
NUMA node7 CPU(s):
NUMA node8 CPU(s):
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; __user pointer sanitization
Vulnerability Spectre v2:           Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] numpy==1.24.4
[pip3] nvidia-cudnn-frontend==1.5.1
[pip3] nvidia-dali-cuda120==1.39.0
[pip3] nvidia-ml-py==12.560.30
[pip3] nvidia-modelopt==0.13.0
[pip3] nvidia-nvimgcodec-cu12==0.2.0.7
[pip3] nvidia-pyindex==1.0.9
[pip3] onnx==1.16.0
[pip3] optree==0.12.1
[pip3] pynvml==11.4.1
[pip3] pytorch-triton==3.0.0+989adb9a2
[pip3] pyzmq==26.0.3
[pip3] torch==2.4.0a0+3bcc3cddb5.nv24.7
[pip3] torch-tensorrt==2.5.0a0
[pip3] torchvision==0.19.0a0
[pip3] transformers==4.44.2
[pip3] triton==3.0.0
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: 0.6.0@COMMIT_HASH_PLACEHOLDER
vLLM Build Flags:
CUDA Archs: 9.0+PTX; ROCm: Disabled; Neuron: Disabled
GPU Topology:
        GPU0    NIC0    NIC1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NODE    NODE    0-71            0               1
NIC0    NODE     X      PIX
NIC1    NODE    PIX      X

Legend:
  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:
  NIC0: mlx5_0
  NIC1: mlx5_1
```

🐛 Describe the bug
FastAPI released 0.113.0 about 5 hours ago. This release includes a major refactor of its Pydantic support, and it appears to cause a Pydantic failure in the OpenAI-compatible API server.

Confirmed that reverting to FastAPI 0.112.2 resolves the problem (`pip install fastapi==0.112.2`). Here are logs on the failure: