vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Testing outlines support with ROCm #3280

Open wizd opened 8 months ago

wizd commented 8 months ago

requirements-rocm.txt is missing outlines:

outlines >= 0.0.27

After adding outlines, starting the OpenAI API server fails with this error:

root@dualamd:/app# python -m vllm.entrypoints.openai.api_server \
> --gpu-memory-utilization 0.98 --trust-remote-code \
>     --served-model-name gpt-3.5-turbo-1106 \
>     --max-model-len 32768 --model Qwen1.5-14B
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/envs/py_3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.3.3+rocm603-py3.9-linux-x86_64.egg/vllm/entrypoints/openai/api_server.py", line 23, in <module>
    from vllm.entrypoints.openai.serving_chat import OpenAIServingChat
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.3.3+rocm603-py3.9-linux-x86_64.egg/vllm/entrypoints/openai/serving_chat.py", line 15, in <module>
    from vllm.model_executor.guided_decoding import get_guided_decoding_logits_processor
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.3.3+rocm603-py3.9-linux-x86_64.egg/vllm/model_executor/guided_decoding.py", line 12, in <module>
    from vllm.model_executor.guided_logits_processors import JSONLogitsProcessor, RegexLogitsProcessor
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/vllm-0.3.3+rocm603-py3.9-linux-x86_64.egg/vllm/model_executor/guided_logits_processors.py", line 23, in <module>
    from outlines.fsm.fsm import RegexFSM
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/outlines/__init__.py", line 2, in <module>
    import outlines.generate
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/outlines/generate/__init__.py", line 1, in <module>
    from .api import SequenceGenerator
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/outlines/generate/api.py", line 5, in <module>
    from outlines.fsm.fsm import FSMState
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/outlines/fsm/fsm.py", line 9, in <module>
    from outlines.fsm.regex import create_fsm_index_tokenizer, make_deterministic_fsm
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/outlines/fsm/regex.py", line 5, in <module>
    import numba
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/numba/__init__.py", line 43, in <module>
    from numba.np.ufunc import (vectorize, guvectorize, threading_layer,
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/numba/np/ufunc/__init__.py", line 3, in <module>
    from numba.np.ufunc.decorators import Vectorize, GUVectorize, vectorize, guvectorize
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/numba/np/ufunc/decorators.py", line 3, in <module>
    from numba.np.ufunc import _internal
SystemError: initialization of _internal failed without raising an exception

It seems numba supports only CUDA, not ROCm.
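
To narrow this down, it may help to check whether numba fails to import on its own, outside vLLM and outlines. The traceback ends inside numba's `_internal` extension, and the same SystemError has also been reported for mismatched numba and NumPy versions, so the installed versions are worth recording too. The sketch below is only a diagnostic aid: the helper names `installed_versions` and `try_import` are hypothetical and are not part of vLLM, outlines, or numba.

```python
import importlib
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} without importing the packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


def try_import(module_name):
    """Import one module in isolation; return (ok, error_message)."""
    try:
        importlib.import_module(module_name)
        return True, ""
    except Exception as exc:
        # A SystemError raised by a C extension is caught here as well.
        return False, f"{type(exc).__name__}: {exc}"


if __name__ == "__main__":
    # Hypothetical package list, taken from the modules in the traceback.
    print(installed_versions(["numba", "numpy", "llvmlite", "outlines"]))
    print(try_import("numba"))
```

If `try_import("numba")` fails in the same way, the problem is in the numba installation itself and not in how vLLM or outlines uses it.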

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!