shell-nlp / gpt_server

gpt_server is an open-source framework for production-grade deployment of LLMs or Embedding models.
Apache License 2.0

Error when running Qwen2.5-7b in a Docker container: TextEncodeInput must be Union #15

Open wzs1566 opened 4 hours ago

wzs1566 commented 4 hours ago

Thanks to the developers for their hard work!

I ran into an error while building a Docker container to run Qwen2.5-7b. The error output is as follows:

==========
== CUDA ==
==========

CUDA Version 12.2.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

/workspace/gpt_server
2024-09-21 07:05:37.625 | DEBUG    | gpt_server.utils:delete_log:177 - logs_path: /workspace/logs
2024-09-21 07:05:37.654 | INFO     | gpt_server.utils:run_cmd:14 - 执行命令如下:
python -m fastchat.serve.controller --host 0.0.0.0 --port 21001 --dispatch-method shortest_queue 

2024-09-21 07:05:37.656 | INFO     | gpt_server.utils:run_cmd:14 - 执行命令如下:
python -m gpt_server.serving.openai_api_server --host 0.0.0.0 --port 8082 --controller-address http://localhost:21001

2024-09-21 07:05:37.671 | INFO     | gpt_server.utils:run_cmd:14 - 执行命令如下:
CUDA_VISIBLE_DEVICES=0 python -m gpt_server.model_worker.qwen --num_gpus 1 --model_name_or_path /workspace/model --model_names qwen25-7b,qwen2.5-7b --backend lmdeploy-turbomind --host 0.0.0.0 --controller_address http://localhost:21001

2024-09-21 07:05:37 | INFO | controller | args: Namespace(host='0.0.0.0', port=21001, dispatch_method='shortest_queue', ssl=False)
2024-09-21 07:05:37 | ERROR | stderr | INFO:     Started server process [37]
2024-09-21 07:05:37 | ERROR | stderr | INFO:     Waiting for application startup.
2024-09-21 07:05:37 | ERROR | stderr | INFO:     Application startup complete.
2024-09-21 07:05:38 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:21001 (Press CTRL+C to quit)
2024-09-21 07:05:38 | INFO | openai_api_server | args: Namespace(host='0.0.0.0', port=8082, controller_address='http://localhost:21001', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_keys=None, ssl=False)
2024-09-21 07:05:38 | ERROR | stderr | INFO:     Started server process [41]
2024-09-21 07:05:38 | ERROR | stderr | INFO:     Waiting for application startup.
2024-09-21 07:05:38 | ERROR | stderr | INFO:     Application startup complete.
2024-09-21 07:05:38 | ERROR | stderr | INFO:     Uvicorn running on http://0.0.0.0:8082 (Press CTRL+C to quit)
INFO:     Started server process [44]
INFO:     Waiting for application startup.
2024-09-21 07:05:39.411 | INFO     | gpt_server.model_worker.base.model_worker_base:load_model_tokenizer:110 - QwenWorker 使用 LMDeploy 后端
2024-09-21 07:05:39.411 | INFO     | gpt_server.model_backend.lmdeploy_backend:__init__:30 - 后端 turbomind
2024-09-21 07:05:39.413 | INFO     | gpt_server.model_backend.lmdeploy_backend:__init__:36 - 模型架构:llm
2024-09-21 07:05:39,413 - lmdeploy - INFO - input backend=turbomind, backend_config=TurbomindEngineConfig(model_format=None, tp=1, session_len=None, max_batch_size=None, cache_max_entry_count=0.8, cache_chunk_size=-1, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
2024-09-21 07:05:39,413 - lmdeploy - INFO - input chat_template_config=None
2024-09-21 07:05:39,414 - lmdeploy - WARNING - Did not find a chat template matching /workspace/model.
2024-09-21 07:05:39,418 - lmdeploy - INFO - updated chat_template_onfig=ChatTemplateConfig(model_name='base', system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability=None, stop_words=None)
2024-09-21 07:05:39,428 - lmdeploy - INFO - model_source: hf_model
2024-09-21 07:05:39,663 - lmdeploy - INFO - turbomind model config:

{
  "model_config": {
    "model_name": "",
    "chat_template": "",
    "model_arch": "Qwen2ForCausalLM",
    "head_num": 28,
    "kv_head_num": 4,
    "hidden_units": 3584,
    "vocab_size": 152064,
    "num_layer": 28,
    "inter_size": 18944,
    "norm_eps": 1e-06,
    "attn_bias": 1,
    "start_id": 151643,
    "end_id": 151645,
    "size_per_head": 128,
    "group_size": 128,
    "weight_type": "bf16",
    "session_len": 32768,
    "tp": 1,
    "model_format": "hf"
  },
  "attention_config": {
    "rotary_embedding": 128,
    "rope_theta": 1000000.0,
    "max_position_embeddings": 32768,
    "original_max_position_embeddings": 0,
    "rope_scaling_type": "",
    "rope_scaling_factor": 0.0,
    "use_dynamic_ntk": 0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 1.0,
    "use_logn_attn": 0,
    "cache_block_seq_len": 64
  },
  "lora_config": {
    "lora_policy": "",
    "lora_r": 0,
    "lora_scale": 0.0,
    "lora_max_wo_r": 0,
    "lora_rank_pattern": "",
    "lora_scale_pattern": ""
  },
  "engine_config": {
    "model_format": null,
    "tp": 1,
    "session_len": null,
    "max_batch_size": 128,
    "cache_max_entry_count": 0.8,
    "cache_chunk_size": -1,
    "cache_block_seq_len": 64,
    "enable_prefix_caching": false,
    "quant_policy": 0,
    "rope_scaling_factor": 0.0,
    "use_logn_attn": false,
    "download_dir": null,
    "revision": null,
    "max_prefill_token_num": 8192,
    "num_tokens_per_iter": 8192,
    "max_prefill_iters": 4
  }
}
[TM][WARNING] [LlamaTritonModel] `max_context_token_num` is not set, default to 32768.
[TM][INFO] Model: 
head_num: 28
kv_head_num: 4
size_per_head: 128
inter_size: 18944
num_layer: 28
vocab_size: 152064
attn_bias: 1
max_batch_size: 128
max_prefill_token_num: 8192
max_context_token_num: 32768
num_tokens_per_iter: 8192
max_prefill_iters: 4
session_len: 32768
cache_max_entry_count: 0.8
cache_block_seq_len: 64
cache_chunk_size: -1
enable_prefix_caching: 0
start_id: 151643
tensor_para_size: 1
pipeline_para_size: 1
enable_custom_all_reduce: 0
model_name: 
model_dir: 
quant_policy: 0
group_size: 128

2024-09-21 07:05:39,674 - lmdeploy - WARNING - get 255 model params
[TM][INFO] [LlamaWeight<T>::prepare] workspace size: 0                                                    

[WARNING] gemm_config.in is not found; using default GEMM algo
[TM][INFO] [BlockManager] block_size = 3 MB
[TM][INFO] [BlockManager] max_block_count = 2059
[TM][INFO] [BlockManager] chunk_size = 2059
[TM][INFO] LlamaBatch<T>::Start()
2024-09-21 07:05:41,701 - lmdeploy - INFO - updated backend_config=TurbomindEngineConfig(model_format=None, tp=1, session_len=None, max_batch_size=128, cache_max_entry_count=0.8, cache_chunk_size=-1, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=8192, max_prefill_iters=4)
2024-09-21 07:05:41.703 | INFO     | gpt_server.model_worker.base.model_worker_base:load_model_tokenizer:128 - load_model_tokenizer 完成
2024-09-21 07:05:41.704 | INFO     | gpt_server.model_worker.base.model_worker_base:get_context_length:74 - 模型配置:
2024-09-21 07:05:41.704 | INFO     | gpt_server.model_worker.base.model_worker_base:__init__:58 - Loading the model ['qwen25-7b', 'qwen2.5-7b'] on worker 877ee7c4 ...
2024-09-21 07:05:41.704 | INFO     | gpt_server.model_worker.base.base_model_worker:register_to_controller:93 - Register to controller
2024-09-21 07:05:41 | INFO | controller | Register a new worker: http://0.0.0.0:49063
2024-09-21 07:05:41 | INFO | controller | Register done: http://0.0.0.0:49063, {'model_names': ['qwen25-7b', 'qwen2.5-7b'], 'speed': 1, 'queue_length': 0}
2024-09-21 07:05:41 | INFO | stdout | INFO:     127.0.0.1:60786 - "POST /register_worker HTTP/1.1" 200 OK
2024-09-21 07:05:41.707 | INFO     | gpt_server.model_worker.base.model_worker_base:__init__:63 - worker 已赋值
2024-09-21 07:05:41.707 | INFO     | __main__:__init__:45 - qwen停用词: ['<|endoftext|>', '<|im_start|>', '<|im_end|>', 'Observation:']
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:49063 (Press CTRL+C to quit)
2024-09-21 07:05:44 | INFO | stdout | INFO:     127.0.0.1:60790 - "POST /list_models HTTP/1.1" 200 OK
2024-09-21 07:05:44 | INFO | controller | names: ['http://0.0.0.0:49063'], queue_lens: [0.0], ret: http://0.0.0.0:49063
2024-09-21 07:05:44 | INFO | stdout | INFO:     127.0.0.1:60792 - "POST /get_worker_address HTTP/1.1" 200 OK
INFO:     127.0.0.1:47984 - "POST /model_details HTTP/1.1" 200 OK
INFO:     127.0.0.1:47992 - "POST /count_token HTTP/1.1" 200 OK
2024-09-21 07:05:44 | INFO | stdout | INFO:     10.10.1.12:42476 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO:     127.0.0.1:47996 - "POST /worker_generate_stream HTTP/1.1" 200 OK
2024-09-21 07:05:44.826 | INFO     | __main__:generate_stream_gate:53 - params {'model': 'qwen25-7b', 'temperature': 0.7, 'logprobs': None, 'top_p': 1.0, 'top_k': -1, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'max_new_tokens': 1048576, 'echo': False, 'stop': [], 'messages': [{'role': 'user', 'content': 'Make a scatter plot with x_values 1, 2 and y_values 3, 4'}], 'tools': None, 'tool_choice': None, 'request_id': '1', 'request': <starlette.requests.Request object at 0x7fd3af5331f0>}
2024-09-21 07:05:44.826 | INFO     | __main__:generate_stream_gate:54 - worker_id: 877ee7c4
2024-09-21 07:05:44.827 | INFO     | __main__:generate_stream_gate:74 - 正在使用qwen-2.0 !
2024-09-21 07:05:44.849 | INFO     | gpt_server.model_backend.lmdeploy_backend:stream_chat:45 - <|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user
Make a scatter plot with x_values 1, 2 and y_values 3, 4<|im_end|>
<|im_start|>assistant

2024-09-21 07:05:44.849 | INFO     | gpt_server.model_backend.lmdeploy_backend:stream_chat:79 - request_id 1
2024-09-21 07:05:44,974 - lmdeploy - WARNING - The token Observation:, its length of indexes [37763, 367, 25] is over than 1. Currently, it can not be used as stop words
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 257, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 253, in wrap
    await func()
  File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 230, in listen_for_disconnect
    message = await receive()
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 587, in receive
    await self.message_event.wait()
  File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7fd3af5338b0

During handling of the above exception, another exception occurred:

  + Exception Group Traceback (most recent call last):
  |   File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
  |     result = await app(  # type: ignore[func-returns-value]
  |   File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
  |     return await self.app(scope, receive, send)
  |   File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
  |     await super().__call__(scope, receive, send)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 113, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 187, in __call__
  |     raise exc
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 165, in __call__
  |     await self.app(scope, receive, _send)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 62, in __call__
  |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 62, in wrapped_app
  |     raise exc
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 51, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 715, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 735, in app
  |     await route.handle(scope, receive, send)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 288, in handle
  |     await self.app(scope, receive, send)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 76, in app
  |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 62, in wrapped_app
  |     raise exc
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 51, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 74, in app
  |     await response(scope, receive, send)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 250, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 685, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 253, in wrap
    |     await func()
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 242, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "/workspace/gpt_server/model_worker/qwen.py", line 91, in generate_stream_gate
    |     async for ret in self.backend.stream_chat(params=params):
    |   File "/workspace/gpt_server/model_backend/lmdeploy_backend.py", line 84, in stream_chat
    |     async for request_output in results_generator:
    |   File "/usr/local/lib/python3.10/dist-packages/lmdeploy/serve/async_engine.py", line 509, in generate
    |     prompt_input = await self._get_prompt_input(prompt,
    |   File "/usr/local/lib/python3.10/dist-packages/lmdeploy/serve/async_engine.py", line 453, in _get_prompt_input
    |     input_ids = self.tokenizer.encode(prompt, add_bos=sequence_start)
    |   File "/usr/local/lib/python3.10/dist-packages/lmdeploy/tokenizer.py", line 600, in encode
    |     return self.model.encode(s, add_bos, add_special_tokens, **kwargs)
    |   File "/usr/local/lib/python3.10/dist-packages/lmdeploy/tokenizer.py", line 366, in encode
    |     encoded = self.model.encode(s,
    |   File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2825, in encode
    |     encoded_inputs = self.encode_plus(
    |   File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 3237, in encode_plus
    |     return self._encode_plus(
    |   File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py", line 601, in _encode_plus
    |     batched_output = self._batch_encode_plus(
    |   File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py", line 528, in _batch_encode_plus
    |     encodings = self._tokenizer.encode_batch(
    | TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]
    +------------------------------------
2024-09-21 07:05:44 | ERROR | stderr | ERROR:    Exception in ASGI application
2024-09-21 07:05:44 | ERROR | stderr | Traceback (most recent call last):
2024-09-21 07:05:44 | ERROR | stderr |   File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 257, in __call__
2024-09-21 07:05:44 | ERROR | stderr |     await wrap(partial(self.listen_for_disconnect, receive))
2024-09-21 07:05:44 | ERROR | stderr |   File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 253, in wrap
2024-09-21 07:05:44 | ERROR | stderr |     await func()
2024-09-21 07:05:44 | ERROR | stderr |   File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 230, in listen_for_disconnect
2024-09-21 07:05:44 | ERROR | stderr |     message = await receive()
2024-09-21 07:05:44 | ERROR | stderr |   File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 587, in receive
2024-09-21 07:05:44 | ERROR | stderr |     await self.message_event.wait()
2024-09-21 07:05:44 | ERROR | stderr |   File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
2024-09-21 07:05:44 | ERROR | stderr |     await fut
2024-09-21 07:05:44 | ERROR | stderr | asyncio.exceptions.CancelledError: Cancelled by cancel scope 7fec07695780
2024-09-21 07:05:44 | ERROR | stderr | 
2024-09-21 07:05:44 | ERROR | stderr | During handling of the above exception, another exception occurred:
2024-09-21 07:05:44 | ERROR | stderr | 
2024-09-21 07:05:44 | ERROR | stderr |   + Exception Group Traceback (most recent call last):
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
2024-09-21 07:05:44 | ERROR | stderr |   |     result = await app(  # type: ignore[func-returns-value]
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
2024-09-21 07:05:44 | ERROR | stderr |   |     return await self.app(scope, receive, send)
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
2024-09-21 07:05:44 | ERROR | stderr |   |     await super().__call__(scope, receive, send)
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 113, in __call__
2024-09-21 07:05:44 | ERROR | stderr |   |     await self.middleware_stack(scope, receive, send)
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 187, in __call__
2024-09-21 07:05:44 | ERROR | stderr |   |     raise exc
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 165, in __call__
2024-09-21 07:05:44 | ERROR | stderr |   |     await self.app(scope, receive, _send)
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 85, in __call__
2024-09-21 07:05:44 | ERROR | stderr |   |     await self.app(scope, receive, send)
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 62, in __call__
2024-09-21 07:05:44 | ERROR | stderr |   |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 62, in wrapped_app
2024-09-21 07:05:44 | ERROR | stderr |   |     raise exc
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 51, in wrapped_app
2024-09-21 07:05:44 | ERROR | stderr |   |     await app(scope, receive, sender)
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 715, in __call__
2024-09-21 07:05:44 | ERROR | stderr |   |     await self.middleware_stack(scope, receive, send)
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 735, in app
2024-09-21 07:05:44 | ERROR | stderr |   |     await route.handle(scope, receive, send)
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 288, in handle
2024-09-21 07:05:44 | ERROR | stderr |   |     await self.app(scope, receive, send)
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 76, in app
2024-09-21 07:05:44 | ERROR | stderr |   |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 62, in wrapped_app
2024-09-21 07:05:44 | ERROR | stderr |   |     raise exc
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/starlette/_exception_handler.py", line 51, in wrapped_app
2024-09-21 07:05:44 | ERROR | stderr |   |     await app(scope, receive, sender)
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 74, in app
2024-09-21 07:05:44 | ERROR | stderr |   |     await response(scope, receive, send)
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 250, in __call__
2024-09-21 07:05:44 | ERROR | stderr |   |     async with anyio.create_task_group() as task_group:
2024-09-21 07:05:44 | ERROR | stderr |   |   File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 685, in __aexit__
2024-09-21 07:05:44 | ERROR | stderr |   |     raise BaseExceptionGroup(
2024-09-21 07:05:44 | ERROR | stderr |   | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
2024-09-21 07:05:44 | ERROR | stderr |   +-+---------------- 1 ----------------
2024-09-21 07:05:44 | ERROR | stderr |     | Traceback (most recent call last):
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 72, in map_httpcore_exceptions
2024-09-21 07:05:44 | ERROR | stderr |     |     yield
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 257, in __aiter__
2024-09-21 07:05:44 | ERROR | stderr |     |     async for part in self._httpcore_stream:
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_async/connection_pool.py", line 367, in __aiter__
2024-09-21 07:05:44 | ERROR | stderr |     |     raise exc from None
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_async/connection_pool.py", line 363, in __aiter__
2024-09-21 07:05:44 | ERROR | stderr |     |     async for part in self._stream:
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_async/http11.py", line 349, in __aiter__
2024-09-21 07:05:44 | ERROR | stderr |     |     raise exc
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_async/http11.py", line 341, in __aiter__
2024-09-21 07:05:44 | ERROR | stderr |     |     async for chunk in self._connection._receive_response_body(**kwargs):
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_async/http11.py", line 210, in _receive_response_body
2024-09-21 07:05:44 | ERROR | stderr |     |     event = await self._receive_event(timeout=timeout)
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_async/http11.py", line 220, in _receive_event
2024-09-21 07:05:44 | ERROR | stderr |     |     with map_exceptions({h11.RemoteProtocolError: RemoteProtocolError}):
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
2024-09-21 07:05:44 | ERROR | stderr |     |     self.gen.throw(typ, value, traceback)
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/usr/local/lib/python3.10/dist-packages/httpcore/_exceptions.py", line 14, in map_exceptions
2024-09-21 07:05:44 | ERROR | stderr |     |     raise to_exc(exc) from exc
2024-09-21 07:05:44 | ERROR | stderr |     | httpcore.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
2024-09-21 07:05:44 | ERROR | stderr |     |
2024-09-21 07:05:44 | ERROR | stderr |     | The above exception was the direct cause of the following exception:
2024-09-21 07:05:44 | ERROR | stderr |     |
2024-09-21 07:05:44 | ERROR | stderr |     | Traceback (most recent call last):
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 253, in wrap
2024-09-21 07:05:44 | ERROR | stderr |     |     await func()
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/usr/local/lib/python3.10/dist-packages/starlette/responses.py", line 242, in stream_response
2024-09-21 07:05:44 | ERROR | stderr |     |     async for chunk in self.body_iterator:
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/workspace/gpt_server/serving/openai_api_server.py", line 500, in chat_completion_stream_generator
2024-09-21 07:05:44 | ERROR | stderr |     |     async for content in generate_completion_stream(gen_params, worker_addr):
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/workspace/gpt_server/serving/openai_api_server.py", line 692, in generate_completion_stream
2024-09-21 07:05:44 | ERROR | stderr |     |     async for raw_chunk in response.aiter_raw():
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/usr/local/lib/python3.10/dist-packages/httpx/_models.py", line 989, in aiter_raw
2024-09-21 07:05:44 | ERROR | stderr |     |     async for raw_stream_bytes in self.stream:
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/usr/local/lib/python3.10/dist-packages/httpx/_client.py", line 150, in __aiter__
2024-09-21 07:05:44 | ERROR | stderr |     |     async for chunk in self._stream:
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 256, in __aiter__
2024-09-21 07:05:44 | ERROR | stderr |     |     with map_httpcore_exceptions():
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
2024-09-21 07:05:44 | ERROR | stderr |     |     self.gen.throw(typ, value, traceback)
2024-09-21 07:05:44 | ERROR | stderr |     |   File "/usr/local/lib/python3.10/dist-packages/httpx/_transports/default.py", line 89, in map_httpcore_exceptions
2024-09-21 07:05:44 | ERROR | stderr |     |     raise mapped_exc(message) from exc
2024-09-21 07:05:44 | ERROR | stderr |     | httpx.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
2024-09-21 07:05:44 | ERROR | stderr |     +------------------------------------
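
A note for anyone trying to narrow this down: as far as I understand, the `tokenizers` library raises `TextEncodeInput must be Union[...]` when the value handed to the tokenizer is not a plain string (for example `None`, or a list of message dicts). The sketch below is only a debugging aid under that assumption and does not establish the root cause here. `describe_prompt` and `ensure_text_prompt` are hypothetical helper names, not part of gpt_server or lmdeploy; the idea is to check the prompt's type just before it reaches the tokenizer so the log shows what was actually passed.

```python
from typing import Any


def describe_prompt(prompt: Any) -> str:
    """Return a short description of the prompt's type for debug logging."""
    if isinstance(prompt, str):
        return f"str (len={len(prompt)})"
    if isinstance(prompt, (list, tuple)):
        # Summarize the element types, e.g. a list of chat-message dicts.
        inner = sorted({type(item).__name__ for item in prompt})
        return f"{type(prompt).__name__} of [{', '.join(inner)}]"
    return type(prompt).__name__


def ensure_text_prompt(prompt: Any) -> str:
    """Hypothetical guard: fail early with a readable message if the prompt is not a str."""
    if not isinstance(prompt, str):
        raise TypeError(
            f"expected prompt to be str, got {describe_prompt(prompt)}"
        )
    return prompt


if __name__ == "__main__":
    print(describe_prompt("hello"))
    print(describe_prompt([{"role": "user", "content": "hi"}]))
    print(describe_prompt(None))
```

If the guard fires, the reported type shows whether a message list or `None` is being forwarded instead of the rendered prompt string.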

Version information inside the container:

root@f2ec0ab88133:/workspace# pip list
Package                           Version
--------------------------------- --------------------
accelerate                        0.34.2
addict                            2.4.0
aiofiles                          23.2.1
aiohappyeyeballs                  2.4.0
aiohttp                           3.10.5
aiosignal                         1.3.1
altair                            5.4.1
annotated-types                   0.7.0
anyio                             4.5.0
async-timeout                     4.0.3
attrs                             24.2.0
certifi                           2019.11.28
chardet                           3.0.4
charset-normalizer                3.3.2
click                             8.1.7
cloudpickle                       3.0.0
colorama                          0.4.6
coloredlogs                       15.0.1
contourpy                         1.3.0
ctranslate2                       4.4.0
cycler                            0.12.1
datasets                          3.0.0
dbus-python                       1.2.16
dill                              0.3.8
diskcache                         5.6.3
distro                            1.9.0
distro-info                       0.23+ubuntu1.1
einops                            0.8.0
evaluate                          0.4.3
exceptiongroup                    1.2.2
fastapi                           0.114.1
ffmpy                             0.3.2
filelock                          3.16.1
fire                              0.6.0
flatbuffers                       24.3.25
fonttools                         4.53.1
frozenlist                        1.4.1
fschat                            0.2.36
fsspec                            2024.6.1
gguf                              0.9.1
gradio                            4.26.0
gradio_client                     0.15.1
h11                               0.14.0
hf_transfer                       0.1.8
httpcore                          1.0.5
httptools                         0.6.1
httpx                             0.27.2
huggingface-hub                   0.25.0
humanfriendly                     10.0
idna                              2.8
importlib_metadata                8.5.0
importlib_resources               6.4.5
infinity_emb                      0.0.53
interegular                       0.3.3
Jinja2                            3.1.4
jiter                             0.5.0
joblib                            1.4.2
jsonschema                        4.23.0
jsonschema-specifications         2023.12.1
kiwisolver                        1.4.7
lark                              1.2.2
latex2mathml                      3.77.0
llvmlite                          0.43.0
lm-format-enforcer                0.10.6
lmdeploy                          0.6.0
loguru                            0.7.2
markdown-it-py                    3.0.0
markdown2                         2.5.0
MarkupSafe                        2.1.5
matplotlib                        3.9.2
mdurl                             0.1.2
mistral_common                    1.4.2
mmengine-lite                     0.10.5
mpmath                            1.3.0
msgpack                           1.1.0
msgspec                           0.18.6
multidict                         6.1.0
multiprocess                      0.70.16
narwhals                          1.8.2
nest-asyncio                      1.6.0
networkx                          3.3
nh3                               0.2.18
numba                             0.60.0
numpy                             1.26.4
nvidia-cublas-cu12                12.1.3.1
nvidia-cuda-cupti-cu12            12.1.105
nvidia-cuda-nvrtc-cu12            12.1.105
nvidia-cuda-runtime-cu12          12.1.105
nvidia-cudnn-cu12                 9.1.0.70
nvidia-cufft-cu12                 11.0.2.54
nvidia-curand-cu12                10.3.2.106
nvidia-cusolver-cu12              11.4.5.107
nvidia-cusparse-cu12              12.1.0.106
nvidia-ml-py                      12.560.30
nvidia-nccl-cu12                  2.20.5
nvidia-nvjitlink-cu12             12.6.68
nvidia-nvtx-cu12                  12.1.105
onnx                              1.16.2
onnxruntime                       1.19.2
openai                            1.44.0
opencv-python-headless            4.10.0.84
optimum                           1.22.0
orjson                            3.10.7
outlines                          0.0.46
packaging                         24.1
pandas                            2.2.3
partial-json-parser               0.2.1.1.post4
peft                              0.11.1
pillow                            10.4.0
pip                               24.2
platformdirs                      4.3.6
prometheus_client                 0.21.0
prometheus-fastapi-instrumentator 7.0.0
prompt_toolkit                    3.0.47
protobuf                          5.28.2
psutil                            6.0.0
py-cpuinfo                        9.0.0
pyairports                        2.1.1
pyarrow                           17.0.0
pycountry                         24.6.1
pydantic                          2.9.2
pydantic_core                     2.23.4
pydub                             0.25.1
Pygments                          2.18.0
PyGObject                         3.36.0
pynvml                            11.5.3
pyparsing                         3.1.4
python-apt                        2.0.1+ubuntu0.20.4.1
python-dateutil                   2.9.0.post0
python-dotenv                     1.0.1
python-multipart                  0.0.9
pytz                              2024.2
PyYAML                            6.0.2
pyzmq                             26.2.0
ray                               2.36.0
referencing                       0.35.1
regex                             2024.9.11
requests                          2.32.3
requests-unixsocket               0.2.0
rich                              13.8.1
rpds-py                           0.20.0
ruff                              0.6.6
safetensors                       0.4.5
scikit-learn                      1.5.2
scipy                             1.14.1
semantic-version                  2.10.0
sentence-transformers             3.1.1
sentencepiece                     0.2.0
setuptools                        69.5.1
shellingham                       1.5.4
shortuuid                         1.0.13
six                               1.14.0
sniffio                           1.3.1
starlette                         0.38.5
svgwrite                          1.4.3
sympy                             1.13.3
termcolor                         2.4.0
threadpoolctl                     3.5.0
tiktoken                          0.7.0
timm                              1.0.9
tokenizers                        0.19.1
tomli                             2.0.1
tomlkit                           0.12.0
torch                             2.4.0
torchvision                       0.19.0
tqdm                              4.66.5
transformers                      4.44.2
triton                            3.0.0
typer                             0.9.4
typing_extensions                 4.12.2
tzdata                            2024.1
unattended-upgrades               0.1
urllib3                           1.25.8
uvicorn                           0.23.2
uvloop                            0.20.0
vllm                              0.6.1.post2
vllm-flash-attn                   2.6.1
watchfiles                        0.24.0
wavedrom                          2.0.3.post3
wcwidth                           0.2.13
websockets                        11.0.3
wheel                             0.44.0
xformers                          0.0.27.post2
xxhash                            3.5.0
yapf                              0.40.2
yarl                              1.11.1
zipp                              3.20.2
wzs1566 commented 4 hours ago

Dockerfile contents

FROM nvidia/cuda:12.2.0-runtime-ubuntu20.04
ENV DEBIAN_FRONTEND=noninteractive
RUN mkdir /workspace
RUN mkdir /workspace/model
COPY ./ /workspace

WORKDIR /workspace

RUN sed -i 's/deb.debian.org/mirrors.ustc.edu.cn/g' /etc/apt/sources.list && \
    sed -i 's/security.debian.org/mirrors.ustc.edu.cn/g' /etc/apt/sources.list && \
    echo "Installing Python build dependencies" && apt-get update -y && apt install software-properties-common python3-dev build-essential vim git -y && add-apt-repository ppa:deadsnakes/ppa -y && \
    echo "Installing python3.10" && apt-get install -y python3.10 curl && curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && python3.10 get-pip.py && \
    pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple && \
    ln -sf $(which python3.10) /usr/local/bin/python 

ENV PYTHONPATH=/workspace/
RUN pip install --no-cache-dir -r /workspace/requirements.txt
RUN pip install --force-reinstall lmdeploy==0.6.0 --no-deps
RUN pip cache purge

CMD ["/bin/bash"]
shell-nlp commented 3 hours ago

I'm downloading the 7B model now and will try to reproduce this afterwards.

shell-nlp commented 3 hours ago

image

I suggest you first build with the Dockerfile and docker compose files I put together.

shell-nlp commented 3 hours ago

image

wzs1566 commented 41 minutes ago

Hello, I rebuilt the image from the source code, and I still get the same error.
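
Since the rebuild did not change the result, one thing worth ruling out is whether the rebuilt container actually has the same package versions as the pip list posted earlier in this thread. The following is only a minimal sketch, not part of gpt_server: the helper names `installed_version` and `compare_versions` are made up for illustration, and the expected versions are copied from that pip list. It does not diagnose the TextEncodeInput error itself; it only reports version drift between the two environments.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the pip list posted earlier in this thread.
# Adjust these if the environment you want to compare against differs.
EXPECTED = {
    "lmdeploy": "0.6.0",
    "transformers": "4.44.2",
    "tokenizers": "0.19.1",
}


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def compare_versions(expected):
    """Map each package name to (expected, installed) where the two differ."""
    mismatches = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # Hypothetical helper script: prints one line per package that differs.
    for name, (wanted, found) in compare_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

Run it with the same `python` interpreter that launches the worker inside the container. If nothing is printed, the three packages match the list above, and the problem is more likely in the model files or the request payload than in the installed versions.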