Error running Qwen2.5-7b in a Docker container: TextEncodeInput must be Union #15

I ran into an error when building a Docker container to run Qwen2.5-7b. The full output is as follows:

== CUDA ==

CUDA Version 12.2.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

2024-09-21 07:05:37.625 | DEBUG    | gpt_server.utils:delete_log:177 - logs_path: /workspace/logs
2024-09-21 07:05:37.654 | INFO     | gpt_server.utils:run_cmd:14 - 执行命令如下:
python -m fastchat.serve.controller --host --port 21001 --dispatch-method shortest_queue 

2024-09-21 07:05:37.656 | INFO     | gpt_server.utils:run_cmd:14 - 执行命令如下:
python -m gpt_server.serving.openai_api_server --host --port 8082 --controller-address http://localhost:21001

2024-09-21 07:05:37.671 | INFO     | gpt_server.utils:run_cmd:14 - 执行命令如下:
CUDA_VISIBLE_DEVICES=0 python -m gpt_server.model_worker.qwen --num_gpus 1 --model_name_or_path /workspace/model --model_names qwen25-7b,qwen2.5-7b --backend lmdeploy-turbomind --host --controller_address http://localhost:21001

2024-09-21 07:05:37 | INFO | controller | args: Namespace(host='', port=21001, dispatch_method='shortest_queue', ssl=False)
2024-09-21 07:05:37 | ERROR | stderr | INFO:     Started server process [37]
2024-09-21 07:05:37 | ERROR | stderr | INFO:     Waiting for application startup.
2024-09-21 07:05:37 | ERROR | stderr | INFO:     Application startup complete.
2024-09-21 07:05:38 | ERROR | stderr | INFO:     Uvicorn running on (Press CTRL+C to quit)
2024-09-21 07:05:38 | INFO | openai_api_server | args: Namespace(host='', port=8082, controller_address='http://localhost:21001', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_keys=None, ssl=False)
2024-09-21 07:05:38 | ERROR | stderr | INFO:     Started server process [41]
2024-09-21 07:05:38 | ERROR | stderr | INFO:     Waiting for application startup.
2024-09-21 07:05:38 | ERROR | stderr | INFO:     Application startup complete.
2024-09-21 07:05:38 | ERROR | stderr | INFO:     Uvicorn running on (Press CTRL+C to quit)
INFO:     Started server process [44]
INFO:     Waiting for application startup.
2024-09-21 07:05:39.411 | INFO     | gpt_server.model_worker.base.model_worker_base:load_model_tokenizer:110 - QwenWorker 使用 LMDeploy 后端
2024-09-21 07:05:39.411 | INFO     | gpt_server.model_backend.lmdeploy_backend:__init__:30 - 后端 turbomind
2024-09-21 07:05:39.413 | INFO     | gpt_server.model_backend.lmdeploy_backend:__init__:36 - 模型架构:llm
2024-09-21 07:05:39,413 - lmdeploy - INFO - input backend=turbomind, backend_config=TurbomindEngineConfig(model_format=None, tp=1, session_len=None, max_batch_size=None, cache_max_entry_count=0.8, cache_chunk_size=-1, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=0, max_prefill_iters=1)
2024-09-21 07:05:39,413 - lmdeploy - INFO - input chat_template_config=None
2024-09-21 07:05:39,414 - lmdeploy - WARNING - Did not find a chat template matching /workspace/model.
2024-09-21 07:05:39,418 - lmdeploy - INFO - updated chat_template_onfig=ChatTemplateConfig(model_name='base', system=None, meta_instruction=None, eosys=None, user=None, eoh=None, assistant=None, eoa=None, separator=None, capability=None, stop_words=None)
2024-09-21 07:05:39,428 - lmdeploy - INFO - model_source: hf_model
2024-09-21 07:05:39,663 - lmdeploy - INFO - turbomind model config:

{
  "model_config": {
    "model_name": "",
    "chat_template": "",
    "model_arch": "Qwen2ForCausalLM",
    "head_num": 28,
    "kv_head_num": 4,
    "hidden_units": 3584,
    "vocab_size": 152064,
    "num_layer": 28,
    "inter_size": 18944,
    "norm_eps": 1e-06,
    "attn_bias": 1,
    "start_id": 151643,
    "end_id": 151645,
    "size_per_head": 128,
    "group_size": 128,
    "weight_type": "bf16",
    "session_len": 32768,
    "tp": 1,
    "model_format": "hf"
  },
  "attention_config": {
    "rotary_embedding": 128,
    "rope_theta": 1000000.0,
    "max_position_embeddings": 32768,
    "original_max_position_embeddings": 0,
    "rope_scaling_type": "",
    "rope_scaling_factor": 0.0,
    "use_dynamic_ntk": 0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 1.0,
    "use_logn_attn": 0,
    "cache_block_seq_len": 64
  },
  "lora_config": {
    "lora_policy": "",
    "lora_r": 0,
    "lora_scale": 0.0,
    "lora_max_wo_r": 0,
    "lora_rank_pattern": "",
    "lora_scale_pattern": ""
  },
  "engine_config": {
    "model_format": null,
    "tp": 1,
    "session_len": null,
    "max_batch_size": 128,
    "cache_max_entry_count": 0.8,
    "cache_chunk_size": -1,
    "cache_block_seq_len": 64,
    "enable_prefix_caching": false,
    "quant_policy": 0,
    "rope_scaling_factor": 0.0,
    "use_logn_attn": false,
    "download_dir": null,
    "revision": null,
    "max_prefill_token_num": 8192,
    "num_tokens_per_iter": 8192,
    "max_prefill_iters": 4
  }
}
[TM][WARNING] [LlamaTritonModel] `max_context_token_num` is not set, default to 32768.
[TM][INFO] Model: 
head_num: 28
kv_head_num: 4
size_per_head: 128
inter_size: 18944
num_layer: 28
vocab_size: 152064
attn_bias: 1
max_batch_size: 128
max_prefill_token_num: 8192
max_context_token_num: 32768
num_tokens_per_iter: 8192
max_prefill_iters: 4
session_len: 32768
cache_max_entry_count: 0.8
cache_block_seq_len: 64
cache_chunk_size: -1
enable_prefix_caching: 0
start_id: 151643
tensor_para_size: 1
pipeline_para_size: 1
enable_custom_all_reduce: 0
quant_policy: 0
group_size: 128

2024-09-21 07:05:39,674 - lmdeploy - WARNING - get 255 model params
[TM][INFO] [LlamaWeight<T>::prepare] workspace size: 0                                                    

[WARNING] is not found; using default GEMM algo
[TM][INFO] [BlockManager] block_size = 3 MB
[TM][INFO] [BlockManager] max_block_count = 2059
[TM][INFO] [BlockManager] chunk_size = 2059
[TM][INFO] LlamaBatch<T>::Start()
2024-09-21 07:05:41,701 - lmdeploy - INFO - updated backend_config=TurbomindEngineConfig(model_format=None, tp=1, session_len=None, max_batch_size=128, cache_max_entry_count=0.8, cache_chunk_size=-1, cache_block_seq_len=64, enable_prefix_caching=False, quant_policy=0, rope_scaling_factor=0.0, use_logn_attn=False, download_dir=None, revision=None, max_prefill_token_num=8192, num_tokens_per_iter=8192, max_prefill_iters=4)
2024-09-21 07:05:41.703 | INFO     | gpt_server.model_worker.base.model_worker_base:load_model_tokenizer:128 - load_model_tokenizer 完成
2024-09-21 07:05:41.704 | INFO     | gpt_server.model_worker.base.model_worker_base:get_context_length:74 - 模型配置:
2024-09-21 07:05:41.704 | INFO     | gpt_server.model_worker.base.model_worker_base:__init__:58 - Loading the model ['qwen25-7b', 'qwen2.5-7b'] on worker 877ee7c4 ...
2024-09-21 07:05:41.704 | INFO     | gpt_server.model_worker.base.base_model_worker:register_to_controller:93 - Register to controller
2024-09-21 07:05:41 | INFO | controller | Register a new worker:
2024-09-21 07:05:41 | INFO | controller | Register done:, {'model_names': ['qwen25-7b', 'qwen2.5-7b'], 'speed': 1, 'queue_length': 0}
2024-09-21 07:05:41 | INFO | stdout | INFO: - "POST /register_worker HTTP/1.1" 200 OK
2024-09-21 07:05:41.707 | INFO     | gpt_server.model_worker.base.model_worker_base:__init__:63 - worker 已赋值
2024-09-21 07:05:41.707 | INFO     | __main__:__init__:45 - qwen停用词: ['<|endoftext|>', '<|im_start|>', '<|im_end|>', 'Observation:']
INFO:     Application startup complete.
INFO:     Uvicorn running on (Press CTRL+C to quit)
2024-09-21 07:05:44 | INFO | stdout | INFO: - "POST /list_models HTTP/1.1" 200 OK
2024-09-21 07:05:44 | INFO | controller | names: [''], queue_lens: [0.0], ret:
2024-09-21 07:05:44 | INFO | stdout | INFO: - "POST /get_worker_address HTTP/1.1" 200 OK
INFO: - "POST /model_details HTTP/1.1" 200 OK
INFO: - "POST /count_token HTTP/1.1" 200 OK
2024-09-21 07:05:44 | INFO | stdout | INFO: - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: - "POST /worker_generate_stream HTTP/1.1" 200 OK
2024-09-21 07:05:44.826 | INFO     | __main__:generate_stream_gate:53 - params {'model': 'qwen25-7b', 'temperature': 0.7, 'logprobs': None, 'top_p': 1.0, 'top_k': -1, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'max_new_tokens': 1048576, 'echo': False, 'stop': [], 'messages': [{'role': 'user', 'content': 'Make a scatter plot with x_values 1, 2 and y_values 3, 4'}], 'tools': None, 'tool_choice': None, 'request_id': '1', 'request': <starlette.requests.Request object at 0x7fd3af5331f0>}
2024-09-21 07:05:44.826 | INFO     | __main__:generate_stream_gate:54 - worker_id: 877ee7c4
2024-09-21 07:05:44.827 | INFO     | __main__:generate_stream_gate:74 - 正在使用qwen-2.0 !
2024-09-21 07:05:44.849 | INFO     | gpt_server.model_backend.lmdeploy_backend:stream_chat:45 - <|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
Make a scatter plot with x_values 1, 2 and y_values 3, 4<|im_end|>

2024-09-21 07:05:44.849 | INFO     | gpt_server.model_backend.lmdeploy_backend:stream_chat:79 - request_id 1
2024-09-21 07:05:44,974 - lmdeploy - WARNING - The token Observation:, its length of indexes [37763, 367, 25] is over than 1. Currently, it can not be used as stop words
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/starlette/", line 257, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/usr/local/lib/python3.10/dist-packages/starlette/", line 253, in wrap
    await func()
  File "/usr/local/lib/python3.10/dist-packages/starlette/", line 230, in listen_for_disconnect
    message = await receive()
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/", line 587, in receive
    await self.message_event.wait()
  File "/usr/lib/python3.10/asyncio/", line 214, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7fd3af5338b0

During handling of the above exception, another exception occurred:

  + Exception Group Traceback (most recent call last):
  |   File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/", line 426, in run_asgi
  |     result = await app(  # type: ignore[func-returns-value]
  |   File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/", line 84, in __call__
  |     return await, receive, send)
  |   File "/usr/local/lib/python3.10/dist-packages/fastapi/", line 1054, in __call__
  |     await super().__call__(scope, receive, send)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/", line 113, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/", line 187, in __call__
  |     raise exc
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/", line 165, in __call__
  |     await, receive, _send)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/", line 62, in __call__
  |     await wrap_app_handling_exceptions(, conn)(scope, receive, send)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/", line 62, in wrapped_app
  |     raise exc
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/", line 51, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/", line 715, in __call__
  |     await self.middleware_stack(scope, receive, send)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/", line 735, in app
  |     await route.handle(scope, receive, send)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/", line 288, in handle
  |     await, receive, send)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/", line 76, in app
  |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/", line 62, in wrapped_app
  |     raise exc
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/", line 51, in wrapped_app
  |     await app(scope, receive, sender)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/", line 74, in app
  |     await response(scope, receive, send)
  |   File "/usr/local/lib/python3.10/dist-packages/starlette/", line 250, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/", line 685, in __aexit__
  |     raise BaseExceptionGroup(
  | exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/", line 253, in wrap
    |     await func()
    |   File "/usr/local/lib/python3.10/dist-packages/starlette/", line 242, in stream_response
    |     async for chunk in self.body_iterator:
    |   File "/workspace/gpt_server/model_worker/", line 91, in generate_stream_gate
    |     async for ret in self.backend.stream_chat(params=params):
    |   File "/workspace/gpt_server/model_backend/", line 84, in stream_chat
    |     async for request_output in results_generator:
    |   File "/usr/local/lib/python3.10/dist-packages/lmdeploy/serve/", line 509, in generate
    |     prompt_input = await self._get_prompt_input(prompt,
    |   File "/usr/local/lib/python3.10/dist-packages/lmdeploy/serve/", line 453, in _get_prompt_input
    |     input_ids = self.tokenizer.encode(prompt, add_bos=sequence_start)
    |   File "/usr/local/lib/python3.10/dist-packages/lmdeploy/", line 600, in encode
    |     return self.model.encode(s, add_bos, add_special_tokens, **kwargs)
    |   File "/usr/local/lib/python3.10/dist-packages/lmdeploy/", line 366, in encode
    |     encoded = self.model.encode(s,
    |   File "/usr/local/lib/python3.10/dist-packages/transformers/", line 2825, in encode
    |     encoded_inputs = self.encode_plus(
    |   File "/usr/local/lib/python3.10/dist-packages/transformers/", line 3237, in encode_plus
    |     return self._encode_plus(
    |   File "/usr/local/lib/python3.10/dist-packages/transformers/", line 601, in _encode_plus
    |     batched_output = self._batch_encode_plus(
    |   File "/usr/local/lib/python3.10/dist-packages/transformers/", line 528, in _batch_encode_plus
    |     encodings = self._tokenizer.encode_batch(
    | TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]
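
The final TypeError is raised by the `tokenizers` library, which generally reports it when the value handed to `encode_batch` is not a plain string (or a pair of sequences), for example `None` or a list of message dicts. A minimal sketch of a guard that could be called just before the tokenizer to see what is actually being passed; `ensure_text_prompt` and `describe_prompt` are hypothetical helpers, not part of gpt_server or lmdeploy, and the call site is an assumption:

```python
from typing import Any


def describe_prompt(prompt: Any) -> str:
    """Return the type name and a truncated repr of the value to be tokenized."""
    return f"type={type(prompt).__name__}, preview={repr(prompt)[:80]}"


def ensure_text_prompt(prompt: Any) -> str:
    """Hypothetical guard: fail early with a readable message if prompt is not a str."""
    if not isinstance(prompt, str):
        raise TypeError(f"expected a str prompt, got {describe_prompt(prompt)}")
    return prompt


if __name__ == "__main__":
    # A plain string passes through unchanged.
    print(ensure_text_prompt("hello"))
    # Non-string inputs are rejected with a message that names the offending type.
    for bad in (None, [{"role": "user", "content": "hi"}]):
        try:
            ensure_text_prompt(bad)
        except TypeError as exc:
            print(exc)
```

If the guard fires, the reported type shows whether the prompt arrived as a message list or an empty value rather than rendered text. It does not by itself identify the root cause.
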

The Dockerfile I used:

FROM nvidia/cuda:12.2.0-runtime-ubuntu20.04
ENV DEBIAN_FRONTEND=noninteractive
RUN mkdir /workspace
RUN mkdir /workspace/model
COPY ./ /workspace

WORKDIR /workspace

RUN sed -i 's/' /etc/apt/sources.list && \
    sed -i 's/' /etc/apt/sources.list && \
    echo "开始安装python依赖环境" && apt-get update -y && apt install software-properties-common python3-dev build-essential vim git -y && add-apt-repository ppa:deadsnakes/ppa -y && \
    echo "开始安装python3.10" && apt-get install -y python3.10 curl && curl -o && python3.10 && \
    pip config set global.index-url && \
    ln -sf $(which python3.10) /usr/local/bin/python 

ENV PYTHONPATH=/workspace/
RUN pip install --no-cache-dir -r /workspace/requirements.txt
RUN pip install --force-reinstall lmdeploy==0.6.0 --no-deps
RUN pip cache purge

CMD ["/bin/bash"]
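
Because the Dockerfile force-reinstalls `lmdeploy==0.6.0` with `--no-deps`, pip does not check whether the `transformers` and `tokenizers` versions pulled in by requirements.txt are the ones lmdeploy expects. Whether that is related to the error above is not established in this thread; it is only an assumption worth ruling out. A small sketch that prints the versions installed inside the container, using only the standard library (`installed_versions` is a hypothetical helper name):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Packages that appear in the traceback above.
    for name, ver in installed_versions(["lmdeploy", "transformers", "tokenizers"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running it in both the failing image and a working environment makes any version difference easy to spot when reporting the issue.
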
I'm downloading the 7b model now; I'll try to reproduce this afterwards.

I suggest you first build with the Dockerfile and docker compose setup that I put together.

