Reminder
System Info
llamafactory version: 0.8.3.dev0
Reproduction
CUDA_VISIBLE_DEVICES=0 API_PORT=8244 llamafactory-cli api examples/inference/llama3.yaml
CUDA_VISIBLE_DEVICES=0 API_PORT=8244 llamafactory-cli api examples/inference/llama3_vllm.yaml
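For reference, the 500 error shown below can be triggered with a single HTTP call once the server is up. The sketch below is not part of the original report; it simply replays the request that appears in the log further down (apparently the default "Try it out" payload from the /docs Swagger page, trimmed to the fields that matter), against the port used in the commands above.

import requests

# Same placeholder payload that the server logs below; note that "parameters" is an
# empty object, which is what later triggers the KeyError in the tool formatter.
payload = {
    "model": "string",
    "messages": [{"role": "user", "content": "string"}],
    "tools": [
        {
            "type": "function",
            "function": {"name": "string", "description": "string", "parameters": {}},
        }
    ],
}
resp = requests.post("http://localhost:8244/v1/chat/completions", json=payload)
print(resp.status_code)  # 500, with the traceback shown further down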
Expected behavior
[INFO|tokenization_utils_base.py:2106] 2024-06-20 22:48:38,089 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:2106] 2024-06-20 22:48:38,089 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2106] 2024-06-20 22:48:38,089 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2106] 2024-06-20 22:48:38,089 >> loading file tokenizer_config.json
[WARNING|logging.py:314] 2024-06-20 22:48:39,221 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/20/2024 22:48:39 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
06/20/2024 22:48:39 - INFO - llamafactory.data.template - Add pad token: <|eot_id|>
[INFO|configuration_utils.py:731] 2024-06-20 22:48:39,435 >> loading configuration file /home/work/wenku_yq/DataVault/models/Meta-Llama-3-8B-Instruct/config.json
[INFO|configuration_utils.py:796] 2024-06-20 22:48:39,436 >> Model config LlamaConfig {
"_name_or_path": "/home/work/wenku_yq/DataVault/models/Meta-Llama-3-8B-Instruct/",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 128000,
"eos_token_id": 128001,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 8192,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 500000.0,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.41.2",
"use_cache": true,
"vocab_size": 128256
}
06/20/2024 22:48:39 - INFO - llamafactory.model.patcher - Using KV cache for faster generation.
[INFO|modeling_utils.py:3471] 2024-06-20 22:48:39,519 >> loading weights file /home/work/wenku_yq/DataVault/models/Meta-Llama-3-8B-Instruct/model.safetensors.index.json
[INFO|modeling_utils.py:1519] 2024-06-20 22:48:39,527 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:962] 2024-06-20 22:48:39,529 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"eos_token_id": 128001
}
Loading checkpoint shards: 100%|██████████| 4/4 [03:04<00:00, 46.20s/it]
[INFO|modeling_utils.py:4280] 2024-06-20 22:51:46,632 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4288] 2024-06-20 22:51:46,632 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /home/work/wenku_yq/DataVault/models/Meta-Llama-3-8B-Instruct/.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:915] 2024-06-20 22:51:46,654 >> loading configuration file /home/work/wenku_yq/DataVault/models/Meta-Llama-3-8B-Instruct/generation_config.json
[INFO|configuration_utils.py:962] 2024-06-20 22:51:46,655 >> Generate config GenerationConfig {
"bos_token_id": 128000,
"eos_token_id": 128001
}
06/20/2024 22:51:46 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
06/20/2024 22:51:46 - INFO - llamafactory.model.loader - all params: 8030261248
Visit http://localhost:8244/docs for API document.
INFO: Started server process [74545]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8244 (Press CTRL+C to quit)
INFO: 127.0.0.1:29184 - "GET /docs HTTP/1.1" 200 OK
INFO: 127.0.0.1:29184 - "GET /openapi.json HTTP/1.1" 200 OK
INFO: 127.0.0.1:25164 - "GET /v1/models HTTP/1.1" 200 OK
INFO: 127.0.0.1:25164 - "GET /v1/models HTTP/1.1" 200 OK
06/20/2024 22:52:42 - INFO - llamafactory.api.chat - ==== request ====
{
"model": "string",
"messages": [
{
"role": "user",
"content": "string",
"tool_calls": [
{
"id": "string",
"type": "function",
"function": {
"name": "string",
"arguments": "string"
}
}
]
}
],
"tools": [
{
"type": "function",
"function": {
"name": "string",
"description": "string",
"parameters": {}
}
}
],
"do_sample": true,
"temperature": 0.0,
"top_p": 0.0,
"n": 1,
"max_tokens": 0,
"stop": "string",
"stream": false
}
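Note that "parameters" in the request above is an empty object. For comparison, a well-formed tools entry carries an OpenAI-style JSON Schema with a "properties" key. The sketch below (a hypothetical get_weather function, not taken from the original report) shows the expected shape as a Python dict:

# Hypothetical, well-formed tools entry: the default tool formatter expects a JSON
# Schema under "parameters", including a "properties" mapping. Names are illustrative.
well_formed_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "Name of the city"}
            },
            "required": ["city"],
        },
    },
}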
INFO: 127.0.0.1:33696 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 399, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/starlette/middleware/cors.py", line 93, in __call__
await self.simple_response(scope, receive, send, request_headers=headers)
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/starlette/middleware/cors.py", line 148, in simple_response
await self.app(scope, receive, send)
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
response = await func(request)
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
raw_response = await run_endpoint_function(
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/home/users/liuyi33/temp/workshop/LLaMA-Factory/src/llamafactory/api/app.py", line 99, in create_chat_completion
return await create_chat_completion_response(request, chat_model)
File "/home/users/liuyi33/temp/workshop/LLaMA-Factory/src/llamafactory/api/chat.py", line 148, in create_chat_completion_response
responses = await chat_model.achat(
File "/home/users/liuyi33/temp/workshop/LLaMA-Factory/src/llamafactory/chat/chat_model.py", line 72, in achat
return await self.engine.chat(messages, system, tools, image, **input_kwargs)
File "/home/users/liuyi33/temp/workshop/LLaMA-Factory/src/llamafactory/chat/hf_engine.py", line 296, in chat
return await loop.run_in_executor(pool, self._chat, *input_args)
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/users/liuyi33/.conda/envs/llama_factory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/users/liuyi33/temp/workshop/LLaMA-Factory/src/llamafactory/chat/hf_engine.py", line 186, in _chat
gen_kwargs, prompt_length = HuggingfaceEngine._process_args(
File "/home/users/liuyi33/temp/workshop/LLaMA-Factory/src/llamafactory/chat/hf_engine.py", line 95, in _process_args
prompt_ids, _ = template.encode_oneturn(
File "/home/users/liuyi33/temp/workshop/LLaMA-Factory/src/llamafactory/data/template.py", line 60, in encode_oneturn
encoded_pairs = self._encode(tokenizer, messages, system, tools, cutoff_len, reserved_label_len)
File "/home/users/liuyi33/temp/workshop/LLaMA-Factory/src/llamafactory/data/template.py", line 111, in _encode
tool_text = self.format_tools.apply(content=tools)[0] if tools else ""
File "/home/users/liuyi33/temp/workshop/LLaMA-Factory/src/llamafactory/data/formatter.py", line 225, in apply
return [self._tool_formatter(tools) if len(tools) != 0 else ""]
File "/home/users/liuyi33/temp/workshop/LLaMA-Factory/src/llamafactory/data/formatter.py", line 47, in default_tool_formatter
for name, param in tool["parameters"]["properties"].items():
KeyError: 'properties'
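The last frame explains the failure: default_tool_formatter iterates tool["parameters"]["properties"], and the request above sends "parameters": {} with no "properties" key, so the lookup raises KeyError. Below is a minimal standalone sketch of just that failing access (not LLaMA-Factory's actual formatter code):

# Reproduces only the failing dictionary access from formatter.py line 47;
# the real formatter builds a tool-description prompt from the schema.
def iterate_tool_properties(tool: dict) -> list:
    names = []
    for name, param in tool["parameters"]["properties"].items():  # KeyError when "properties" is absent
        names.append(name)
    return names

empty_params_tool = {"name": "string", "description": "string", "parameters": {}}
try:
    iterate_tool_properties(empty_params_tool)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'properties'

With a schema that does include "properties" (like the get_weather example above), this loop runs without raising.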
It errors out as soon as I use it; both vLLM and regular inference raise this error.
Others
No response