xusenlinzy / api-for-open-llm

OpenAI-style API for open large language models, using LLMs just as ChatGPT! Support for LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3 etc. A unified backend interface for open-source large language models.
Apache License 2.0
2.16k stars 252 forks

Inference performance with the normal (non-vLLM) model loading method is noticeably lower than with ChatGLM3's official openai_api.py #197

Closed leoterry-ulrica closed 6 months ago

leoterry-ulrica commented 6 months ago

The following items must be checked before submission

Type of problem

Model inference and deployment

Operating system

Linux

Detailed description of the problem

# Starting api-for-open-llm
docker-compose up -d
# Starting the official ChatGLM3 server
python openai_api.py

Dependencies

# Please paste the dependencies here

Runtime logs or screenshots

# Requirement: for the same question, "武汉天气如何?" ("How is the weather in Wuhan?"), extract the city name from the sentence.
temperature:0.0
top_p:0.8

api-for-open-llm runtime log (the city name cannot be extracted):

{'model': 'chatglm3-6b', 'frequency_penalty': 0.0, 'function_call': {'name': 'agent_extract_data'}, 'functions': [{'name': 'agent_extract_data', 'description': '你是一个天气查询助手。根据用户问题,提取出城市。注意不是简单的文本提取,而是上下文理解后的提取。如果用户问题中不包含城市则不提取。\n如果内容不存在,返回空字符串。', 'parameters': {'type': 'object', 'properties': {'city': {'type': 'string', 'description': '城市'}}, 'required': ['city']}}], 'logit_bias': None, 'max_tokens': 1024, 'n': 1, 'presence_penalty': 0.0, 'response_format': None, 'seed': None, 'stop': [], 'temperature': 0.0, 'tool_choice': None, 'tools': None, 'top_p': 0.8, 'user': None, 'stream': False,

Runtime log of the official ChatGLM3 server (the city name is extracted accurately):

2023-12-09 23:51:39.433 | DEBUG    | __main__:create_chat_completion:145 - ==== request ====
{'messages': [ChatMessage(role='user', content='武汉天气如何', name=None, function_call=None)], 'temperature': 0.0, 'top_p': 0.8, 'max_tokens': 1024, 'echo': False, 'stream': False, 'repetition_penalty': 1.1, 'functions': [{'name': 'agent_extract_data', 'description': '你是一个天气查询助手。根据用户问题,提取出城市。注意不是简单的文本提取,而是上下文理解后的提取。如果用户问题中不包含城市则不提取。\n如果内容不存在,返回空字符串。', 'parameters': {'type': 'object', 'properties': {'city': {'type': 'string', 'description': '城市'}}, 'required': ['city']}}]}
2023-12-09 23:51:42.977 | DEBUG    | __main__:create_chat_completion:175 - ==== message ====
role='assistant' content="agent_extract_data\n ```python\ntool_call(city='武汉')\n```" name=None function_call=FunctionCallResponse(name='agent_extract_data', arguments='{"city": "武汉"}')
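
The `content` string in the official server's reply has a recognizable shape: the tool name on the first line, followed by a fenced `tool_call(...)` expression. As a rough illustration of how such a string could be mapped to the `function_call` fields shown in the log, here is a minimal sketch. The helper name `parse_tool_call` is hypothetical, and this is not the code that either server runs; it only assumes the output format visible above.

```python
import ast
import json


def parse_tool_call(content: str) -> dict:
    """Split ChatGLM3-style tool output into a name and JSON-encoded arguments.

    Hypothetical helper: assumes the first line is the tool name and the rest
    contains one tool_call(key=value, ...) expression with literal values.
    """
    name, _, rest = content.partition("\n")
    start = rest.index("tool_call(")  # raises ValueError if no call is present
    end = rest.rindex(")") + 1
    call = ast.parse(rest[start:end], mode="eval").body
    if not isinstance(call, ast.Call):
        raise ValueError("expected a tool_call(...) expression")
    # literal_eval only accepts literals, so no model-generated code is executed.
    arguments = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return {
        "name": name.strip(),
        "arguments": json.dumps(arguments, ensure_ascii=False),
    }


fence = "`" * 3  # built at runtime so the example contains no literal fence
sample = f"agent_extract_data\n {fence}python\ntool_call(city='武汉')\n{fence}"
print(parse_tool_call(sample))
```

Parsing with `ast` rather than `eval` is a design choice made for this sketch: the text comes from the model, so it is treated as untrusted input.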
xusenlinzy commented 6 months ago
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://192.168.20.59:7891/v1/",
)

params = {
    'messages': [{"role": "user", "content": "武汉天气怎么样?"}],
    'model': 'chatglm3-6b',
    'frequency_penalty': 0.0,
    'function_call': {'name': 'agent_extract_data'},
    'functions': [
        {
            'name': 'agent_extract_data',
            'description': '你是一个天气查询助手。根据用户问题,提取出城市。注意不是简单的文本提取,而是上下文理解后的提取。如果用户问题中不包含城市则不提取。\n如果内容不存在,返回空字符串。',
            'parameters': {
                'type': 'object',
                'properties': {
                    'city': {
                        'type': 'string',
                        'description': '城市'
                    }
                },
                'required': ['city']
            }
        }
    ],
    'logit_bias': None,
    'max_tokens': 1024,
    'n': 1,
    'presence_penalty': 0.0,
    'response_format': None,
    'seed': None,
    'stop': [],
    'temperature': 0.0,
    'tool_choice': None,
    'tools': None,
    'top_p': 0.8,
    'user': None,
    'stream': False
}

print(client.chat.completions.create(**params).model_dump_json(indent=4))

The result is normal when I test it on my side:

{
    "id": "cmpl-7c0fa3a2-8a0e-4d3e-8c52-e82c25516f88",
    "choices": [
        {
            "finish_reason": "function_call",
            "index": 0,
            "message": {
                "content": "agent_extract_data\n ```python\ntool_call(city='武汉')\n```",
                "role": "assistant",
                "function_call": {
                    "arguments": "{\"city\": \"武汉\"}",
                    "name": "agent_extract_data"
                },
                "tool_calls": null
            }
        }
    ],
    "created": 1702173120,
    "model": "chatglm3-6b",
    "object": "chat.completion",
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 23,
        "prompt_tokens": 178,
        "total_tokens": 201
    }
}
leoterry-ulrica commented 6 months ago

Could you advise how I should troubleshoot my situation?

xusenlinzy commented 6 months ago

Check whether MODEL_NAME in your startup command is chatglm3.

leoterry-ulrica commented 6 months ago

Check whether MODEL_NAME in your startup command is chatglm3.

environment:

xusenlinzy commented 6 months ago

Could you show me the complete runtime log? The one above is incomplete.

leoterry-ulrica commented 6 months ago

Could you show me the complete runtime log? The one above is incomplete.

docker logs -f llm-api-server

=============
== PyTorch ==
=============

NVIDIA Release 23.10 (build 71422337) PyTorch Version 2.1.0a0+32f93b1

Container image Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Copyright (c) 2014-2023 Facebook Inc. Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert) Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu) Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu) Copyright (c) 2011-2013 NYU (Clement Farabet) Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston) Copyright (c) 2006 Idiap Research Institute (Samy Bengio) Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz) Copyright (c) 2015 Google Inc. Copyright (c) 2015 Yangqing Jia Copyright (c) 2013-2016 The Caffe contributors All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: CUDA Forward Compatibility mode ENABLED. Using CUDA 12.2 driver version 535.104.05 with kernel driver version 525.105.17. See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

NOTE: The SHMEM allocation limit is set to the default of 64MB. This may be insufficient for PyTorch. NVIDIA recommends the use of the following flags: docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...

2023-12-10 10:40:36.164 | DEBUG | api.config:<module>:244 - SETTINGS: { "host": "0.0.0.0", "port": 8000, "api_prefix": "/v1", "engine": "default", "model_name": "chatglm3-6b", "model_path": "/workspace/checkpoints", "adapter_model_path": null, "resize_embeddings": false, "dtype": "half", "device": "cuda", "device_map": "auto", "gpus": null, "num_gpus": 1, "only_embedding": false, "embedding_name": null, "embedding_size": -1, "embedding_device": "cuda", "quantize": 16, "load_in_8bit": false, "load_in_4bit": false, "using_ptuning_v2": false, "pre_seq_len": 128, "context_length": -1, "chat_template": null, "patch_type": null, "alpha": "auto", "trust_remote_code": false, "tokenize_mode": "auto", "tensor_parallel_size": 1, "gpu_memory_utilization": 0.9, "max_num_batched_tokens": -1, "max_num_seqs": 256, "quantization_method": null, "use_streamer_v2": false, "api_keys": null, "activate_inference": true, "interrupt_requests": true, "n_gpu_layers": 0, "main_gpu": 0, "tensor_split": null, "n_batch": 512, "n_threads": 4, "n_threads_batch": 4, "rope_scaling_type": -1, "rope_freq_base": 0.0, "rope_freq_scale": 0.0 }
Loading checkpoint shards: 100%|██████████| 7/7 [01:14<00:00, 10.66s/it]
2023-12-10 10:42:03.050 | INFO | api.models:create_generate_model:55 - Using default engine
2023-12-10 10:42:03.050 | INFO | api.core.default:_check_construct_prompt:124 - Using ChatGLM3 Model for Chat!
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
2023-12-10 10:42:40.868 | DEBUG | api.routes.chat:create_chat_completion:47 - ==== request ====
{'model': 'chatglm3-6b', 'frequency_penalty': 0.0, 'function_call': {'name': 'agent_extract_data'}, 'functions': [{'name': 'agent_extract_data', 'description': '你是一个天气查询助手。根据用户问题,提取出城市。注意不是简单的文本提取,而是上下文理解后的提取。如果用户问题中不包含城市则不提取。\n如果内容不存在,返回空字符串。', 'parameters': {'type': 'object', 'properties': {'city': {'type': 'string', 'description': '城市'}}, 'required': ['city']}}], 'logit_bias': None, 'max_tokens': 1024, 'n': 1, 'presence_penalty': 0.0, 'response_format': None, 'seed': None, 'stop': [], 'temperature': 0.0, 'tool_choice': None, 'tools': None, 'top_p': 0.8, 'user': None, 'stream': False, 'prompt_or_messages': [{'content': '武汉今天的天气如何', 'role': 'user'}], 'echo': False, 'stop_token_ids': []}
INFO: 47.115.230.229:56200 - "POST /v1/chat/completions HTTP/1.1" 200 OK

leoterry-ulrica commented 6 months ago

@xusenlinzy Hello, could you please take a look at the complete log?

xusenlinzy commented 6 months ago

The chat template was not matched correctly. Change MODEL_NAME to chatglm3, or set PROMPT_NAME to chatglm3, and it will work.

leoterry-ulrica commented 6 months ago

The chat template was not matched correctly. Change MODEL_NAME to chatglm3, or set PROMPT_NAME to chatglm3, and it will work.

That was indeed the cause, thank you very much. May I ask why this happens? Is the template bound to the name?

xusenlinzy commented 6 months ago

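If PROMPT_NAME is not specified, the chat template is matched automatically based on MODEL_NAME, so it is best to set PROMPT_NAME explicitly.

The supported templates are in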

https://github.com/xusenlinzy/api-for-open-llm/blob/master/api/adapter/template.py
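
The precedence described above (an explicit PROMPT_NAME wins, otherwise the template is looked up from MODEL_NAME) can be sketched roughly as follows. This is only an illustration: the `TEMPLATES` registry, the `resolve_template_name` function, the exact-match lookup, and the `"default"` fallback are all assumptions made for the example, not the project's actual code in `api/adapter/template.py`.

```python
from typing import Optional

# Hypothetical registry for illustration; the project's real templates live in
# api/adapter/template.py and are not reproduced here.
TEMPLATES = {
    "chatglm3": "template with ChatGLM3 roles and tool support",
    "default": "generic fallback template",
}


def resolve_template_name(model_name: str, prompt_name: Optional[str] = None) -> str:
    """Pick a template key: explicit prompt_name first, then model_name, then a fallback."""
    if prompt_name:
        key = prompt_name.lower()
        if key not in TEMPLATES:
            raise KeyError(f"unknown PROMPT_NAME: {prompt_name}")
        return key
    # Assumption: exact-match lookup, so a name such as "chatglm3-6b" would
    # miss the "chatglm3" entry and fall through to the generic template.
    key = model_name.lower()
    return key if key in TEMPLATES else "default"


print(resolve_template_name("chatglm3-6b"))              # falls back to "default"
print(resolve_template_name("chatglm3-6b", "chatglm3"))  # explicit name wins
print(resolve_template_name("chatglm3"))                 # matched via model name
```

Under these assumptions, setting PROMPT_NAME removes any dependence on how the model directory or MODEL_NAME happens to be spelled.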

leoterry-ulrica commented 6 months ago

@xusenlinzy The chat template now matches correctly, but extraction still fails from time to time (ChatGLM3's accuracy is quite high):

2023-12-11 03:16:44.802 | DEBUG    | api.config:<module>:244 - SETTINGS: {
    "host": "0.0.0.0",
    "port": 8000,
    "api_prefix": "/v1",
    "engine": "default",
    "model_name": "chatglm3-6b",
    "model_path": "/workspace/checkpoints",
    "adapter_model_path": null,
    "resize_embeddings": false,
    "dtype": "half",
    "device": "cuda",
    "device_map": "auto",
    "gpus": null,
    "num_gpus": 1,
    "only_embedding": false,
    "embedding_name": null,
    "embedding_size": -1,
    "embedding_device": "cuda",
    "quantize": 16,
    "load_in_8bit": false,
    "load_in_4bit": false,
    "using_ptuning_v2": false,
    "pre_seq_len": 128,
    "context_length": -1,
    "chat_template": "chatglm3",
    "patch_type": null,
    "alpha": "auto",
    "trust_remote_code": false,
    "tokenize_mode": "auto",
    "tensor_parallel_size": 1,
    "gpu_memory_utilization": 0.9,
    "max_num_batched_tokens": -1,
    "max_num_seqs": 256,
    "quantization_method": null,
    "use_streamer_v2": false,
    "api_keys": null,
    "activate_inference": true,
    "interrupt_requests": true,
    "n_gpu_layers": 0,
    "main_gpu": 0,
    "tensor_split": null,
    "n_batch": 512,
    "n_threads": 4,
    "n_threads_batch": 4,
    "rope_scaling_type": -1,
    "rope_freq_base": 0.0,
    "rope_freq_scale": 0.0
}
2023-12-11 03:17:11.004 | DEBUG    | api.routes.chat:create_chat_completion:45 - ==== request ====
{'model': 'chatglm3-6b', 'frequency_penalty': 0.0, 'function_call': {'name': 'agent_extract_data'}, 'functions': [{'name': 'agent_extract_data', 'description': '你是一个天气查询助手。根据用户问题和上下文理解,提取出城市的名称。如果用户问题中不包含城市则不提取。\n如果内容不存在,返回空字符串。', 'parameters': {'type': 'object', 'properties': {'city': {'type': 'string', 'description': '城市'}}, 'required': ['city']}}], 'logit_bias': None, 'max_tokens': 1024, 'n': 1, 'presence_penalty': 0.0, 'response_format': None, 'seed': None, 'stop': ['<|observation|>', '</s>', '<|user|>'], 'temperature': 0.0, 'tool_choice': None, 'tools': None, 'top_p': 0.8, 'user': None, 'stream': False, 'prompt_or_messages': [{'content': '武汉天气如何', 'role': 'user'}], 'echo': False, 'stop_token_ids': [64795, 64797, 2]}
2023-12-11 03:17:11.004 | DEBUG    | api.core.default:apply_chat_template:218 - ==== Messages with tools ====
[{'role': <Role.SYSTEM: 'system'>, 'content': 'Answer the following questions as best as you can. You have access to the following tools:', 'tools': [{'name': 'agent_extract_data', 'description': '你是一个天气查询助手。根据用户问题和上下文理解,提取出城市的名称。如果用户问题中不包含城市则不提取。\n如果内容不存在,返回空字符串。', 'parameters': {'type': 'object', 'properties': {'city': {'type': 'string', 'description': '城市'}}, 'required': ['city']}}]}, {'role': 'user', 'content': '武汉天气如何'}]
INFO:     47.115.230.229:48104 - "POST /v1/chat/completions HTTP/1.1" 200 OK
xusenlinzy commented 6 months ago

Print the result returned each time and see what the cause is.
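
One way to do this is to reduce each response to a single line that shows whether a function call came back. The sketch below works on the response as a plain dict with the same fields as the JSON printed earlier in this thread; `summarize_choice` is a hypothetical helper name, not part of the `openai` package or of this project.

```python
import json


def summarize_choice(response: dict) -> str:
    """Return a one-line summary of the first choice in a chat completion dict."""
    choice = response["choices"][0]
    message = choice["message"]
    function_call = message.get("function_call")
    if function_call:
        # The arguments field is a JSON-encoded string, so decode it for display.
        arguments = json.loads(function_call["arguments"])
        return f"function_call {function_call['name']} {arguments}"
    return f"plain text ({choice['finish_reason']}): {message.get('content')}"


# With the client and params from the script above, this could be called as:
#   print(summarize_choice(client.chat.completions.create(**params).model_dump()))
example = {
    "choices": [
        {
            "finish_reason": "function_call",
            "message": {
                "content": "agent_extract_data",
                "function_call": {
                    "name": "agent_extract_data",
                    "arguments": "{\"city\": \"武汉\"}",
                },
            },
        }
    ]
}
print(summarize_choice(example))
```

Logging this line for every request makes it easier to see whether the failures are occasional or happen on every call.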

leoterry-ulrica commented 6 months ago

Print the result returned each time and see what the cause is.

I asked about three cities, and none of them could be extracted.

2023-12-11 04:46:55.595 | DEBUG    | api.config:<module>:244 - SETTINGS: {
    "host": "0.0.0.0",
    "port": 8000,
    "api_prefix": "/v1",
    "engine": "default",
    "model_name": "chatglm3-6b",
    "model_path": "/workspace/checkpoints",
    "adapter_model_path": null,
    "resize_embeddings": false,
    "dtype": "half",
    "device": "cuda",
    "device_map": "auto",
    "gpus": null,
    "num_gpus": 1,
    "only_embedding": false,
    "embedding_name": null,
    "embedding_size": -1,
    "embedding_device": "cuda",
    "quantize": 16,
    "load_in_8bit": false,
    "load_in_4bit": false,
    "using_ptuning_v2": false,
    "pre_seq_len": 128,
    "context_length": -1,
    "chat_template": "chatglm3",
    "patch_type": null,
    "alpha": "auto",
    "trust_remote_code": false,
    "tokenize_mode": "auto",
    "tensor_parallel_size": 1,
    "gpu_memory_utilization": 0.9,
    "max_num_batched_tokens": -1,
    "max_num_seqs": 256,
    "quantization_method": null,
    "use_streamer_v2": false,
    "api_keys": null,
    "activate_inference": true,
    "interrupt_requests": true,
    "n_gpu_layers": 0,
    "main_gpu": 0,
    "tensor_split": null,
    "n_batch": 512,
    "n_threads": 4,
    "n_threads_batch": 4,
    "rope_scaling_type": -1,
    "rope_freq_base": 0.0,
    "rope_freq_scale": 0.0
}
Loading checkpoint shards: 100%|██████████| 7/7 [01:18<00:00, 11.23s/it]
2023-12-11 04:48:30.418 | INFO     | api.models:create_generate_model:55 - Using default engine
2023-12-11 04:48:30.418 | INFO     | api.core.default:_check_construct_prompt:124 - Using ChatGLM3 Model for Chat!
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
2023-12-11 04:50:22.840 | DEBUG    | api.routes.chat:create_chat_completion:45 - ==== request ====
{'model': 'chatglm3-6b', 'frequency_penalty': 0.0, 'function_call': {'name': 'agent_extract_data'}, 'functions': [{'name': 'agent_extract_data', 'description': '你是一个天气查询助手。根据用户问题,提取出城市。注意不是简单的文本提取,而是上下文理解后的提取。如果用户问题中不包含城市则不提取。\n如果内容不存在,返回空字符串。', 'parameters': {'type': 'object', 'properties': {'city': {'type': 'string', 'description': '城市'}}, 'required': ['city']}}], 'logit_bias': None, 'max_tokens': 1024, 'n': 1, 'presence_penalty': 0.0, 'response_format': None, 'seed': None, 'stop': ['</s>', '<|observation|>', '<|user|>'], 'temperature': 0.0, 'tool_choice': None, 'tools': None, 'top_p': 0.8, 'user': None, 'stream': False, 'prompt_or_messages': [{'content': '武汉今天天气如何', 'role': 'user'}], 'echo': False, 'stop_token_ids': [64795, 64797, 2]}
2023-12-11 04:50:22.840 | DEBUG    | api.core.default:apply_chat_template:218 - ==== Messages with tools ====
[{'role': <Role.SYSTEM: 'system'>, 'content': 'Answer the following questions as best as you can. You have access to the following tools:', 'tools': [{'name': 'agent_extract_data', 'description': '你是一个天气查询助手。根据用户问题,提取出城市。注意不是简单的文本提取,而是上下文理解后的提取。如果用户问题中不包含城市则不提取。\n如果内容不存在,返回空字符串。', 'parameters': {'type': 'object', 'properties': {'city': {'type': 'string', 'description': '城市'}}, 'required': ['city']}}]}, {'role': 'user', 'content': '武汉今天天气如何'}]
INFO:     47.115.230.229:50258 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2023-12-11 04:50:42.204 | DEBUG    | api.routes.chat:create_chat_completion:45 - ==== request ====
{'model': 'chatglm3-6b', 'frequency_penalty': 0.0, 'function_call': {'name': 'agent_extract_data'}, 'functions': [{'name': 'agent_extract_data', 'description': '你是一个天气查询助手。根据用户问题,提取出城市。注意不是简单的文本提取,而是上下文理解后的提取。如果用户问题中不包含城市则不提取。\n如果内容不存在,返回空字符串。', 'parameters': {'type': 'object', 'properties': {'city': {'type': 'string', 'description': '城市'}}, 'required': ['city']}}], 'logit_bias': None, 'max_tokens': 1024, 'n': 1, 'presence_penalty': 0.0, 'response_format': None, 'seed': None, 'stop': ['</s>', '<|observation|>', '<|user|>'], 'temperature': 0.0, 'tool_choice': None, 'tools': None, 'top_p': 0.8, 'user': None, 'stream': False, 'prompt_or_messages': [{'content': '武汉今天天气如何', 'role': 'user'}, {'content': '\n对不起,无法从上下文获取到城市,请告诉我你要查询的是哪个城市的天气。', 'role': 'assistant'}, {'content': '北京天气如何', 'role': 'user'}], 'echo': False, 'stop_token_ids': [64795, 64797, 2]}
2023-12-11 04:50:42.205 | DEBUG    | api.core.default:apply_chat_template:218 - ==== Messages with tools ====
[{'role': <Role.SYSTEM: 'system'>, 'content': 'Answer the following questions as best as you can. You have access to the following tools:', 'tools': [{'name': 'agent_extract_data', 'description': '你是一个天气查询助手。根据用户问题,提取出城市。注意不是简单的文本提取,而是上下文理解后的提取。如果用户问题中不包含城市则不提取。\n如果内容不存在,返回空字符串。', 'parameters': {'type': 'object', 'properties': {'city': {'type': 'string', 'description': '城市'}}, 'required': ['city']}}]}, {'role': 'user', 'content': '武汉今天天气如何'}, {'role': 'assistant', 'metadata': '', 'content': '对不起,无法从上下文获取到城市,请告诉我你要查询的是哪个城市的天气。'}, {'role': 'user', 'content': '北京天气如何'}]
INFO:     47.115.230.229:57502 - "POST /v1/chat/completions HTTP/1.1" 200 OK
2023-12-11 04:50:47.482 | DEBUG    | api.routes.chat:create_chat_completion:45 - ==== request ====
{'model': 'chatglm3-6b', 'frequency_penalty': 0.0, 'function_call': {'name': 'agent_extract_data'}, 'functions': [{'name': 'agent_extract_data', 'description': '你是一个天气查询助手。根据用户问题,提取出城市。注意不是简单的文本提取,而是上下文理解后的提取。如果用户问题中不包含城市则不提取。\n如果内容不存在,返回空字符串。', 'parameters': {'type': 'object', 'properties': {'city': {'type': 'string', 'description': '城市'}}, 'required': ['city']}}], 'logit_bias': None, 'max_tokens': 1024, 'n': 1, 'presence_penalty': 0.0, 'response_format': None, 'seed': None, 'stop': ['</s>', '<|observation|>', '<|user|>'], 'temperature': 0.0, 'tool_choice': None, 'tools': None, 'top_p': 0.8, 'user': None, 'stream': False, 'prompt_or_messages': [{'content': '武汉今天天气如何', 'role': 'user'}, {'content': '\n对不起,无法从上下文获取到城市,请告诉我你要查询的是哪个城市的天气。', 'role': 'assistant'}, {'content': '北京天气如何', 'role': 'user'}, {'content': '\n对不起,无法从上下文获取到城市,请告诉我你要查询的是哪个城市的天气。', 'role': 'assistant'}, {'content': '广州天气如何', 'role': 'user'}], 'echo': False, 'stop_token_ids': [64795, 64797, 2]}
2023-12-11 04:50:47.482 | DEBUG    | api.core.default:apply_chat_template:218 - ==== Messages with tools ====
[{'role': <Role.SYSTEM: 'system'>, 'content': 'Answer the following questions as best as you can. You have access to the following tools:', 'tools': [{'name': 'agent_extract_data', 'description': '你是一个天气查询助手。根据用户问题,提取出城市。注意不是简单的文本提取,而是上下文理解后的提取。如果用户问题中不包含城市则不提取。\n如果内容不存在,返回空字符串。', 'parameters': {'type': 'object', 'properties': {'city': {'type': 'string', 'description': '城市'}}, 'required': ['city']}}]}, {'role': 'user', 'content': '武汉今天天气如何'}, {'role': 'assistant', 'metadata': '', 'content': '对不起,无法从上下文获取到城市,请告诉我你要查询的是哪个城市的天气。'}, {'role': 'user', 'content': '北京天气如何'}, {'role': 'assistant', 'metadata': '', 'content': '对不起,无法从上下文获取到城市,请告诉我你要查询的是哪个城市的天气。'}, {'role': 'user', 'content': '广州天气如何'}]
INFO:     47.115.230.229:57502 - "POST /v1/chat/completions HTTP/1.1" 200 OK
fenglinbei commented 6 months ago

It may be a problem with the model itself; the chatglm3 model has quite a few issues.

leoterry-ulrica commented 6 months ago

It may be a problem with the model itself; the chatglm3 model has quite a few issues.

But this problem does not occur when the server is started with chatglm3's official openai_api.py.

leoterry-ulrica commented 6 months ago

@xusenlinzy Sorry to bother you, could you take another look? The tool information is being passed through as well.

llm-api-server  | 2023-12-15 10:42:23.658 | DEBUG    | api.core.default:apply_chat_template:218 - ==== Messages with tools ====
llm-api-server  | [{'role': <Role.SYSTEM: 'system'>, 'content': 'Answer the following questions as best as you can. You have access to the following tools:', 'tools': [{'name': 'extract_json_data', 'description': '你是一个天气查询助手。根据用户问题,提取出城市。注意不是简单的文本提取,而是上下文理解后的提取。如果用户问题中不包含城市则不提取。\\n如果内容不存在,返回空字符串。', 'parameters': {'type': 'object', 'properties': {'city': {'type': 'string', 'description': '城市'}}, 'required': ['city']}}]}, {'role': 'user', 'content': '<任务描述>\n你是一个天气查询助手。根据用户问题,提取出城市。注意不是简单的文本提取,而是上下文理解后的提取。如果用户问题中不包含城市则不提取。\\n如果内容不存在,返回空字符串。\n\n- 如果字段为空,你返回空字符串。\n- 不要换行。\n- 结合历史记录和文本进行提取。\n</任务描述>\n\n<文本>\n今天武汉天气如何\n</文本>'}]
llm-api-server  | INFO:     172.20.0.1:49912 - "POST /v1/chat/completions HTTP/1.1" 200 OK
llm-api-server  | 2023-12-15 10:42:24.162 | DEBUG    | api.routes.chat:create_chat_completion:45 - ==== request ====
llm-api-server  | {'model': 'chatglm3-6b', 'frequency_penalty': 0.0, 'function_call': None, 'functions': None, 'logit_bias': None, 'max_tokens': 2000, 'n': 1, 'presence_penalty': 0.0, 'response_format': None, 'seed': None, 'stop': ['<|user|>', '<|observation|>', '</s>'], 'temperature': 0.24, 'tool_choice': None, 'tools': None, 'top_p': 0.8, 'user': None, 'stream': True, 'prompt_or_messages': [{'content': '今天武汉天气如何', 'role': 'user'}], 'echo': False, 'stop_token_ids': [64795, 64797, 2]}
llm-api-server  | INFO:     172.20.0.1:49912 - "POST /v1/chat/completions HTTP/1.1" 200 OK
leoterry-ulrica commented 6 months ago
Using the same code as yours, I get the following output:

{
    "id": "cmpl-055c6d94-3d1c-4103-a0a8-dc32bf8e37d4",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "武汉的天气我无法获取,建议您查询相关新闻或咨询当地气象局获得最新信息。",
                "role": "assistant",
                "function_call": null,
                "tool_calls": null
            }
        }
    ],
    "created": 1702639571,
    "model": "chatglm3-6b",
    "object": "chat.completion",
    "system_fingerprint": null,
    "usage": {
        "completion_tokens": 23,
        "prompt_tokens": 43,
        "total_tokens": 66
    }
}
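
To make the difference between the two outputs easier to see, the fields that matter can be compared side by side. The sketch below uses only values copied from the two responses in this thread; `compare_responses` is a hypothetical helper written for this comparison. The much smaller `prompt_tokens` count in the failing response (43 versus 178) might suggest that the function definitions never reached the rendered prompt, but that is an inference from the numbers and is not confirmed anywhere in this thread.

```python
def compare_responses(expected: dict, actual: dict) -> dict:
    """Return the key fields that differ, as {field: (expected, actual)}."""

    def key_fields(response: dict) -> dict:
        choice = response["choices"][0]
        return {
            "finish_reason": choice["finish_reason"],
            "has_function_call": choice["message"].get("function_call") is not None,
            "prompt_tokens": response["usage"]["prompt_tokens"],
        }

    a, b = key_fields(expected), key_fields(actual)
    return {name: (a[name], b[name]) for name in a if a[name] != b[name]}


# Values copied from the working and failing responses shown in this thread.
working = {
    "choices": [{
        "finish_reason": "function_call",
        "message": {"function_call": {"name": "agent_extract_data",
                                      "arguments": "{\"city\": \"武汉\"}"}},
    }],
    "usage": {"completion_tokens": 23, "prompt_tokens": 178, "total_tokens": 201},
}
failing = {
    "choices": [{
        "finish_reason": "stop",
        "message": {"function_call": None},
    }],
    "usage": {"completion_tokens": 23, "prompt_tokens": 43, "total_tokens": 66},
}
print(compare_responses(working, failing))
```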