Closed wj1017090777 closed 6 days ago
This is probably because the transformers version was updated. Try pulling the latest project code.
I'm deploying locally; after upgrading to the new version, I hit the same problem.
It doesn't seem specific to qwen2; qwen1.5 behaves the same. Everything was fine before git pull, and it broke after updating.
The latest transformers release added a tools parameter, so the old positional arguments no longer line up.
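For context: recent transformers releases (4.42+) inserted `tools` (and `documents`) into `apply_chat_template`'s signature right after the conversation argument, so older calls that passed later arguments positionally now bind them to the wrong parameters. A minimal sketch of the keyword-only style that avoids this; the model ID is just an illustration:

```python
from transformers import AutoTokenizer

# Illustrative model ID; any chat model with a chat template behaves the same.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")
messages = [{"role": "user", "content": "hello"}]

# Fragile: in transformers >= 4.42 the second positional slot is `tools`,
# so a template passed positionally would land in the wrong parameter:
#   tokenizer.apply_chat_template(messages, my_template, tokenize=False)

# Robust: bind everything by keyword.
prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)
```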
> It doesn't seem specific to qwen2; qwen1.5 behaves the same. Everything was fine before git pull, and it broke after updating.

It works fine in my testing.
Solved after pulling the latest code.
2024-07-04 08:28:47,272 WARNING services.py:2009 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67084288 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=10.24gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2024-07-04 08:28:48,452 INFO worker.py:1771 -- Started a local Ray instance.
INFO 07-04 08:28:50 config.py:623] Defaulting to use mp for distributed inference
INFO 07-04 08:28:50 llm_engine.py:161] Initializing an LLM engine (v0.5.0.post1) with config: model='checkpoints/glm-4-9b-chat', speculative_config=None, tokenizer='checkpoints/glm-4-9b-chat', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=checkpoints/glm-4-9b-chat)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING 07-04 08:28:51 tokenizer.py:126] Using a slow tokenizer. This might cause a significant slowdown. Consider using a fast tokenizer instead.
Traceback (most recent call last):
File "
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
ERROR 07-04 08:28:55 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 5653 died, exit code: 1
INFO 07-04 08:28:55 multiproc_worker_utils.py:123] Killing local vLLM worker processes
I hit this problem while running glm-4. Could you help take a look?
Try adding these two environment variables.
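(Which two variables were meant is not preserved in this thread. A hedged guess based on the fork/spawn traceback above: vLLM reads VLLM_WORKER_MULTIPROC_METHOD to choose the worker start method, and the traceback's own advice is the `__main__` guard. The sketch below is an assumption, not the maintainer's exact suggestion:)

```python
import os

# Assumption: force vLLM's multiprocessing workers to use fork so the
# spawn-mode "freeze_support" idiom is not required. The variable exists in
# vLLM, but whether it was one of the "two variables" here is a guess.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "fork"

if __name__ == "__main__":
    # The guard itself is the fix the traceback recommends for spawn mode.
    from vllm import LLM

    llm = LLM(
        model="checkpoints/glm-4-9b-chat",  # path taken from the log above
        tensor_parallel_size=2,
        trust_remote_code=True,
    )
```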
Thanks for the reply. It's running now, but there's still a problem; the error is as follows:
2024-07-05 01:37:41.905 | DEBUG | api.vllm_routes.chat:create_chat_completion:74 - ==== request ====
{'model': 'gpt-3.5-turbo', 'frequency_penalty': 0.0, 'function_call': None, 'functions': None, 'logit_bias': None, 'logprobs': False, 'max_tokens': 512, 'n': 1, 'presence_penalty': 0.0, 'response_format': None, 'seed': None, 'stop': ['<|observation|>', '<|endoftext|>', '
Just noticed that glm4's tokenizer files were changed; updating the project code should fix it.
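A quick sanity check after updating: the stop strings visible in the request log should map to real ids in the refreshed tokenizer (checkpoint path copied from the engine log above; adjust to your setup):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "checkpoints/glm-4-9b-chat", trust_remote_code=True
)

# If the tokenizer files are current, these should resolve to real vocabulary
# ids rather than the unknown-token id.
for token in ["<|observation|>", "<|endoftext|>"]:
    print(token, tok.convert_tokens_to_ids(token))
```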
The following items must be checked before submission
Type of problem
Model inference and deployment
Operating system
Linux
Detailed description of the problem
Dependencies
Runtime logs or screenshots
After deploying qwen2 directly with docker-compose and testing the endpoint with the OpenAI example code from the project homepage, it raises the error above. tools is None; I don't know why it errors.
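The "homepage OpenAI test code" itself isn't quoted in this issue; a minimal equivalent call, with the base URL and key as placeholders and the model name mirroring the request log above, would look roughly like:

```python
from openai import OpenAI

# Placeholder endpoint and key for a docker-compose deployment; adjust to yours.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",  # served model alias seen in the request log
    messages=[{"role": "user", "content": "hello"}],
    max_tokens=512,
)
print(resp.choices[0].message.content)
```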