xusenlinzy / api-for-open-llm

An OpenAI-style API for open large language models: use open LLMs just like ChatGPT! Supports LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3, etc. A unified backend API for open-source large language models.
Apache License 2.0

minicpm starts without errors, but inference requests fail #292

Open 760485464 opened 5 days ago

760485464 commented 5 days ago

The following items must be checked before submission

Type of problem

Model inference and deployment

Operating system

Linux

Detailed description of the problem

Startup completes without errors, but API requests fail. The request body sent (the text prompt "这张图片是什么地方?" asks "What place is shown in this picture?"):

```json
{
  "model": "minicpm-v",
  "stream": false,
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "这张图片是什么地方?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "http://djclub.cdn.bcebos.com/uploads/images/pageimg/20230325/64-2303252115313.jpg"
          }
        }
      ]
    }
  ]
}
```
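For reference, a minimal sketch of how such a request can be replayed against the server. This assumes the host, port, and `/v1` prefix from the settings in the log below, plus the OpenAI-style `/chat/completions` route; the script itself is illustrative and not part of the original report:

```python
import requests

# Hypothetical reproduction script; adjust host/port to your deployment.
payload = {
    "model": "minicpm-v",
    "stream": False,
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "这张图片是什么地方?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "http://djclub.cdn.bcebos.com/uploads/images/pageimg/20230325/64-2303252115313.jpg"
                    },
                },
            ],
        }
    ],
}

resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
print(resp.status_code, resp.text)
```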

Dependencies

No response

Runtime logs or screenshots

```
(minicpm) root@autodl-container-acc74095be-7fd6b47a:~/autodl-tmp/api-for-open-llm# python server.py
2024-06-28 19:01:43.514 | DEBUG | api.config::281 - SETTINGS: {
    "model_name": "minicpm-v",
    "model_path": "/root/autodl-tmp/models/MiniCPM-Llama3-V-2_5",
    "dtype": "bfloat16",
    "load_in_8bit": false,
    "load_in_4bit": false,
    "context_length": 2048,
    "chat_template": "minicpm-v",
    "rope_scaling": null,
    "flash_attn": false,
    "interrupt_requests": true,
    "host": "0.0.0.0",
    "port": 8000,
    "api_prefix": "/v1",
    "engine": "default",
    "tasks": [
        "llm"
    ],
    "device_map": "auto",
    "gpus": null,
    "num_gpus": 1,
    "activate_inference": true,
    "model_names": [
        "minicpm-v"
    ],
    "api_keys": null
}
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|██████████| 7/7 [00:10<00:00, 1.43s/it]
2024-06-28 19:02:01.770 | INFO | api.models:create_hf_llm:81 - Using HuggingFace Engine
2024-06-28 19:02:01.770 | INFO | api.engine.hf:__init__:82 - Using minicpm-v Model for Chat!
2024-06-28 19:02:01.770 | INFO | api.engine.hf:__init__:83 - Using <api.templates.base.ChatTemplate object at 0x7f08429a4460> for Chat!
INFO:     Started server process [1092]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
2024-06-28 19:02:24.431 | DEBUG | api.routes.chat:create_chat_completion:56 - ==== request ====
{'model': 'glm-4v', 'frequency_penalty': 0.0, 'function_call': None, 'functions': None, 'logit_bias': None, 'logprobs': False, 'max_tokens': 1024, 'n': 1, 'presence_penalty': 0.0, 'response_format': None, 'seed': None, 'stop': [], 'temperature': 0.9, 'tool_choice': None, 'tools': None, 'top_logprobs': None, 'top_p': 1.0, 'user': None, 'stream': False, 'repetition_penalty': 1.03, 'typical_p': None, 'watermark': False, 'best_of': 1, 'ignore_eos': False, 'use_beam_search': False, 'stop_token_ids': [], 'skip_special_tokens': True, 'spaces_between_special_tokens': True, 'min_p': 0.0, 'include_stop_str_in_output': False, 'length_penalty': 1.0, 'guided_json': None, 'guided_regex': None, 'guided_choice': None, 'guided_grammar': None, 'guided_decoding_backend': None, 'prompt_or_messages': [{'role': 'user', 'content': '你好'}], 'echo': False}
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/root/miniconda3/envs/minicpm/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/root/miniconda3/envs/minicpm/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/root/miniconda3/envs/minicpm/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/minicpm/lib/python3.8/site-packages/transformers/generation/utils.py", line 1914, in generate
    result = self._sample(
  File "/root/miniconda3/envs/minicpm/lib/python3.8/site-packages/transformers/generation/utils.py", line 2693, in _sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0
```
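For context on the failure: the traceback ends in transformers' sampling step, where `torch.multinomial` rejects a probability tensor containing `inf`, `nan`, or negative entries. That usually means the model's logits were already non-finite before softmax, often a dtype or numerical-overflow problem rather than a bug in the sampler itself. A minimal sketch of the same failure mode, using illustrative values rather than anything from this issue:

```python
import torch

# Finite logits: softmax yields a valid distribution and sampling works.
logits = torch.tensor([1.0, 2.0, 3.0])
probs = torch.softmax(logits, dim=-1)
print(torch.multinomial(probs, num_samples=1))  # samples an index, no error

# A single NaN logit poisons the entire softmax output ...
bad_logits = torch.tensor([1.0, float("nan"), 3.0])
bad_probs = torch.softmax(bad_logits, dim=-1)  # tensor([nan, nan, nan])

# ... so multinomial raises the same RuntimeError seen in the log above:
# "probability tensor contains either inf, nan or element < 0"
torch.multinomial(bad_probs, num_samples=1)
```

If the goal is just to bisect the problem, passing `do_sample=False` to `generate` sidesteps `torch.multinomial` entirely, but the underlying non-finite logits would still need investigating (for example, trying `"dtype": "float16"` instead of `"bfloat16"` in the settings above).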