xusenlinzy / api-for-open-llm

OpenAI-style API for open large language models: use LLMs just as you would ChatGPT! Supports LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, Xverse, SqlCoder, CodeLLaMA, ChatGLM, ChatGLM2, ChatGLM3, etc. (A unified backend API for open-source large language models.)
Apache License 2.0

Icon not found #265

Closed lucheng07082221 closed 2 months ago

lucheng07082221 commented 2 months ago

提交前必须检查以下项目 | The following items must be checked before submission

问题类型 | Type of problem

其他问题 | Other issues

操作系统 | Operating system

None

详细描述问题 | Detailed description of the problem

Thanks for the great work! But I can't find the icon:

```
lc@lc-ConceptD-CT500-51A:~/work/api-for-open-llm$ python3 server.py
2024-04-19 16:18:05.953 | DEBUG | api.config::338 - SETTINGS: {
    "embedding_name": "/home/lc/work/QAnything/netease-youdao/bce-embedding-base_v1",
    "rerank_name": "/home/lc/work/QAnything/netease-youdao/bce-reranker-base_v1",
    "embedding_size": -1,
    "embedding_device": "cuda:0",
    "rerank_device": "cuda:0",
    "model_name": "qwen",
    "model_path": "/media/lc/lc/Qwen-1_8B-Chat",
    "dtype": "half",
    "load_in_8bit": false,
    "load_in_4bit": false,
    "context_length": -1,
    "chat_template": null,
    "rope_scaling": null,
    "flash_attn": false,
    "use_streamer_v2": false,
    "interrupt_requests": true,
    "host": "0.0.0.0",
    "port": 8090,
    "api_prefix": "/v1",
    "engine": "default",
    "tasks": ["llm", "rag"],
    "device_map": "auto",
    "gpus": "0",
    "num_gpus": 1,
    "activate_inference": true,
    "model_names": ["qwen", "bce-embedding-base_v1", "bce-reranker-base_v1"],
    "api_keys": null
}
/usr/local/lib/python3.10/dist-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
2024-04-19 16:18:10.411 | INFO | api.rag.models.rerank:init:51 - Loading from /home/lc/work/QAnything/netease-youdao/bce-reranker-base_v1.
2024-04-19 16:18:10.546 | INFO | api.rag.models.rerank:init:77 - Execute device: cuda:0; gpu num: 1; use fp16: False
2024-04-19 16:18:10.818 | INFO | api.adapter.patcher:patch_tokenizer:119 - Add eos token: <|endoftext|>
2024-04-19 16:18:10.819 | INFO | api.adapter.patcher:patch_tokenizer:126 - Add pad token: <|endoftext|>
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 2.20it/s]
2024-04-19 16:18:12.005 | INFO | api.models:create_hf_llm:81 - Using default engine
2024-04-19 16:18:12.006 | INFO | api.core.default:_check_construct_prompt:126 - Using Qwen Model for Chat!
INFO:     Started server process [14744]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
INFO:     127.0.0.1:45580 - "GET /v1 HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:45580 - "GET /favicon.ico HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:45580 - "GET /v1 HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:45580 - "GET /favicon.ico HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:45580 - "GET /v1 HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:45580 - "GET /favicon.ico HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:45580 - "GET / HTTP/1.1" 404 Not Found
INFO:     127.0.0.1:45580 - "GET /favicon.ico HTTP/1.1" 404 Not Found
```

Dependencies


运行日志或截图 | Runtime logs or screenshots

```
# Please paste the run log here
```
![Screenshot from 2024-04-19 16-21-15](https://github.com/xusenlinzy/api-for-open-llm/assets/3146209/5bed9a45-b7e9-42f1-a383-37237600c894)
xusenlinzy commented 2 months ago

There is no http://0.0.0.0:8090/v1 endpoint. You can visit http://0.0.0.0:8090/docs to see which endpoints are available.
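As the maintainer's reply implies, `/v1` is only an API prefix, not a browsable page, so a plain `GET /v1` (and the browser's automatic `GET /favicon.ico`) returning 404 is expected. The actual endpoints live below the prefix; in an OpenAI-style API the chat endpoint is typically `POST /v1/chat/completions`. A minimal sketch of calling it, assuming this server follows that request shape and using the host, port, and model name from the SETTINGS log above:

```python
import json
import urllib.request

BASE_URL = "http://0.0.0.0:8090/v1"  # host/port from the SETTINGS log

def build_chat_request(prompt: str, model: str = "qwen") -> urllib.request.Request:
    """Build a POST request for the OpenAI-style chat completions endpoint.

    GET on the bare /v1 prefix returns 404 because no route is mounted
    there; check http://0.0.0.0:8090/docs for the routes that do exist.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("你好")
# With the server running, urllib.request.urlopen(req) would send it
# and return an OpenAI-style JSON completion.
```

The `/docs` page works because the server is a FastAPI/Uvicorn app, which serves interactive Swagger documentation there by default.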

skyliwq commented 1 month ago

I have the same problem. How can it be solved?