Open Qin-xb opened 5 months ago
Run: CUDA_VISIBLE_DEVICES=1 python -m qanything_kernel.qanything_server.sanic_api --host 0.0.0.0 --port 14009 --model_size 3B
Initializing an LLM engine with config: model='/workspace/QAnything/assets/custom_models/netease-youdao/MiniChat-2-3B', tokenizer='/workspace/QAnything/assets/custom_models/netease-youdao/MiniChat-2-3B', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=auto, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, seed=0)
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
INFO 05-23 08:37:19 llm_engine.py:357] # GPU blocks: 7221, # CPU blocks: 910
INFO 05-23 08:37:21 model_runner.py:684] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 05-23 08:37:21 model_runner.py:688] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing gpu_memory_utilization or enforcing eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage.
INFO 05-23 08:37:29 model_runner.py:756] Graph capturing finished in 8 secs.
Backend service started. Copy [ http://0.0.0.0:8777/qanything/ ] into a browser to test.
[2024-05-23 08:37:36 +0000] [48195] [INFO] Starting worker [48195]
Check whether the directory you launched the server from matches the path used to load the static pages, /dist/qanything/:
app.static('/qanything/', './dist/qanything/', name='qanything', index="index.html")
Did that solve it?
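To check this quickly, here is a small hypothetical snippet (assuming the route above registers the relative path './dist/qanything/') that confirms whether index.html resolves from the directory the server was launched in:

```python
# Hypothetical check: with a relative './dist/qanything/' route, Sanic resolves
# the directory against the current working directory, so index.html must be
# reachable from wherever the server process was started.
import os

static_dir = os.path.join(os.getcwd(), "dist", "qanything")
index_html = os.path.join(static_dir, "index.html")
print("working directory:", os.getcwd())
print("index.html found:", os.path.isfile(index_html))
```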
When making the request, note that the trailing slash cannot be omitted: http://{ip}:8777/qanything returns a 404 and should be http://{ip}:8777/qanything/ instead. doc: https://github.com/netease-youdao/QAnything/blob/master/QAnything%E4%BD%BF%E7%94%A8%E8%AF%B4%E6%98%8E.md#Python%E7%89%88%E6%9C%AC%E4%BD%BF%E7%94%A8%E6%8C%87%E5%8D%97:~:text=%E4%BD%BF%E7%94%A8UI%E7%95%8C%E9%9D%A2-,%E6%B3%A8%E6%84%8F%E6%9C%AB%E5%B0%BE%E7%9A%84%E6%96%9C%E6%9D%A0%E4%B8%8D%E5%8F%AF%E7%9C%81%E7%95%A5%EF%BC%8C%E5%90%A6%E5%88%99%E4%BC%9A%E5%87%BA%E7%8E%B0404%E9%94%99%E8%AF%AF,-API%20%E6%96%87%E6%A1%A3
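A quick way to see the difference, as a sketch that assumes the requests library is installed and the server listens on the default 8777 port (adjust host/port to your deployment):

```python
# Sketch: compare the status codes with and without the trailing slash.
import requests

for url in ("http://127.0.0.1:8777/qanything", "http://127.0.0.1:8777/qanything/"):
    resp = requests.get(url, allow_redirects=False)
    print(url, "->", resp.status_code)
```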
Changing what you open in the browser from just the port, http://localhost:8777, to the URL path http://localhost:8777/qanything/, fixes it.
This bug is caused by index.html not being found: when Python starts, the default argument is a path relative to the project. Solutions:
- Change it to an absolute path:
app.static('/qanything/', '/home/{user}/project/QAnything/qanything_kernel/qanything_server/dist/qanything/', name='qanything', index="index.html")
- Or use a path resolved relative to the directory where sanic.py lives:
app.static('/qanything/', './dist/qanything/', name='qanything', index="index.html")

That solved it.
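A variant of the absolute-path fix above that avoids hard-coding /home/{user} is to derive the directory from the module's own location. This is only a minimal sketch, not the project's actual sanic_api.py; it assumes the file sits next to the dist/ folder:

```python
# Minimal sketch (assumed layout: this file sits next to dist/qanything/).
# Building the path from __file__ makes the static route independent of the
# directory the server was launched from.
import os
from sanic import Sanic

app = Sanic("qanything")

CURRENT_DIR = os.path.dirname(os.path.abspath(__file__))
STATIC_DIR = os.path.join(CURRENT_DIR, "dist", "qanything")

app.static('/qanything/', STATIC_DIR, name='qanything', index="index.html")
```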
After starting the 3B model and opening the ip:port address in the browser, the page returns a 404 error.