netease-youdao / QAnything

Question and Answer based on Anything.
https://qanything.ai
GNU Affero General Public License v3.0

[BUG] Running MiniChat-2-3B on Windows 11 WSL2 fails #235

Open fivegg opened 5 months ago

fivegg commented 5 months ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in FAQ?

Current Behavior

The GPU has 12 GB of VRAM. I launched with the following command:

```
bash ./run.sh -c local -i 0 -b hf -m MiniChat-2-3B -t minichat
```

Startup fails. Console output:

```
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
qanything-container-local |                                  Dload  Upload   Total   Spent    Left  Speed
100    13  100    13    0     0  14099      0 --:--:-- --:--:-- --:--:-- 13000
qanything-container-local | The llm service is starting up, it can be long... you have time to make a coffee :)
qanything-container-local | LLM 服务正在启动,可能需要一段时间...你有时间去冲杯咖啡 :)
qanything-container-local | 启动 LLM 服务超时,自动检查 /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log 中是否存在Error...
qanything-container-local | /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log 中未检测到明确的错误信息。请手动排查 /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log 以获取更多信息。
```

(The Chinese log lines say: "The LLM service is starting, this can take a while... time to make a coffee :)"; "LLM service startup timed out, automatically checking fschat_model_worker_7801.log for errors..."; "No explicit error found in the log; please inspect fschat_model_worker_7801.log manually for more information.")

Contents of fschat_model_worker_7801.log:

```
2024-04-09 17:04:57 | INFO | model_worker | args: Namespace(host='0.0.0.0', port=7801, worker_address='http://0.0.0.0:7801', controller_address='http://0.0.0.0:7800', model_path='/model_repos/CustomLLM/MiniChat-2-3B', revision='main', device='cuda', gpus='0', num_gpus=1, max_gpu_memory=None, dtype='bfloat16', load_8bit=True, cpu_offloading=False, gptq_ckpt=None, gptq_wbits=16, gptq_groupsize=-1, gptq_act_order=False, awq_ckpt=None, awq_wbits=16, awq_groupsize=-1, enable_exllama=False, exllama_max_seq_len=4096, exllama_gpu_split=None, exllama_cache_8bit=False, enable_xft=False, xft_max_seq_len=4096, xft_dtype=None, model_names=None, conv_template='minichat', embed_in_truncate=False, limit_worker_concurrency=5, stream_interval=2, no_register=False, seed=None, debug=False, ssl=False)
2024-04-09 17:04:57 | INFO | model_worker | Loading the model ['MiniChat-2-3B'] on worker e44e3aad ...
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
2024-04-09 17:04:58 | ERROR | stderr |   0%|          | 0/1 [00:00<?, ?it/s]
```
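Since the startup script reports no explicit error in the worker log, inspecting it manually is the natural next step. A minimal sketch, assuming the container name shown in the console prefix above and the log path from the timeout message (the grep patterns are only suggestions):

```bash
# Show the tail of the FastChat model worker log inside the container
docker exec -it qanything-container-local tail -n 50 \
  /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log

# Search for common failure signatures (an OOM kill often leaves no Python traceback)
docker exec -it qanything-container-local \
  grep -inE 'error|traceback|killed|out of memory' \
  /workspace/qanything_local/logs/debug_logs/fastchat_logs/fschat_model_worker_7801.log
```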

Expected Behavior

No response

Environment

- OS: Windows 11 WSL2
- NVIDIA Driver: 537.70
- CUDA:
- docker: Docker Desktop 4.28.0 (139021)
- docker-compose:
- NVIDIA GPU: RTX 3060
- NVIDIA GPU Memory: 12 GB
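The blank CUDA and docker-compose fields can be read off with standard commands inside WSL2 (these are generic tools, nothing QAnything-specific):

```bash
# Reports the driver version and the highest CUDA version it supports
nvidia-smi

# Reports the Compose plugin version bundled with Docker Desktop
docker compose version
```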

QAnything logs

(Same console output and fschat_model_worker_7801.log contents as in Current Behavior above.)

Steps To Reproduce

No response

Anything else?

No response

wangweicug commented 5 months ago

Same configuration, same problem here, with identical errors in the log. Is there a fix?

xuyao18 commented 5 months ago

The same error occurs on Linux:

```
0%|          | 0/1 [00:00<?, ?it/s]1 10:58:03 | ERROR | stderr |
```

According to this issue, increasing RAM or swap resolves the problem: https://github.com/oobabooga/text-generation-webui/issues/2509
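For the swap route on a WSL2 setup like the original poster's, one option is a temporary swap file inside the distro. A minimal sketch (the 8 GiB size and /swapfile path are arbitrary examples; the swap disappears on reboot unless added to /etc/fstab):

```bash
# Create and enable an 8 GiB swap file
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Confirm the new swap is active
free -h
```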

fivegg commented 5 months ago

It looks like it is trying to download something over the network, but the connection is failing.
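If the hang really is a blocked download, the standard Hugging Face environment variables may help. A sketch assuming the model files are already on disk; TRANSFORMERS_OFFLINE and HF_ENDPOINT are real transformers/huggingface_hub variables, but whether QAnything's run.sh passes them into the container is an assumption:

```bash
# Force transformers to use only local files, failing fast instead of hanging on the network
export TRANSFORMERS_OFFLINE=1

# Or point Hugging Face downloads at a reachable mirror
export HF_ENDPOINT=https://hf-mirror.com
```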

ye-jeck commented 4 months ago

Upcreat commented 4 months ago

+1, no idea what is going wrong here.

weolix commented 2 months ago

[screenshot] It ran out of memory and was killed by the system; check memory usage while it is running.
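To confirm an OOM kill and give the WSL2 VM more headroom, a minimal sketch (the memory/swap figures in the comment are arbitrary examples; .wslconfig sits in the Windows user profile and only takes effect after `wsl --shutdown`):

```bash
# Inside WSL2: check whether the loader was OOM-killed, and watch memory while it loads
sudo dmesg | grep -i 'killed process'
free -h

# On Windows, the WSL2 VM's memory/swap limits are raised via %UserProfile%\.wslconfig,
# e.g. (hypothetical figures):
#   [wsl2]
#   memory=16GB
#   swap=16GB
# then run `wsl --shutdown` and restart Docker Desktop.
```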