netease-youdao / QAnything

Question and Answer based on Anything.
https://qanything.ai
GNU Affero General Public License v3.0

With multiple GPUs, why is only the second card's VRAM detected instead of the total? #119

Open Meaninles opened 7 months ago

Meaninles commented 7 months ago

With multiple GPUs, why is only the second card's VRAM detected instead of the total? I'm using Tesla T4s; the detected VRAM is 15360 MiB and it tells me I can't run a 7B model, but I have two cards.

Meaninles commented 7 months ago

```
(base) -bash-4.2# bash ./run.sh -c local -i 0,1 -b default
From https://github.com/netease-youdao/QAnything

The default backend is FasterTransformer, which only supports Nvidia RTX 30-series or 40-series GPUs. Your GPU model is Tesla T4, which is not on the supported list; the backend will be switched automatically. Based on the matching algorithm, the huggingface backend has been selected for you.
Your current VRAM is 15360 MiB; deploying models of 3B or smaller is recommended, including the online OpenAI API.
Your VRAM is insufficient to deploy a 7B model; please choose a different model size.
```
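For reference, total VRAM across all visible GPUs can be summed through NVML instead of reading a single device; below is a minimal sketch using the `pynvml` package (an assumption — QAnything's own check lives in `run.sh` and may shell out to `nvidia-smi` instead):

```python
# Sum total VRAM across all visible NVIDIA GPUs via NVML.
# Requires: pip install nvidia-ml-py
import pynvml

pynvml.nvmlInit()
total_mib = 0
for i in range(pynvml.nvmlDeviceGetCount()):
    mem = pynvml.nvmlDeviceGetMemoryInfo(pynvml.nvmlDeviceGetHandleByIndex(i))
    print(f"GPU {i}: {mem.total // 2**20} MiB")
    total_mib += mem.total // 2**20
print(f"Total: {total_mib} MiB")  # two T4s would print ~30720 MiB here
pynvml.nvmlShutdown()
```

A larger total does not by itself make a 7B model fit, though: without tensor parallelism the full weights must reside on one card, so two 15 GiB T4s do not behave like one 30 GiB card.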

Meaninles commented 7 months ago

Is QAnything's multi-GPU support just running the LLM and the embedding/rerank models on separate cards? It isn't multi-GPU in the vLLM sense, is it?
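For contrast, vLLM-style multi-GPU means tensor parallelism: one model's weights are sharded across cards via the `tensor_parallel_size` argument. A sketch, assuming vLLM is installed; the model name is illustrative rather than what QAnything actually ships:

```python
# vLLM tensor parallelism: shard one model's weights across two GPUs,
# so two ~15 GiB T4s can jointly hold a 7B fp16 model.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen-7B-Chat",  # illustrative model name
    tensor_parallel_size=2,     # shard weights across 2 visible GPUs
    trust_remote_code=True,     # Qwen repos ship custom modeling code
)
outputs = llm.generate(["What does tensor parallelism do?"],
                       SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```

The startup log later in this thread shows `tensor_parallel is set to [1]`, i.e. no sharding, which is consistent with extra cards only hosting the auxiliary models.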

wuwq commented 5 months ago

Has this been resolved yet? I'm running into the same problem.

TtZJ2 commented 5 months ago

+1, same problem here: running on two T4 cards with `sudo bash ./run.sh -c local -i 0,1 -b hf -m Qwen-7B-QAnything -t qwen-7b-qanything`

Log:

```
qanything-container-local |
qanything-container-local | =============================
qanything-container-local | == Triton Inference Server ==
qanything-container-local | =============================
qanything-container-local |
qanything-container-local | NVIDIA Release 23.05 (build 61161506)
qanything-container-local | Triton Server Version 2.34.0
qanything-container-local |
qanything-container-local | Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
qanything-container-local |
qanything-container-local | Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
qanything-container-local |
qanything-container-local | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
qanything-container-local | By pulling and using the container, you accept the terms and conditions of this license:
qanything-container-local | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
qanything-container-local |
qanything-container-local | llm_api is set to [local]
qanything-container-local | device_id is set to [0,1,2,3]
qanything-container-local | runtime_backend is set to [hf]
qanything-container-local | model_name is set to [Qwen-7B-QAnything]
qanything-container-local | conv_template is set to [qwen-7b-qanything]
qanything-container-local | tensor_parallel is set to [1]
qanything-container-local | gpu_memory_utilization is set to [0.9]
qanything-container-local | checksum 3ea0e2e1a4c07d65fc2e64c98a86809e
qanything-container-local | default_checksum 3ea0e2e1a4c07d65fc2e64c98a86809e
qanything-container-local | GPU ID: 0, 1
qanything-container-local | GPU1 Model: Tesla T4
qanything-container-local | Compute Capability: 7.5
qanything-container-local | OCR_USE_GPU=True because 7.5 >= 7.5
qanything-container-local | ====================================================
qanything-container-local | **** Important Notice ****
qanything-container-local | ====================================================
qanything-container-local |
qanything-container-local | Your current VRAM is 15360 MiB; deploying models of 3B or smaller is recommended, including the online OpenAI API
qanything-container-local | Your VRAM is insufficient to deploy a 7B model; please choose a different model size
```

TtZJ2 commented 5 months ago

When will QAnything support running across 3 or 4 GPUs in parallel? That would be real multi-GPU support.

TtZJ2 commented 5 months ago

> With multiple GPUs, why is only the second card's VRAM detected instead of the total? I'm using Tesla T4s; the detected VRAM is 15360 MiB and it tells me I can't run a 7B model, but I have two cards.

Same situation here: two T4 cards, and it says the VRAM is insufficient, yet it never uses the second card.

SanBingYouYong commented 2 months ago

> Is QAnything's multi-GPU support just running the LLM and the embedding/rerank models on separate cards? It isn't multi-GPU in the vLLM sense, is it?

How do I configure that? Running a single 3B model here already takes about 14 GB, and I'm not sure whether several models were deployed at the same time.
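One way to check is to list the processes holding memory on each card; a `pynvml` sketch (again assuming the `nvidia-ml-py` package, not anything QAnything provides):

```python
# List the processes holding memory on each GPU to see how many
# model servers (LLM, embedding, rerank, OCR) ended up on one card.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    print(f"GPU {i}:")
    for p in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
        used = (p.usedGpuMemory or 0) // 2**20  # may be None without privileges
        print(f"  pid={p.pid}  {used} MiB")
pynvml.nvmlShutdown()
```

Comparing the per-process totals against the model sizes shows whether several servers are sharing one device.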

vc5409ftu commented 1 month ago

To run a 7B model you'd have to switch to a 24 GB card; multiple cards won't help.
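The 24 GB figure is consistent with back-of-envelope arithmetic: 7B parameters in fp16 are about 13 GiB of weights before any KV cache, activations, or CUDA context. A quick check, assuming fp16 and no quantization:

```python
# Weights-only footprint of a 7B-parameter model in fp16.
params = 7e9
bytes_per_param = 2                                   # fp16
print(f"{params * bytes_per_param / 2**30:.2f} GiB")  # ≈ 13.04 GiB
```

On a 15360 MiB T4 that leaves only about 2 GiB of headroom, and since the weights are not sharded across cards here, adding a second T4 does not change the per-device requirement.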

GreatStep commented 1 month ago

Not supported~