Error: cublas error during MatMul in Attention operator.

ztxz16 / fastllm

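A pure C++ cross-platform LLM acceleration library with Python bindings. ChatGLM-6B-class models can reach 10000+ tokens/s on a single GPU. Supports GLM, LLaMA, and MOSS base models, and runs smoothly on mobile devices.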

Apache License 2.0

3.23k stars, 325 forks

Error: cublas error during MatMul in Attention operator. #433

Closed. pingyuan2016 closed this issue 4 months ago

pingyuan2016 commented 4 months ago

Steps to reproduce:
1. Fine-tune chatglm3.
2. I have two GPUs. I merged the model on cuda:0; when I then load the merged model on cuda:1, I get the following error:

Error: CUDA error when allocating 8236 kB memory! maybe there's no enough memory left on device. CUDA error = 2, cudaErrorMemoryAllocation at /home/duanjinqiang/project/llm/fastllm/src/devices/cuda/fastllm-cuda.cu:1485 'out of memory' status = 7 Error: cublas error during MatMul in Attention operator. terminate called after throwing an instance of 'char const*' Aborted

TylunasLi commented 4 months ago

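This message clearly indicates that the GPU has run out of memory. Check how GPU memory is allocated at the moment the error occurs. If you have already called set_device_map() but memory is still not being allocated on the GPU you intended, you can:

Set the following environment variable: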
```
export CUDA_DEVICE_ORDER=PCI_BUS_ID
```
Check that the CUDA_VISIBLE_DEVICES environment variable is set correctly. If it pins the process to GPU 0, then calling set_device_map("cuda:1") still runs on GPU 0.
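
To make the index remapping concrete, here is a minimal Python sketch of how CUDA_VISIBLE_DEVICES changes what a logical index such as "cuda:1" refers to. The helper `resolve_physical_gpu` is hypothetical (it is not part of fastllm or CUDA), it assumes the variable holds a plain comma-separated list of integer ids, and it assumes CUDA_DEVICE_ORDER=PCI_BUS_ID so that the ids match what nvidia-smi reports.

```python
import os


def resolve_physical_gpu(logical_index, visible_devices):
    """Map the N in "cuda:N" to a physical GPU id (hypothetical helper).

    visible_devices is the raw CUDA_VISIBLE_DEVICES string, or None if unset.
    Returns the physical id as a string, or None when the process cannot
    see that many devices.
    """
    if visible_devices is None:
        # No mask set: logical and physical indices coincide.
        return str(logical_index)
    # The mask is an ordered list; logical index i is the i-th entry.
    visible = [d.strip() for d in visible_devices.split(",") if d.strip()]
    if 0 <= logical_index < len(visible):
        return visible[logical_index]
    # Fewer devices are visible than the requested index.
    return None


# Check what "cuda:1" would mean in the current environment.
print(resolve_physical_gpu(1, os.environ.get("CUDA_VISIBLE_DEVICES")))
```

With the mask set to "0", the process sees only one device, so there is no second logical device for "cuda:1" to select and the sketch returns None. That is consistent with the observation above that the work stays on GPU 0.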

pingyuan2016 commented 4 months ago

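Thanks, I see it now. One of the GPUs had its memory in use, but no process was listed as using it, so I assumed the card was idle. Closing.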
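
For anyone hitting the same thing, a rough Python sketch of checking per-GPU memory before choosing a device, rather than relying on the process list. It assumes nvidia-smi is on PATH; the helper names and the sample numbers are made up for illustration and are not from this issue.

```python
import subprocess

# nvidia-smi query that prints one "index, used, total" line per GPU (MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse the CSV output into {gpu_index: (used_mib, total_mib)}."""
    usage = {}
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        usage[int(index)] = (int(used), int(total))
    return usage


def query_gpu_memory():
    """Run nvidia-smi and return the parsed per-GPU memory usage."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


def gpu_with_most_free(usage):
    """Return the index of the GPU with the most free memory."""
    return max(usage, key=lambda i: usage[i][1] - usage[i][0])


# Hypothetical sample output: GPU 0 is nearly full, GPU 1 is almost empty.
sample = "0, 22500, 24576\n1, 310, 24576\n"
print(gpu_with_most_free(parse_gpu_memory(sample)))
```

On a real machine, call `query_gpu_memory()` instead of parsing the sample string. Memory can show as used even when no owning process is listed, so the used/total figures are the safer thing to check.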
