Closed pingyuan2016 closed 4 months ago
This error clearly means the GPU ran out of memory. You need to look at how GPU memory was allocated at the moment the error occurred.
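One way to check per-GPU memory at that moment is to read it from `nvidia-smi`. The snippet below is only a rough sketch, not part of fastllm: the helper names (`parse_gpu_memory`, `query_gpu_memory`, `least_loaded`) are made up for illustration, and it assumes `nvidia-smi` is on the PATH and reports plain integer MiB values for every card.

```python
import csv
import io
import subprocess

# Standard nvidia-smi query flags: one CSV row per GPU, values in MiB, no header.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn rows like '0, 23800, 24576' into dicts with used/free MiB."""
    gpus = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        index, used, total = (int(field.strip()) for field in row)
        gpus.append({"index": index, "used_mib": used, "free_mib": total - used})
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and parse its output (hypothetical helper)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


def least_loaded(gpus):
    """Return the index of the GPU with the most free memory."""
    return max(gpus, key=lambda gpu: gpu["free_mib"])["index"]
```

Calling `query_gpu_memory()` right before loading the model shows which cards are already full. The `memory.used` column is reported per device, so a card can show as occupied here even when no process is listed for it.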
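If you have already called set_device_map() but memory is still not being allocated on the GPU you expect, you can:
export CUDA_DEVICE_ORDER=PCI_BUS_ID
Also check whether CUDA_VISIBLE_DEVICES is set correctly. If it pins the process to GPU 0, then calling set_device_map("cuda:1") will still use GPU 0.

Thanks, I see it now. One GPU's memory was in use, but no process was listed as using it, so I assumed the card was free. Closed.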
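To illustrate the CUDA_VISIBLE_DEVICES point: CUDA renumbers the visible devices starting from 0, so the index in "cuda:N" is a position in that list, not a physical card number. Here is a simplified sketch of the mapping. `physical_gpu` is a hypothetical helper, it only handles comma-separated integer IDs (not GPU UUIDs), and it does not model what fastllm itself does with an out-of-range index.

```python
def physical_gpu(visible_devices, logical_index):
    """Map a logical index (the N in "cuda:N") to a physical GPU id.

    visible_devices is the value of CUDA_VISIBLE_DEVICES, or None if unset.
    Returns None when the logical index is not visible to the process.
    """
    if visible_devices is None:
        # Variable unset: every GPU is visible and indices are not remapped.
        return logical_index
    ids = [int(part) for part in visible_devices.split(",") if part.strip()]
    if 0 <= logical_index < len(ids):
        return ids[logical_index]
    return None


# With only GPU 0 visible, there is no second device to select.
print(physical_gpu("0", 1))
# With only GPU 1 visible, "cuda:0" is physical GPU 1.
print(physical_gpu("1", 0))
```

So to run on the second physical card, either leave both cards visible and use "cuda:1", or set CUDA_VISIBLE_DEVICES=1 and use "cuda:0".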
Error: CUDA error when allocating 8236 kB memory! maybe there's no enough memory left on device.
CUDA error = 2, cudaErrorMemoryAllocation at /home/duanjinqiang/project/llm/fastllm/src/devices/cuda/fastllm-cuda.cu:1485
'out of memory'
status = 7
Error: cublas error during MatMul in Attention operator.
terminate called after throwing an instance of 'char const*'
Aborted