yuhuixu1993 / qa-lora

Official PyTorch implementation of QA-LoRA
MIT License
110 stars, 12 forks

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3! (when checking argument for argument mat2 in method wrapper_CUDA_mm) #21

Open orangezfj opened 8 months ago

orangezfj commented 8 months ago

Hello! QLoRA training computes in float16; what compute precision does QA-LoRA training use? Fine-tuning TechGPT-7b with QLoRA succeeded, but with QA-LoRA I always get the following error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3! (when checking argument for argument mat2 in method wrapper_CUDA_mm)

Looking forward to your answer; this is very important to me.

xxw11 commented 5 months ago

Hi, this error seems to occur when running on multiple GPUs: the model ends up split across devices, so the matrix multiplication receives tensors on both cuda:0 and cuda:3. You can work around it by restricting the visible devices when launching your run script with CUDA_VISIBLE_DEVICES, for example: CUDA_VISIBLE_DEVICES=0 python qalora.py --model_path
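
If you launch training from another Python script rather than from the shell, the same workaround can be sketched as below. This is only an illustration, not part of the QA-LoRA repository: the helper names `single_gpu_env` and `run_on_single_gpu` are hypothetical, and the demo command just prints the variable instead of running `qalora.py`. The important point is that CUDA_VISIBLE_DEVICES has to be set before the process initializes CUDA, which is why it is passed through the child process's environment.

```python
import os
import subprocess
import sys


def single_gpu_env(gpu_index=0, base_env=None):
    """Return a copy of the environment that exposes only one GPU.

    Hypothetical helper: the input mapping is copied, not mutated.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Inside the child process the selected GPU will appear as cuda:0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def run_on_single_gpu(cmd, gpu_index=0):
    """Run `cmd` (a list of arguments) with only one GPU visible."""
    return subprocess.run(
        cmd,
        env=single_gpu_env(gpu_index),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Demo command: the child only reports what it sees in its environment.
    # Replace it with your own training command and arguments.
    demo = [sys.executable, "-c",
            "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
    print(run_on_single_gpu(demo, gpu_index=0).stdout.strip())
```

Note that this restricts training to a single GPU, so the model and its activations have to fit in that one device's memory.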