unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0
16.92k stars 1.16k forks

RuntimeError: expected self and mask to be on the same device, but got mask on cuda:7 and self on cuda:0 #996

Open Silentssss opened 1 month ago

Silentssss commented 1 month ago

When I used fast_cross_entropy_loss instead of torch.nn.CrossEntropyLoss, this error happened:

```
File "/mnt/fs/user/xingjinliang/unsloth/unsloth/kernels/cross_entropy_loss.py", line 318, in fast_cross_entropy_loss
    loss = Fast_CrossEntropyLoss.apply(
File "/usr/local/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
File "/mnt/fs/user/xingjinliang/unsloth/unsloth/kernels/cross_entropy_loss.py", line 272, in forward
    losses.masked_fill_(labels == -100, 0)  # Don't forget to mask padding out!
RuntimeError: expected self and mask to be on the same device, but got mask on cuda:7 and self on cuda:0
```

Why is the device hardcoded to `"cuda:0"` in `kernels/cross_entropy_loss.py`? All three allocations do this:

```python
losses    = torch.empty(n_rows, dtype = torch.float32, device = "cuda:0")
logsumexp = torch.empty(n_rows, dtype = torch.float32, device = "cuda:0")
logsumexp = torch.empty((n_rows, n_chunks,), dtype = torch.float32, device = "cuda:0")
```

I think `device = logits.device` is correct; after I changed it, the error went away. Please check!
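A minimal, CPU-runnable sketch of the fix described above: allocate the output buffer on `logits.device` instead of a hardcoded `"cuda:0"`, so the `masked_fill_` mask (built from `labels`, on the same device as the inputs) matches. The function name and the loss computation here are simplified stand-ins for illustration, not the actual unsloth kernel:

```python
import torch

def masked_losses(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Hypothetical minimal repro of the allocation pattern in question.
    # Allocating on logits.device (instead of "cuda:0") means that on a
    # multi-GPU run where inputs live on e.g. cuda:7, the buffer, the mask
    # (labels == -100), and the in-place masked_fill_ all share one device.
    n_rows = logits.shape[0]
    losses = torch.empty(n_rows, dtype=torch.float32, device=logits.device)
    # Stand-in for the real per-row loss computation:
    losses.copy_(logits.float().logsumexp(dim=-1))
    # This is the line that raised the RuntimeError when `losses` was
    # hardcoded to cuda:0 but `labels` lived on another GPU:
    losses.masked_fill_(labels == -100, 0)  # mask out padding rows
    return losses
```

With the hardcoded `"cuda:0"`, this exact `masked_fill_` call fails whenever the inputs are placed on any other device, which is why the error only surfaces on multi-GPU setups.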

danielhanchen commented 1 month ago

Oh wait is this on multi GPU setups?

Silentssss commented 1 month ago

Yes, I am running on 8 GPUs.