intel / torch-xpu-ops


KLDivLoss function in PyTorch always falls back to CPU #1011

Open jgtong opened 1 month ago

jgtong commented 1 month ago

🐛 Describe the bug

Greetings:

I am trying to use the torch.nn.KLDivLoss function on Intel's Max 1550 GPU; however, the function keeps falling back to the CPU.

Below is a reproducer:

import torch

# KLDivLoss expects log-probabilities as input and (by default) probabilities
# as target, hence the log_softmax/softmax pair below.
kl_div = torch.nn.KLDivLoss(reduction="batchmean")
input = torch.randn(1024, 1024, requires_grad=True, device="xpu").log_softmax(dim=-1)
target = torch.randn(1024, 1024, requires_grad=True, device="xpu").softmax(dim=-1)
print(f'{kl_div(input,target)=}')

I also tried different tensor sizes, with no success. Is this function not supported on the GPU?
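As a side note, one way to surface which ops hit the CPU fallback path is the debug switch described in the torch-xpu-ops README; the exact variable name below (PYTORCH_DEBUG_XPU_FALLBACK) is an assumption for this build, so please check the README for your version:

import os

# Assumption: torch-xpu-ops documents PYTORCH_DEBUG_XPU_FALLBACK=1 as printing
# a message whenever an aten op falls back to CPU. It must be set before torch
# is imported, since the dispatcher reads it at library init.
os.environ["PYTORCH_DEBUG_XPU_FALLBACK"] = "1"

import torch

kl_div = torch.nn.KLDivLoss(reduction="batchmean")
input = torch.randn(1024, 1024, requires_grad=True, device="xpu").log_softmax(dim=-1)
target = torch.randn(1024, 1024, requires_grad=True, device="xpu").softmax(dim=-1)
print(kl_div(input, target))  # fallback messages, if any, are printed alongside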

Versions

I am hesitant to post the full output of collect_env.py because this system is internal.

OS: Ubuntu 22.04.5
Pytorch-gpu-dev: 0.5.3
Driver: 950.13
GPU: Intel Max 1550

jgtong commented 2 weeks ago

Hello, can anyone help me with this issue?

xytintel commented 1 week ago

@jgtong This was likely caused by an op that was missing previously. I tried your reproducer on torch-xpu-ops commit (https://github.com/intel/torch-xpu-ops/commit/804a03b76e6b1270327f3f6ddbe58b6ffba5d30e) and it passed. The output is: kl_div(input,target)=tensor(0.9970, device='xpu:0', grad_fn=<DivBackward0>)
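To double-check on your side that the kernel actually runs on the GPU, you could profile the call and look for XPU kernel rows in the trace. A minimal sketch, assuming ProfilerActivity.XPU is available in your PyTorch build (it is in recent builds with XPU support):

import torch
from torch.profiler import profile, ProfilerActivity

kl_div = torch.nn.KLDivLoss(reduction="batchmean")
input = torch.randn(1024, 1024, requires_grad=True, device="xpu").log_softmax(dim=-1)
target = torch.randn(1024, 1024, requires_grad=True, device="xpu").softmax(dim=-1)

# If ProfilerActivity.XPU is missing in your build, drop it and check whether
# aten::kl_div shows up only as CPU time instead.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.XPU]) as prof:
    loss = kl_div(input, target)

# XPU kernel rows here mean the op ran on the GPU; a pure-CPU trace around
# aten::kl_div would indicate the fallback path.
print(prof.key_averages().table(row_limit=15))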

jgtong commented 6 days ago

@xytintel

Thanks for the response. Has this change been committed to upstream PyTorch, or do I need to download a special PyTorch wheel package to pick it up?
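In the meantime, here is the check I would run against a newer build. A minimal sketch; the nightly XPU wheel index in the comment is the one the PyTorch install instructions list for Intel GPUs, so please verify it is still current before relying on it:

# Assumed install command, from the PyTorch getting-started page for Intel GPUs:
#   pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/xpu
import torch

print(torch.__version__)         # confirm the newer build is the one imported
print(torch.xpu.is_available())  # confirm the XPU backend initializes

kl_div = torch.nn.KLDivLoss(reduction="batchmean")
input = torch.randn(1024, 1024, requires_grad=True, device="xpu").log_softmax(dim=-1)
target = torch.randn(1024, 1024, requires_grad=True, device="xpu").softmax(dim=-1)
print(f'{kl_div(input,target)=}')  # expect a tensor on xpu:0, as in the output above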