intel / torch-xpu-ops


PI_ERROR_INVALID_QUEUE after copying device 0 tensor to device 1 #745

Open · daisyden opened 1 month ago

daisyden commented 1 month ago

🐛 Describe the bug

import torch
a = torch.empty(3, device=torch.device('xpu:0'))
a.fill_(1.1)
b = a.to(device='xpu:1')
a.device
b.device
print(b.cpu())
print(b)  # this line fails (see traceback below)

Report:

tensor([1.1000, 1.1000, 1.1000])
Traceback (most recent call last):
  File "/home/gta/daisyden/pytorch4/test/aa.py", line 8, in <module>
    print(b)
  File "/home/gta/miniforge3/envs/daisy_pytorch4/lib/python3.10/site-packages/torch/_tensor.py", line 464, in __repr__
    return torch._tensor_str._str(self, tensor_contents=tensor_contents)
  File "/home/gta/miniforge3/envs/daisy_pytorch4/lib/python3.10/site-packages/torch/_tensor_str.py", line 714, in _str
    return _str_intern(self, tensor_contents=tensor_contents)
  File "/home/gta/miniforge3/envs/daisy_pytorch4/lib/python3.10/site-packages/torch/_tensor_str.py", line 631, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/home/gta/miniforge3/envs/daisy_pytorch4/lib/python3.10/site-packages/torch/_tensor_str.py", line 363, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home/gta/miniforge3/envs/daisy_pytorch4/lib/python3.10/site-packages/torch/_tensor_str.py", line 152, in __init__
    nonzero_finite_vals = torch.masked_select(
RuntimeError: Native API failed. Native API returns: -36 (PI_ERROR_INVALID_QUEUE) -36 (PI_ERROR_INVALID_QUEUE)

Versions

latest version

fengyuan14 commented 1 month ago

This is a SYCL runtime issue.

Per the latest SYCL spec, we are recommended to use info::kernel_device_specific::work_group_size instead of info::device::max_work_group_size. However, a new issue was found: after querying info::kernel_device_specific::work_group_size, a kernel can no longer be launched successfully on PVC Tile 1; the launch fails with a runtime error.
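
For context, a minimal sketch of the two queries side by side. The kernel name QueryProbe is hypothetical, and the context-wide get_kernel_bundle call mirrors the pattern currently used in DeviceProperties.h; this is an illustration, not the actual library code:

```cpp
// Minimal sketch (hypothetical kernel name QueryProbe): compares the
// device-wide limit with the kernel-specific limit the SYCL spec recommends.
#include <sycl/sycl.hpp>
#include <iostream>

class QueryProbe;  // kernel name, defined by the parallel_for below

int main() {
  sycl::queue q{sycl::gpu_selector_v};
  sycl::device dev = q.get_device();
  sycl::context ctx = q.get_context();

  // Old query: a device-wide upper bound, not specific to any kernel.
  size_t dev_max = dev.get_info<sycl::info::device::max_work_group_size>();

  // Recommended query: the limit for this particular kernel on this device.
  // Note the bundle here is built context-wide, as in DeviceProperties.h.
  auto kid = sycl::get_kernel_id<QueryProbe>();
  auto bundle =
      sycl::get_kernel_bundle<sycl::bundle_state::executable>(ctx, {kid});
  auto krn = bundle.get_kernel(kid);
  size_t krn_max =
      krn.get_info<sycl::info::kernel_device_specific::work_group_size>(dev);

  std::cout << "device max: " << dev_max
            << ", kernel max: " << krn_max << "\n";

  // The reported failure mode: the launch after the query errors out
  // on PVC Tile 1.
  q.parallel_for<QueryProbe>(sycl::range<1>{krn_max},
                             [](sycl::id<1>) {}).wait();
  return 0;
}
```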

daisyden commented 1 month ago

Duplicate of https://github.com/intel/torch-xpu-ops/issues/339

fengyuan14 commented 4 weeks ago

The issue is common to all platforms that have more than one device. The most important and most common case for us is the client case: a client platform/desktop with an iGPU and a dGPU.
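
A quick way to check whether a machine is in the affected multi-device configuration is to enumerate the GPUs the SYCL runtime exposes; a minimal sketch:

```cpp
// Lists the GPU devices visible to the SYCL runtime; the bug needs two or
// more (e.g. iGPU + dGPU on a desktop, or two PVC tiles).
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
  auto gpus = sycl::device::get_devices(sycl::info::device_type::gpu);
  std::cout << gpus.size() << " GPU device(s):\n";
  for (const auto& d : gpus)
    std::cout << "  " << d.get_info<sycl::info::device::name>() << "\n";
  return 0;
}
```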

fengyuan14 commented 3 weeks ago

https://github.com/intel/llvm/issues/15127

ddkalamk commented 1 day ago

@fengyuan14 can we please apply the available workaround to fix this problem?

i.e., in https://github.com/intel/torch-xpu-ops/blob/main/src/comm/DeviceProperties.h#L19C3-L20C79 change

auto kbundle = ::sycl::get_kernel_bundle<::sycl::bundle_state::executable>(ctx, {kid});

to

auto kbundle = ::sycl::get_kernel_bundle<::sycl::bundle_state::executable>(ctx, {dev}, {kid});
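
For illustration, a minimal sketch of the device-scoped query path under this workaround. The kernel name QueryProbe and the surrounding setup are hypothetical, not the actual DeviceProperties.h code:

```cpp
// Sketch of the fixed, device-scoped query path (hypothetical kernel name
// QueryProbe; setup code is illustrative, not the actual library code).
#include <sycl/sycl.hpp>

class QueryProbe;  // kernel name, defined by the parallel_for in main()

size_t kernel_wg_size(const sycl::context& ctx, const sycl::device& dev) {
  auto kid = sycl::get_kernel_id<QueryProbe>();
  // Passing {dev} scopes the bundle to the one device the kernel will run
  // on, avoiding the PI_ERROR_INVALID_QUEUE hit on multi-device contexts.
  auto kbundle = sycl::get_kernel_bundle<sycl::bundle_state::executable>(
      ctx, {dev}, {kid});
  return kbundle.get_kernel(kid)
      .get_info<sycl::info::kernel_device_specific::work_group_size>(dev);
}

int main() {
  sycl::queue q{sycl::gpu_selector_v};
  size_t wg = kernel_wg_size(q.get_context(), q.get_device());
  q.parallel_for<QueryProbe>(sycl::range<1>{wg}, [](sycl::id<1>) {}).wait();
  return 0;
}
```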

ddkalamk commented 1 day ago

@daisyden @fengyuan14 Test results after applying fix:

(pt_src) [ddkalamk@pcl-pvc01 pytorch]$ cat test2.py
import torch
print("PyTorch version: ", torch.__version__)
a = torch.empty(3, device=torch.device('xpu:0'))
a.fill_(1.1)
b = a.to(device='xpu:1')
a.device
b.device
print(b.cpu())
print(b)

(pt_src) [ddkalamk@pcl-pvc01 pytorch]$ python -u test2.py
PyTorch version:  2.5.0a0+git8693322
tensor([1.1000, 1.1000, 1.1000])
tensor([1.1000, 1.1000, 1.1000], device='xpu:1')

fengyuan14 commented 1 day ago

Hi, @ddkalamk. We have a PR for it on the main branch. We are currently busy with the PT2.5 release, but will land the PR ASAP: https://github.com/intel/torch-xpu-ops/pull/769

ddkalamk commented 1 day ago

Sounds good, thanks.