Open uniartisan opened 3 months ago
This looks similar to https://github.com/intel/torch-xpu-ops/issues/628 and to pull request https://github.com/intel/torch-xpu-ops/pull/511. The following environment variables must be set:
export OverrideDefaultFP64Settings=1
export IGC_EnableDPEmulation=1
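For scripts that cannot rely on the shell environment, the same two variables can be set from Python. This is only a sketch: the helper name `enable_fp64_emulation` is hypothetical, and it assumes the variables need to be in the process environment before the XPU runtime is initialized, so it should run before `import torch`.

```python
import os

# The two variables named in the comment above; values are strings, as in the shell.
FP64_EMULATION_ENV = {
    "OverrideDefaultFP64Settings": "1",
    "IGC_EnableDPEmulation": "1",
}


def enable_fp64_emulation(env=os.environ):
    """Hypothetical helper: copy the FP64 emulation settings into `env`."""
    for name, value in FP64_EMULATION_ENV.items():
        env[name] = value
    return env


# Assumption: this must happen before the XPU runtime starts, so call it
# ahead of `import torch` in the real script.
enable_fp64_emulation()
print({name: os.environ[name] for name in FP64_EMULATION_ENV})
```

Whether these settings are needed at all depends on the device and driver; the later comments in this thread suggest the kernel itself should not require FP64.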
Hi, @uniartisan. You are likely working on a system with both an iGPU and a dGPU (ARC). The operator should be compatible with ARC, so I assume the `xpu` in `test_log_softmax('xpu')` refers to the iGPU. Please try `torch.xpu.get_device_properties('xpu:0')` and `torch.xpu.get_device_properties('xpu:1')` to check which device exposed by PyTorch is the dGPU on your system. BTW, by default `xpu` means `xpu:0`.
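The check suggested above can be sketched as a short loop over every XPU device PyTorch exposes. The helper name `list_xpu_devices` is hypothetical; the sketch assumes a PyTorch build with XPU support and returns an empty list when PyTorch or an XPU device is not available.

```python
try:
    import torch
except ImportError:  # keep the sketch importable on machines without PyTorch
    torch = None


def list_xpu_devices():
    """Hypothetical helper: return (device string, name, total memory) per XPU device."""
    if torch is None or not hasattr(torch, "xpu") or not torch.xpu.is_available():
        return []
    devices = []
    for index in range(torch.xpu.device_count()):
        device = f"xpu:{index}"
        props = torch.xpu.get_device_properties(device)
        devices.append((device, props.name, props.total_memory))
    return devices


for device, name, total_memory in list_xpu_devices():
    # The dGPU (ARC) should be recognizable by its name and larger memory.
    print(f"{device}: {name} ({total_memory / 2**30:.1f} GiB)")
```

If the ARC card turns out not to be `xpu:0`, passing the explicit device string (for example `'xpu:1'`) to the test selects it instead of the default.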
@daisyden Could you help verify the case on the ARC? I assume our SYCL kernel implementation does not depend on FP64 and should work on ARC.
This case passes on master PyTorch (f3c3f3a3c39a359af6f06619e44e0d6a26b58e6d) and torch-xpu-ops (94d0ee6858633f00629ad1980d84df53b761fc8a).
🐛 Describe the bug
My environment is WSL with PyTorch 2.5 built from source, and my card is an ARC A770. The log_softmax operation fails on XPU with a "Kernel is incompatible with all devices" error.
Description: While attempting to use the log_softmax operation on an XPU device, an error occurs indicating that the kernel is incompatible with all devices, despite recent commits purportedly adding support for this operation on XPU.
Error message: RuntimeError: Kernel is incompatible with all devices in devs
Steps to reproduce:
Expected behavior: The log_softmax operation should execute successfully on the XPU device.
Actual behavior: The operation fails with the RuntimeError stating the kernel is incompatible with all devices.
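The reporter's reproduction script is not included above. As an illustration only, a minimal check of the kind described might look like the following sketch: the helper name `check_log_softmax`, the tensor shape, and the tolerance are all assumptions, not the original test. It compares the result on the requested device against the CPU result.

```python
try:
    import torch
except ImportError:  # keep the sketch importable on machines without PyTorch
    torch = None


def check_log_softmax(device="cpu"):
    """Hypothetical check: log_softmax on `device` should match the CPU result."""
    if torch is None:
        raise RuntimeError("PyTorch is not installed")
    x = torch.randn(4, 8, dtype=torch.float32)  # float32 input, no FP64 involved
    expected = torch.log_softmax(x, dim=-1)
    actual = torch.log_softmax(x.to(device), dim=-1).cpu()
    return torch.allclose(actual, expected, atol=1e-5)


if torch is not None:
    # Use the XPU device when this build exposes one, otherwise fall back to CPU.
    has_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
    target = "xpu" if has_xpu else "cpu"
    print(target, check_log_softmax(target))
```

On an affected setup, the call on `'xpu'` would be expected to raise the RuntimeError quoted above rather than return a value.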
Versions
PyTorch version: 2.5.0a0+git4073f73
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (conda-forge gcc 14.1.0-0) 14.1.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.30.0
Libc version: glibc-2.35

Python version: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.153.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: GenuineIntel
Model name: 13th Gen Intel(R) Core(TM) i7-13700KF
CPU family: 6
Model: 183
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 1
BogoMIPS: 6835.20
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization: VT-x
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 384 KiB (8 instances)
L1i cache: 256 KiB (8 instances)
L2 cache: 16 MiB (8 instances)
L3 cache: 30 MiB (1 instance)
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] flake8==7.1.0
[pip3] numpy==1.26.4
[pip3] optree==0.12.1
[pip3] torch==2.5.0a0+gitac2e603
[pip3] torchao==0.3.1
[pip3] triton==3.0.0
[conda] numpy 1.26.4 pypi_0 pypi
[conda] optree 0.12.1 pypi_0 pypi
[conda] torch 2.5.0a0+gitac2e603 pypi_0 pypi
[conda] torchao 0.3.1 pypi_0 pypi
[conda] triton 3.0.0 pypi_0 pypi