| Variant | Name | Self CPU % | Self CPU | CPU total % | CPU total | CPU time avg | Self XPU | Self XPU % | XPU total | XPU time avg | # of Calls |
|---|---|---|---|---|---|---|---|---|---|---|---|
| non override | aten::nonzero | 5.50% | 63.456ms | 54.60% | 630.302ms | 489.365us | 5.160ms | 4.35% | 34.468ms | 26.761us | 1288 |
| override | aten::nonzero | 5.40% | 58.551ms | 52.64% | 570.870ms | 443.222us | 6.688ms | 5.55% | 34.737ms | 26.970us | 1288 |
The low performance is caused by the SYCL API that we use to query the kernel-specific maximum work-group size. We have filed an issue against the compiler to track this: https://github.com/intel/llvm/issues/15824
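As a quick sanity check on the profiler figures, the sketch below recomputes the per-call averages from the "CPU total", "XPU total", and "# of Calls" columns and compares host time to device time per call. The numbers are copied from the table; the helper names (`per_call_us`, `summarize`) are hypothetical and only for illustration. If the reading is right, the much larger CPU time per call would be consistent with a host-side query being the bottleneck, though the profile alone does not prove it.

```python
# Figures copied from the profiler table (totals in milliseconds).
ROWS = {
    "non override": {"cpu_total_ms": 630.302, "xpu_total_ms": 34.468, "calls": 1288},
    "override": {"cpu_total_ms": 570.870, "xpu_total_ms": 34.737, "calls": 1288},
}


def per_call_us(total_ms, calls):
    """Average time per call in microseconds, given a total in milliseconds."""
    return total_ms * 1000.0 / calls


def summarize(rows):
    """Return per-call CPU/XPU averages and their ratio for each variant."""
    summary = {}
    for variant, r in rows.items():
        cpu_avg = per_call_us(r["cpu_total_ms"], r["calls"])
        xpu_avg = per_call_us(r["xpu_total_ms"], r["calls"])
        summary[variant] = {
            "cpu_avg_us": cpu_avg,
            "xpu_avg_us": xpu_avg,
            # How many times longer the host spends per call than the device.
            "host_to_device_ratio": cpu_avg / xpu_avg,
        }
    return summary


if __name__ == "__main__":
    for variant, s in summarize(ROWS).items():
        print(
            f"{variant}: CPU {s['cpu_avg_us']:.1f} us/call, "
            f"XPU {s['xpu_avg_us']:.1f} us/call, "
            f"ratio {s['host_to_device_ratio']:.1f}x"
        )
```

The recomputed averages should match the "CPU time avg" and "XPU time avg" columns, and in both variants the host side takes more than ten times as long per call as the device side.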
🐛 Describe the bug
Versions
Latest torch-xpu-ops vs IPEX 2.3 implementation.