### 🐛 Describe the bug

bfloat16:
Mismatched elements: 2 / 125 (1.6%)
Greatest absolute difference: 0.03125 at index (1, 4, 2) (up to 0.001 allowed)
Greatest relative difference: 0.006072998046875 at index (2, 3, 1) (up to 0.001 allowed)
cpu output at (1, 4, 2): tensor(6.1875, dtype=torch.bfloat16)
xpu output at (1, 4, 2): tensor(6.1562, device='xpu:0', dtype=torch.bfloat16)

complex128:
PYTORCH_TEST_WITH_SLOW=1 python test/xpu/extended/test_ops_xpu.py TestCommonXPU.test_compare_cpu_logcumsumexp_xpu_complex128
Mismatched elements: 2 / 125 (1.6%)
Greatest absolute difference: 12.566370614359174 at index (3, 3, 0) (up to 0.001 allowed)
Greatest relative difference: 1.5103243157406059 at index (3, 4, 0) (up to 0.001 allowed)
cpu output at (3, 3, 0): tensor(7.4356+3.7336j, dtype=torch.complex128)
xpu output at (3, 3, 0): tensor(7.4356-8.8328j, device='xpu:0', dtype=torch.complex128)

complex64:
test_reductions_xpu.py::TestReductionsXPU::test_logcumsumexp_complex_xpu_complex64
Mismatched elements: 1 / 3 (33.3%)
Greatest absolute difference: nan at index (2,) (up to 1e-05 allowed)
Greatest relative difference: nan at index (2,) (up to 1.3e-06 allowed)
input : [1e3 + 0j, 1e-18 + 1e4j, 1e2 + 1e-8j]
cpu_output : [1000.+0.j, 1000.+0.j, 1000.+0.j]
cuda_output : [1000.+0.j, 1000.+0.j, 1000.+0.j]
xpu_output : [1000.+0.j, 1000.+0.j, nan + nanj]
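For reference, the report format above comes from torch.testing.assert_close. A minimal stand-alone sketch of this kind of CPU-vs-XPU comparison (not the actual test harness; the shape, dim, and tolerances are illustrative, and it assumes an XPU-enabled PyTorch build):

```python
import torch

# Illustrative sketch only: compare logcumsumexp on CPU vs. XPU for the same
# input. Shape, dim, and tolerances are placeholders, not the test's values.
x_cpu = torch.randn(5, 5, 5, dtype=torch.bfloat16)
x_xpu = x_cpu.to("xpu")  # requires an XPU-enabled PyTorch build

out_cpu = torch.logcumsumexp(x_cpu, dim=-1)
out_xpu = torch.logcumsumexp(x_xpu, dim=-1)

# assert_close prints the "Mismatched elements / Greatest ... difference"
# report quoted above when the results diverge beyond rtol/atol.
torch.testing.assert_close(out_xpu.cpu(), out_cpu, rtol=1e-3, atol=1e-3)
```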
For complex64, I found that the nan comes from accumulation order: for this input, our XPU scan kernel first combines input[1] with input[2], and then combines input[0] with that partial result at index 2. However, even the CPU kernel outputs nan+nanj when logcumsumexp is computed directly over just input[1] and input[2].
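A minimal repro sketch of that observation, using the input above (the expected values in the comments are taken from the outputs reported above):

```python
import torch

# Input from the failing test_logcumsumexp_complex case above.
x = torch.tensor([1e3 + 0j, 1e-18 + 1e4j, 1e2 + 1e-8j], dtype=torch.complex64)

# Full left-to-right accumulation on CPU stays finite:
# tensor([1000.+0.j, 1000.+0.j, 1000.+0.j])
print(torch.logcumsumexp(x, dim=0))

# Combining only input[1] and input[2], as the XPU scan does in its first
# step, already produces nan+nanj in the last element on CPU as well.
print(torch.logcumsumexp(x[1:], dim=0))
```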
Related PR: https://github.com/intel/torch-xpu-ops/pull/931

### Versions