intel / torch-xpu-ops


[Client] DNNL does not support bf16/f16 backward on the platform with avx2_vnni_2 #1094

Open Stonepia opened 3 days ago

Stonepia commented 3 days ago

🐛 Describe the bug

We found a failure when running on a Client GPU:

_____________________________ TestNN.test_no_grad ______________________________
Traceback (most recent call last):
  File "/home/zhouyi/work/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/test_nn.py", line 274, in test_no_grad
    output.backward(torch.ones(1, 5, 10, 10))
  File "/home/zhouyi/miniforge3/envs/xpu_op_0/lib/python3.10/site-packages/torch/_tensor.py", line 581, in backward
    torch.autograd.backward(
  File "/home/zhouyi/miniforge3/envs/xpu_op_0/lib/python3.10/site-packages/torch/autograd/__init__.py", line 347, in backward
    _engine_run_backward(
  File "/home/zhouyi/miniforge3/envs/xpu_op_0/lib/python3.10/site-packages/torch/autograd/graph.py", line 825, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: DNNL does not support bf16/f16 backward on the platform with avx2_vnni_2

To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_WITH_SLOW=1 python test/test_nn.py TestNN.test_no_grad

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
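
For reference, here is a minimal standalone sketch of the pattern the traceback points at (a bf16 Conv2d backward dispatched to oneDNN on the CPU). The module configuration and input shape below are illustrative, inferred from the (1, 5, 10, 10) grad tensor in the traceback, and may differ from what test_no_grad actually builds:

    import torch
    import torch.nn as nn

    # Illustrative module/shape: 2 -> 5 channels, 3x3 kernel, padding 1, so a
    # 10x10 input yields the (1, 5, 10, 10) output seen in the traceback.
    module = nn.Conv2d(2, 5, kernel_size=3, padding=1).to(torch.bfloat16)
    x = torch.randn(1, 2, 10, 10, dtype=torch.bfloat16, requires_grad=True)

    output = module(x)
    # On a CPU whose best oneDNN ISA is avx2_vnni_2, the bf16 backward is
    # expected to raise the RuntimeError shown above.
    output.backward(torch.ones(1, 5, 10, 10, dtype=torch.bfloat16))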

Versions

OS: Ubuntu 24.10
PyTorch: private branch pytorch-2.5.1 (f05bc32c)
torch-xpu-ops: pinned at 02060c616
Compiler: intel-compiler-2025.0.1.1225_offline.sh
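
For triage on the affected machine, a short snippet like the one below can report the ISA level and backend availability that PyTorch itself sees. Note that get_cpu_capability() only returns the coarse capability (e.g. "AVX2" / "AVX512"), so it narrows the platform down rather than confirming avx2_vnni_2 specifically:

    import torch

    print("torch:", torch.__version__)
    print("cpu capability:", torch.backends.cpu.get_cpu_capability())
    print("mkldnn available:", torch.backends.mkldnn.is_available())
    print("xpu available:", torch.xpu.is_available())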