intel / torch-xpu-ops

Apache License 2.0

oneDNN issues on MTL and ARC #875

Open daisyden opened 1 week ago

daisyden commented 1 week ago

🐛 Describe the bug

We found about 180 test cases that fail specifically on MTL; the same cases pass on PVC. On ARC, the cases fail in op creation, with fp64 cases excluded. MTL log: mtl.log

ARC log for convolution: arc.log

RuntimeError: could not create a primitive descriptor for a convolution forward propagation primitive

test_ops_xpu.py::TestCommonXPU::test_numpy_ref_nn_functional_conv_transpose1d_xpu_complex128 test_ops_xpu.py::TestCommonXPU::test_numpy_ref_nn_functional_conv_transpose1d_xpu_float64 test_modules_xpu.py::TestModuleXPU::test_grad_nn_ConvTranspose1d_xpu_float64 test_modules_xpu.py::TestModuleXPU::test_grad_nn_ConvTranspose2d_xpu_float64 test_modules_xpu.py::TestModuleXPU::test_grad_nn_ConvTranspose3d_xpu_float64 test_modules_xpu.py::TestModuleXPU::test_grad_nn_LazyConvTranspose1d_xpu_float64 test_modules_xpu.py::TestModuleXPU::test_grad_nn_LazyConvTranspose2d_xpu_float64 test_modules_xpu.py::TestModuleXPU::test_grad_nn_LazyConvTranspose3d_xpu_float64 test_modules_xpu.py::TestModuleXPU::test_gradgrad_nn_Conv1d_xpu_float64 test_modules_xpu.py::TestModuleXPU::test_gradgrad_nn_Conv2d_xpu_float64 test_modules_xpu.py::TestModuleXPU::test_gradgrad_nn_Conv3d_xpu_float64 test_modules_xpu.py::TestModuleXPU::test_gradgrad_nn_LazyConv1d_xpu_float64 test_modules_xpu.py::TestModuleXPU::test_gradgrad_nn_LazyConv2d_xpu_float64 test_modules_xpu.py::TestModuleXPU::test_gradgrad_nn_LazyConv3d_xpu_float64 test_linalg_xpu.py::TestLinalgXPU::test_addmm_baddbmm_overflow_xpu_float16 test_linalg_xpu.py::TestLinalgXPU::test_hipblaslt_corner_cases_rocm_xpu_float16 test_ops_fwd_gradients_xpu.py::TestFwdGradientsXPU::test_fn_fwgrad_bwgrad_nn_functional_conv1d_xpu_complex128 test_ops_fwd_gradients_xpu.py::TestFwdGradientsXPU::test_fn_fwgrad_bwgrad_nn_functional_conv1d_xpu_float64 test_ops_fwd_gradients_xpu.py::TestFwdGradientsXPU::test_fn_fwgrad_bwgrad_nn_functional_conv2d_xpu_complex128 test_ops_fwd_gradients_xpu.py::TestFwdGradientsXPU::test_fn_fwgrad_bwgrad_nn_functional_conv2d_xpu_float64 test_ops_fwd_gradients_xpu.py::TestFwdGradientsXPU::test_forward_mode_AD_nn_functional_conv_transpose1d_xpu_complex128 test_ops_fwd_gradients_xpu.py::TestFwdGradientsXPU::test_forward_mode_AD_nn_functional_conv_transpose1d_xpu_float64 
test_ops_gradients_xpu.py::TestBwdGradientsXPU::test_fn_grad_nn_functional_conv_transpose1d_xpu_complex128 test_ops_gradients_xpu.py::TestBwdGradientsXPU::test_fn_grad_nn_functional_conv_transpose1d_xpu_float64 test_ops_gradients_xpu.py::TestBwdGradientsXPU::test_fn_gradgrad_nn_functional_conv1d_xpu_complex128 test_ops_gradients_xpu.py::TestBwdGradientsXPU::test_fn_gradgrad_nn_functional_conv1d_xpu_float64 test_ops_gradients_xpu.py::TestBwdGradientsXPU::test_fn_gradgrad_nn_functional_conv2d_xpu_complex128 test_ops_gradients_xpu.py::TestBwdGradientsXPU::test_fn_gradgrad_nn_functional_conv2d_xpu_float64 nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_Conv2d_deterministic_cudnn_xpu_complex64 nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_Conv2d_deterministic_cudnn_xpu_float32 nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv3d_same_padding_backward_xpu_complex128 nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv3d_same_padding_backward_xpu_float64 nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv3d_valid_padding_backward_xpu_complex128 nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv3d_valid_padding_backward_xpu_float64 nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_dilated_has_bias_False_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_dilated_has_bias_False_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_dilated_has_bias_False_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_dilated_has_bias_False_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_dilated_has_bias_True_strided_False_contiguous_False_xpu 
nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_dilated_has_bias_True_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_dilated_has_bias_True_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_dilated_has_bias_True_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_dilated_transposed_has_bias_False_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_dilated_transposed_has_bias_False_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_dilated_transposed_has_bias_False_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_dilated_transposed_has_bias_False_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_dilated_transposed_has_bias_True_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_dilated_transposed_has_bias_True_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_dilated_transposed_has_bias_True_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_dilated_transposed_has_bias_True_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_has_bias_False_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_has_bias_False_strided_False_contiguous_True_xpu 
nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_has_bias_False_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_has_bias_False_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_has_bias_True_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_has_bias_True_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_has_bias_True_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_has_bias_True_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_transposed_has_bias_False_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_transposed_has_bias_False_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_transposed_has_bias_False_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_transposed_has_bias_False_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_transposed_has_bias_True_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_transposed_has_bias_True_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_transposed_has_bias_True_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow1d_transposed_has_bias_True_strided_True_contiguous_True_xpu 
nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_dilated_has_bias_False_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_dilated_has_bias_False_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_dilated_has_bias_False_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_dilated_has_bias_False_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_dilated_has_bias_True_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_dilated_has_bias_True_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_dilated_has_bias_True_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_dilated_has_bias_True_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_dilated_transposed_has_bias_False_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_dilated_transposed_has_bias_False_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_dilated_transposed_has_bias_False_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_dilated_transposed_has_bias_False_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_dilated_transposed_has_bias_True_strided_False_contiguous_False_xpu 
nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_dilated_transposed_has_bias_True_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_dilated_transposed_has_bias_True_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_dilated_transposed_has_bias_True_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_has_bias_False_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_has_bias_False_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_has_bias_False_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_has_bias_False_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_has_bias_True_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_has_bias_True_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_has_bias_True_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_has_bias_True_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_transposed_has_bias_False_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_transposed_has_bias_False_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_transposed_has_bias_False_strided_True_contiguous_False_xpu 
nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_transposed_has_bias_False_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_transposed_has_bias_True_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_transposed_has_bias_True_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_transposed_has_bias_True_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow2d_transposed_has_bias_True_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow3d_dilated_has_bias_False_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow3d_dilated_has_bias_False_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow3d_dilated_has_bias_False_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow3d_dilated_has_bias_False_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow3d_dilated_has_bias_True_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow3d_dilated_has_bias_True_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow3d_xpu_has_bias_False_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow3d_xpu_has_bias_False_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow3d_xpu_has_bias_False_strided_True_contiguous_False_xpu 
nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow3d_xpu_has_bias_False_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow3d_xpu_has_bias_True_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow3d_xpu_has_bias_True_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow3d_xpu_has_bias_True_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_slow3d_xpu_has_bias_True_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu1d_has_bias_False_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu1d_has_bias_False_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu1d_has_bias_False_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu1d_has_bias_False_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu1d_has_bias_True_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu1d_has_bias_True_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu1d_has_bias_True_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu1d_has_bias_True_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu1d_transposed_has_bias_False_strided_False_contiguous_False_xpu 
nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu1d_transposed_has_bias_False_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu1d_transposed_has_bias_False_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu1d_transposed_has_bias_False_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu1d_transposed_has_bias_True_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu1d_transposed_has_bias_True_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu1d_transposed_has_bias_True_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu1d_transposed_has_bias_True_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu2d_has_bias_False_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu2d_has_bias_False_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu2d_has_bias_False_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu2d_has_bias_False_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu2d_has_bias_True_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu2d_has_bias_True_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu2d_has_bias_True_strided_True_contiguous_False_xpu 
nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu2d_has_bias_True_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu2d_transposed_has_bias_False_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu2d_transposed_has_bias_False_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu2d_transposed_has_bias_False_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu2d_transposed_has_bias_False_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu2d_transposed_has_bias_True_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu2d_transposed_has_bias_True_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu2d_transposed_has_bias_True_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu2d_transposed_has_bias_True_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu3d_has_bias_False_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu3d_has_bias_False_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu3d_has_bias_False_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu3d_has_bias_False_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu3d_has_bias_True_strided_False_contiguous_False_xpu 
nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu3d_has_bias_True_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu3d_has_bias_True_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu3d_has_bias_True_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise1d_has_bias_False_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise1d_has_bias_False_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise1d_has_bias_False_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise1d_has_bias_False_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise1d_has_bias_True_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise1d_has_bias_True_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise1d_has_bias_True_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise1d_has_bias_True_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise2d_has_bias_False_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise2d_has_bias_False_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise2d_has_bias_False_strided_True_contiguous_False_xpu 
nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise2d_has_bias_False_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise2d_has_bias_True_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise2d_has_bias_True_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise2d_has_bias_True_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise2d_has_bias_True_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise3d_has_bias_False_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise3d_has_bias_False_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise3d_has_bias_False_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise3d_has_bias_False_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise3d_has_bias_True_strided_False_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise3d_has_bias_True_strided_False_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise3d_has_bias_True_strided_True_contiguous_False_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_backend_xpu_depthwise3d_has_bias_True_strided_True_contiguous_True_xpu nn/test_convolution_xpu.py::TestConvolutionNNDeviceTypeXPU::test_conv_double_backward_xpu_float64
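For triage, a flat list like the one above can be grouped by test file. The following is a minimal sketch and not part of the original report: `group_by_file` is a hypothetical helper, and `sample` holds three IDs copied from the list for illustration.

```python
from collections import defaultdict


def group_by_file(flat_ids: str) -> dict[str, list[str]]:
    """Group whitespace-separated pytest node IDs by their test file."""
    groups = defaultdict(list)
    for node_id in flat_ids.split():
        # A node ID has the form "<file>::<class>::<test>".
        file_name, _, rest = node_id.partition("::")
        groups[file_name].append(rest)
    return dict(groups)


# Three IDs copied from the list above, for illustration only.
sample = (
    "test_modules_xpu.py::TestModuleXPU::test_grad_nn_ConvTranspose1d_xpu_float64 "
    "test_linalg_xpu.py::TestLinalgXPU::test_addmm_baddbmm_overflow_xpu_float16 "
    "test_modules_xpu.py::TestModuleXPU::test_gradgrad_nn_Conv1d_xpu_float64"
)

groups = group_by_file(sample)
for file_name, tests in groups.items():
    print(f"{file_name}: {len(tests)} failing")
```

Pasting the full list into `sample` gives a per-file count, which makes it easier to see that most failures sit in nn/test_convolution_xpu.py.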

There are several kinds of failures on MTL:

____ TestModuleXPU.test_grad_nn_ConvTranspose1d_xpu_float64 ____
Traceback (most recent call last):
  File "/home/gta/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/test_modules.py", line 516, in test_grad
    self._test_gradients_helper(device, dtype, module_info, training, gradcheck)
  File "/home/gta/pytorch/third_party/torch-xpu-ops/test/xpu/test_modules_xpu.py", line 62, in _gradients_helper
    self.assertTrue(check(fn_to_gradcheck, flat_input, nondet_tol=gradcheck_nondet_tol))
  File "/home/gta/miniforge3/envs/daisy_0830/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4751, in gradcheck
    return torch.autograd.gradcheck(fn, inputs, **kwargs)
  File "/home/gta/miniforge3/envs/daisy_0830/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2052, in gradcheck
    return _gradcheck_helper(**args)
  File "/home/gta/miniforge3/envs/daisy_0830/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 2081, in _gradcheck_helper
    _gradcheck_real_imag(
  File "/home/gta/miniforge3/envs/daisy_0830/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1491, in _gradcheck_real_imag
    gradcheck_fn(
  File "/home/gta/miniforge3/envs/daisy_0830/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1925, in _fast_gradcheck
    _check_analytical_numerical_equal(
  File "/home/gta/miniforge3/envs/daisy_0830/lib/python3.10/site-packages/torch/autograd/gradcheck.py", line 1854, in _check_analytical_numerical_equal
    raise GradcheckError(
torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor(0.0484, device='xpu:0', dtype=torch.float64)
analytical:tensor(-0.0020, device='xpu:0', dtype=torch.float64)

The above quantities relating the numerical and analytical jacobians are computed in fast mode. See: https://github.com/pytorch/pytorch/issues/53876 for more background about fast mode. Below, we recompute numerical and analytical jacobians in slow mode:

Numerical:
 tensor([[-0.2421, -0.1192, -0.1192, ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000, -0.2384, -0.0596, ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000, -0.2384, ...,  0.0000,  0.0000,  0.0000],
        ...,
        [ 0.0000,  0.0000,  0.0000, ..., -0.0298,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000, ..., -0.0596, -0.0298,  0.0000],
        [ 0.0000,  0.0000,  0.0000, ..., -0.1192, -0.0447, -0.0298]],
       device='xpu:0', dtype=torch.float64)
Analytical:
 tensor([[-0.2432, -0.0507, -0.1240, ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000, -0.2432, -0.0507, ...,  0.0000,  0.0000,  0.0000],
        [ 0.0000,  0.0000, -0.2432, ...,  0.0000,  0.0000,  0.0000],
        ...,
        [ 0.0000,  0.0000,  0.0000, ..., -0.0284,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000, ..., -0.0345, -0.0284,  0.0000],
        [ 0.0000,  0.0000,  0.0000, ..., -0.1270, -0.0345, -0.0284]],
       device='xpu:0', dtype=torch.float64)

The max per-element difference (slow mode) is: 0.18614743649959564.
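For readers unfamiliar with what gradcheck compares, here is a minimal pure-Python sketch of the idea: a finite-difference (numerical) Jacobian is checked against a closed-form (analytical) one. It is not the PyTorch implementation. The helper names are hypothetical, the op is a simplified single-channel transposed 1-D convolution with stride 1 and no padding, and the weights are borrowed from the first row of the analytical tensor above purely as illustrative values.

```python
def conv_transpose1d(x, w):
    """Single-channel transposed 1-D convolution, stride 1, no padding."""
    y = [0.0] * (len(x) + len(w) - 1)
    for i, xi in enumerate(x):
        for k, wk in enumerate(w):
            y[i + k] += xi * wk
    return y


def numerical_jacobian(f, x, eps=1e-6):
    """Central differences: J[i][j] approximates d f(x)[j] / d x[i]."""
    rows = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        rows.append([(a - b) / (2 * eps) for a, b in zip(f(hi), f(lo))])
    return rows


def analytical_jacobian(x, w):
    """Closed form for the op above: J[i][j] = w[j - i] if 0 <= j - i < len(w)."""
    n_out = len(x) + len(w) - 1
    return [
        [w[j - i] if 0 <= j - i < len(w) else 0.0 for j in range(n_out)]
        for i in range(len(x))
    ]


# Illustrative inputs only; they are not the tensors used by the failing test.
x = [0.5, -1.0, 2.0, 0.25]
w = [-0.2432, -0.0507, -0.1240]

num = numerical_jacobian(lambda v: conv_transpose1d(v, w), x)
ana = analytical_jacobian(x, w)
max_diff = max(abs(a - b) for rn, ra in zip(num, ana) for a, b in zip(rn, ra))
print(f"max per-element difference: {max_diff:.3e}")
```

With a correct double-precision forward pass, the two Jacobians of this linear op agree to within rounding error, which is why a max per-element difference of about 0.186 in the log is a failure and not noise.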

=================================== FAILURES ===================================
TestCommonXPU.test_numpy_ref_nn_functional_conv_transpose1d_xpu_complex128
Traceback (most recent call last):
  File "/home/gta/miniforge3/envs/daisy_0830/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1140, in test_wrapper
    return test(*args, **kwargs)
  File "/home/gta/miniforge3/envs/daisy_0830/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1426, in only_fn
    return fn(self, *args, **kwargs)
  File "/home/gta/miniforge3/envs/daisy_0830/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2133, in wrapper
    fn(*args, **kwargs)
  File "/home/gta/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/test_ops.py", line 275, in test_numpy_ref
    self.compare_with_reference(
  File "/home/gta/miniforge3/envs/daisy_0830/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3744, in compare_with_reference
    self.assertEqual(actual, expected, exact_device=False, **kwargs)
  File "/home/gta/miniforge3/envs/daisy_0830/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3885, in assertEqual
    raise error_metas.pop()[0].to_error(
AssertionError: Tensor-likes are not close!

Mismatched elements: 14 / 176 (8.0%)
Greatest absolute difference: 1.1348890364718799e-05 at index (0, 1, 7) (up to 1e-07 allowed)
Greatest relative difference: 4.11529848906818e-07 at index (0, 0, 7) (up to 1e-07 allowed)

_____ TestLinalgXPU.test_addmv_xpu_float16 ____
Traceback (most recent call last):
  File "/home/gta/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/test_linalg.py", line 5950, in test_addmv
    self._test_addmm_addmv(torch.addmv, t, m, v, beta=0)
  File "/home/gta/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/test_linalg.py", line 5909, in _test_addmm_addmv
    self.assertEqual(res1, res3)
  File "/home/gta/miniforge3/envs/daisy_0830/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3885, in assertEqual
    raise error_metas.pop()[0].to_error(
AssertionError: Tensor-likes are not close!

Mismatched elements: 50 / 50 (100.0%)
Greatest absolute difference: 0.0546875 at index (0,) (up to 0.001 allowed)
Greatest relative difference: 0.0113983154296875 at index (0,) (up to 0.001 allowed)
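To make the "up to 0.001 allowed" figures concrete, the sketch below applies the closeness criterion documented for `torch.testing.assert_close`, `|actual - expected| <= atol + rtol * |expected|`, to the numbers reported for the addmv failure. The helper `is_close` is hypothetical. The expected magnitude is inferred from the log, on the assumption that the relative difference is the absolute difference divided by `|expected|`, that both maxima occur at the same element (index (0,)), and that the expected value is positive.

```python
def is_close(actual: float, expected: float, rtol: float, atol: float) -> bool:
    """Element-wise closeness test: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Values reported in the log for index (0,).
abs_diff = 0.0546875
rel_diff = 0.0113983154296875
rtol = atol = 1e-3  # "up to 0.001 allowed" for both differences

# Inferred, not logged: rel_diff = abs_diff / |expected| at the same index.
expected = abs_diff / rel_diff
actual = expected + abs_diff  # sign of the error is an assumption

allowed = atol + rtol * abs(expected)
print(f"expected ~ {expected:.4f}, allowed error ~ {allowed:.4f}, observed {abs_diff}")
print("within tolerance:", is_close(actual, expected, rtol, atol))
```

Under these assumptions the observed error is roughly ten times the allowed error for that element, so this is not a borderline tolerance failure.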

__ TestLinalgXPU.test_hipblaslt_corner_cases_rocm_xpu_float16 __
Traceback (most recent call last):
  File "/home/gta/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/test_linalg.py", line 5116, in test_hipblaslt_corner_cases_rocm
    self.assertTrue(torch.allclose(out1_cpu, out1.cpu(), rtol=1e-2, atol=1e-2))
  File "/home/gta/miniforge3/envs/daisy_0830/lib/python3.10/unittest/case.py", line 687, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true

To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_WITH_SLOW=1 python test/test_linalg.py TestLinalgXPU.test_hipblaslt_corner_cases_rocm_xpu_float16

_ TestLinalgXPU.test_matmul_45724_xpu __
Traceback (most recent call last):
  File "/home/gta/pytorch/third_party/torch-xpu-ops/test/xpu/test_linalg_xpu.py", line 69, in matmul_45724
    self.assertEqual(c, cpu_result)
  File "/home/gta/miniforge3/envs/daisy_0830/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3885, in assertEqual
    raise error_metas.pop()[0].to_error(
AssertionError: Tensor-likes are not close!

Mismatched elements: 7972735 / 31719908 (25.1%)
Greatest absolute difference: 0.109375 at index (25957, 1, 9) (up to 1e-05 allowed)
Greatest relative difference: 0.005401611328125 at index (51350, 14, 17) (up to 0.001 allowed)

_ TestConvolutionNNDeviceTypeXPU.test_Conv2d_deterministic_cudnn_xpu_complex64 _
Traceback (most recent call last):
  File "/home/gta/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/nn/test_convolution.py", line 1316, in test_Conv2d_deterministic_cudnn
    self.assertEqual(
  File "/home/gta/miniforge3/envs/daisy_0830/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3885, in assertEqual
    raise error_metas.pop()[0].to_error(
AssertionError: Tensor-likes are not equal!

Mismatched elements: 1 / 3 (33.3%)
Greatest absolute difference: 4.76837158203125e-07 at index (2,)
Greatest relative difference: 9.8881798749062e-08 at index (2,)

To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_WITH_SLOW=1 python test/nn/test_convolution.py TestConvolutionNNDeviceTypeXPU.test_Conv2d_deterministic_cudnn_xpu_complex64

____ TestLinalgXPU.test_addmm_baddbmm_overflow_xpu_float16 ____
Traceback (most recent call last):
  File "/home/gta/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/test_linalg.py", line 6109, in test_addmm_baddbmm_overflow
    self.assertTrue((out == 10000.).all())
  File "/home/gta/miniforge3/envs/daisy_0830/lib/python3.10/unittest/case.py", line 687, in assertTrue
    raise self.failureException(msg)
AssertionError: tensor(False, device='xpu:0') is not true

To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_WITH_SLOW=1 python test/test_linalg.py TestLinalgXPU.test_addmm_baddbmm_overflow_xpu_float16
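The test name and the `out == 10000.` assertion suggest a guard against float16 overflow during accumulation. The sketch below shows, in pure Python, how an exact-equality check of that kind depends on the accumulator width. It illustrates the general effect only and is not a diagnosis of the MTL failure. The helpers `to_fp16` and `dot_scaled` are hypothetical, and the inputs are hypothetical values chosen so that the exact scaled result is 10000.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    try:
        return struct.unpack("e", struct.pack("e", value))[0]
    except OverflowError:
        # struct rejects finite values beyond the fp16 range; model that as infinity.
        return float("inf") if value > 0 else float("-inf")


def dot_scaled(a, b, alpha, accumulate_fp16):
    """Return alpha * sum(a[i] * b[i]), rounding the running sum to fp16 if requested."""
    acc = 0.0
    for ai, bi in zip(a, b):
        acc += ai * bi
        if accumulate_fp16:
            acc = to_fp16(acc)
    # The final result is stored as fp16 in both variants.
    return to_fp16(alpha * acc)


# Hypothetical inputs: 1000 products of 100 * 100, scaled by 0.001.
k = 1000
a = [100.0] * k
b = [100.0] * k

wide = dot_scaled(a, b, alpha=0.001, accumulate_fp16=False)
narrow = dot_scaled(a, b, alpha=0.001, accumulate_fp16=True)
print("wide accumulator:  ", wide)
print("fp16 accumulator:  ", narrow)
```

Here a Python float stands in for a wider accumulator. The unscaled sum far exceeds the largest finite float16 value (65504), so only the variant that accumulates in a wider type can produce the exact value the assertion expects.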

Versions

Verified with oneDNN 3.5, 3.5.1, and 3.5.3; all have the issue. Both MTL and ARC are on 22.04.

daisyden commented 1 week ago

The ARC issue occurs because fp64 primitives are not implemented on the ARC platform. See https://jira.devtools.intel.com/browse/MFDNN-12326.

chuanqi129 commented 1 week ago

Will submit a new issue to the oneDNN team for MTL.