nwnk opened this issue 2 weeks ago.
For additional data: with the OMP and OCL backends, the same baseline tests fail without DNNL_ENABLE_PRIMITIVE_GPU_ISA=NONE. With it set, the OCL backend seems to be in better shape than SYCL:
76% tests passed, 78 tests failed out of 322
Total Test time (real) = 5526.59 sec
The following tests FAILED:
7 - test_binary_gpu (Failed)
8 - test_binary_buffer_gpu (Failed)
10 - test_concat_gpu (Failed)
11 - test_concat_buffer_gpu (Failed)
13 - test_concurrency_gpu (Failed)
14 - test_concurrency_buffer_gpu (Failed)
16 - test_convolution_backward_data_f32_gpu (Failed)
17 - test_convolution_backward_data_f32_buffer_gpu (Failed)
19 - test_convolution_backward_weights_f32_gpu (Failed)
20 - test_convolution_backward_weights_f32_buffer_gpu (Failed)
22 - test_convolution_eltwise_forward_f32_gpu (Failed)
23 - test_convolution_eltwise_forward_f32_buffer_gpu (Failed)
25 - test_convolution_eltwise_forward_x8s8f32s32_gpu (Failed)
26 - test_convolution_eltwise_forward_x8s8f32s32_buffer_gpu (Failed)
28 - test_convolution_forward_f32_gpu (Failed)
29 - test_convolution_forward_f32_buffer_gpu (Failed)
36 - test_cross_engine_reorder (Failed)
37 - test_cross_engine_reorder_buffer (Failed)
39 - test_deconvolution_gpu (Failed)
40 - test_deconvolution_buffer_gpu (Failed)
78 - test_inner_product_backward_data_gpu (Failed)
79 - test_inner_product_backward_data_buffer_gpu (Failed)
81 - test_inner_product_backward_weights_gpu (Failed)
82 - test_inner_product_backward_weights_buffer_gpu (Failed)
84 - test_inner_product_forward_gpu (Failed)
85 - test_inner_product_forward_buffer_gpu (Failed)
93 - test_matmul_gpu (Failed)
94 - test_matmul_buffer_gpu (Failed)
96 - test_persistent_cache_api_gpu (Failed)
97 - test_persistent_cache_api_buffer_gpu (Failed)
102 - test_pooling_forward_gpu (Failed)
103 - test_pooling_forward_buffer_gpu (Failed)
108 - test_primitive_cache_mt_gpu (Subprocess aborted)
109 - test_primitive_cache_mt_buffer_gpu (Subprocess aborted)
114 - test_reorder_gpu (Failed)
115 - test_reorder_buffer_gpu (Failed)
123 - test_shuffle_gpu (Failed)
124 - test_shuffle_buffer_gpu (Failed)
129 - test_sum_gpu (Failed)
130 - test_sum_buffer_gpu (Failed)
170 - test_api (Failed)
188 - test_graph_c_api_compile_usm_gpu (Failed)
190 - test_graph_c_api_compile_parametrized_usm_gpu (Failed)
192 - test_graph_cpp_api_compile_usm_gpu (Failed)
194 - test_graph_cpp_api_partition_usm_gpu (Failed)
196 - test_graph_cpp_api_compiled_partition_ocl_gpu (Failed)
221 - test_graph_unit_dnnl_batch_norm_usm_gpu (Failed)
223 - test_graph_unit_dnnl_binary_op_usm_gpu (Failed)
225 - test_graph_unit_dnnl_bmm_usm_gpu (Failed)
227 - test_graph_unit_dnnl_compiled_partition_usm_gpu (Failed)
229 - test_graph_unit_dnnl_concat_usm_gpu (Failed)
231 - test_graph_unit_dnnl_conv_usm_gpu (Failed)
233 - test_graph_unit_dnnl_convtranspose_usm_gpu (Failed)
235 - test_graph_unit_dnnl_dequantize_usm_gpu (Failed)
237 - test_graph_unit_dnnl_eltwise_usm_gpu (Failed)
241 - test_graph_unit_dnnl_large_partition_usm_gpu (Failed)
245 - test_graph_unit_dnnl_matmul_usm_gpu (Failed)
249 - test_graph_unit_dnnl_pool_usm_gpu (Failed)
253 - test_graph_unit_dnnl_quantize_usm_gpu (Failed)
255 - test_graph_unit_dnnl_reduce_usm_gpu (Failed)
257 - test_graph_unit_dnnl_reorder_usm_gpu (Failed)
261 - test_graph_unit_dnnl_softmax_usm_gpu (Failed)
274 - test_benchdnn_modeC_concat_ci_gpu (Failed)
276 - test_benchdnn_modeC_conv_gpu_ci_gpu (Failed)
278 - test_benchdnn_modeC_deconv_ci_gpu (Failed)
280 - test_benchdnn_modeC_eltwise_ci_gpu (Failed)
284 - test_benchdnn_modeC_graph_ci_gpu (Subprocess aborted)
286 - test_benchdnn_modeC_ip_ci_gpu (Failed)
292 - test_benchdnn_modeC_matmul_ci_gpu (Failed)
294 - test_benchdnn_modeC_pool_ci_gpu (Failed)
300 - test_benchdnn_modeC_reorder_ci_gpu (Failed)
305 - test_benchdnn_modeC_gru_ci_gpu (SEGFAULT)
306 - test_benchdnn_modeC_lstm_ci_gpu (SEGFAULT)
307 - test_benchdnn_modeC_rnn_ci_gpu (SEGFAULT)
312 - test_benchdnn_modeC_self_ci_gpu (Failed)
314 - test_benchdnn_modeC_shuffle_ci_gpu (Failed)
316 - test_benchdnn_modeC_softmax_ci_gpu (Failed)
318 - test_benchdnn_modeC_sum_ci_gpu (Failed)
88 GPU tests passed, so again, more working than not, but still not really working.
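When comparing runs like this, it can help to tally the FAILED list by outcome and double-check the summary arithmetic. Below is a minimal Python sketch. It assumes the list is pasted as plain text in ctest's "NNN - name (Status)" layout. The helpers tally_failures and pass_rate are hypothetical names for illustration, not ctest features.

```python
import re
from collections import Counter

# Matches lines like "  7 - test_binary_gpu (Failed)" from ctest's FAILED summary.
FAILED_LINE = re.compile(r"^\s*(\d+)\s+-\s+(\S+)\s+\((.+)\)\s*$")


def tally_failures(text):
    """Return a Counter mapping each failure status (e.g. 'Failed', 'SEGFAULT') to a count."""
    counts = Counter()
    for line in text.splitlines():
        m = FAILED_LINE.match(line)
        if m:
            counts[m.group(3)] += 1
    return counts


def pass_rate(total, failed):
    """Percentage of tests that passed, as a float."""
    return 100.0 * (total - failed) / total


if __name__ == "__main__":
    sample = """\
  7 - test_binary_gpu (Failed)
108 - test_primitive_cache_mt_gpu (Subprocess aborted)
305 - test_benchdnn_modeC_gru_ci_gpu (SEGFAULT)
"""
    print(tally_failures(sample))
    # 78 failures out of 322 tests, as in the summary above.
    print(f"{pass_rate(322, 78):.1f}%")  # prints 75.8%
```

For the run above, 244 of 322 tests passed, about 75.8%, which is consistent with the 76% ctest reports. The 78 listed failures break down into 72 "Failed", 3 "Subprocess aborted", and 3 "SEGFAULT".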
Intel(R) UHD Graphics 630 support was discontinued, and the last driver update was published at the end of 2022. oneDNN dropped support for GEN9 in the v3.4 release. It looks like we neglected to drop GEN9 from the ISA list, though.
Trying your patch on a newer architecture (Xe-HPC), I see 'could not create a primitive' errors for some tests. It looks like an empty ISA list causes issues with platform detection and/or kernel dispatching. Making DNNL_ENABLE_PRIMITIVE_GPU_ISA=NONE work would likely require additional implementation changes.
@nwnk,
"The build documentation claims that generic OpenCL kernels are always available."
The documentation doesn't claim that. It says that the ONEDNN_ENABLE_PRIMITIVE_GPU_ISA knob controls the implementations based on just-in-time kernel generation, and that the OpenCL-based kernels and implementations are always available. It doesn't imply that the OpenCL kernels are generic, even though some of them may be.
If there is a need to introduce generic OpenCL kernels, I believe the best way to do that would be to introduce a generic GPU vendor (ONEDNN_GPU_VENDOR=GENERIC). We plan to do that for the SYCL GPU runtime.
The ONEDNN_ENABLE_PRIMITIVE_GPU_ISA knob should be used to control implementations within a particular vendor, if there is such a need.
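One way to picture the distinction drawn above is a toy model. It is entirely hypothetical and is not oneDNN's actual dispatch code. Each implementation is tagged either as JIT-generated for one ISA or as OpenCL-based. The ISA knob prunes only the JIT ones, but a surviving OpenCL implementation may still decline a given device. The Impl class, the implementation names, and the "first match wins" dispatch rule are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Impl:
    """Toy stand-in for one primitive implementation (hypothetical, not oneDNN code)."""
    name: str
    kind: str                                # "jit" or "ocl"
    isa: Optional[str] = None                # target ISA, used only by "jit" impls
    archs: Optional[FrozenSet[str]] = None   # "ocl" impls: accepted archs, None = any


def build_filter(impls, enabled_isas):
    """Model the ISA build knob: prune JIT impls for disabled ISAs, keep every OpenCL impl."""
    return [i for i in impls if i.kind == "ocl" or i.isa in enabled_isas]


def accepts(impl, device_arch):
    """A JIT impl runs only on its own ISA; an OpenCL impl may still be arch-restricted."""
    if impl.kind == "jit":
        return impl.isa == device_arch
    return impl.archs is None or device_arch in impl.archs


def dispatch(impls, device_arch):
    """Return the first impl that accepts the device, else raise the error reported above."""
    for impl in impls:
        if accepts(impl, device_arch):
            return impl.name
    raise RuntimeError("could not create a primitive")


# Hypothetical implementation lists for two primitives.
CONV = [
    Impl("jit_conv_xehpc", "jit", isa="xehpc"),
    Impl("ocl_conv_gen9_only", "ocl", archs=frozenset({"gen9"})),
    Impl("ocl_generic_conv", "ocl"),
]
MATMUL = [
    Impl("jit_matmul_xehpc", "jit", isa="xehpc"),
    Impl("ocl_matmul_gen9_only", "ocl", archs=frozenset({"gen9"})),
]

if __name__ == "__main__":
    none = frozenset()  # models an empty ISA list
    print(dispatch(build_filter(CONV, none), "xehpc"))  # falls back to the generic OpenCL impl
    try:
        dispatch(build_filter(MATMUL, none), "xehpc")
    except RuntimeError as err:
        print(err)  # no surviving impl accepts xehpc
```

In this model, "always available" means the knob never prunes the OpenCL implementations. It does not mean each of them accepts every device, which is why an empty ISA list can still leave some primitives with nothing to dispatch to.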
The build documentation claims that generic OpenCL kernels are always available. I wanted to verify that they worked, and the straightforward way to do that seemed to be this:
And that builds! And it works more than it doesn't! With the Intel oneAPI 2024.1 DPC++ compiler, I built 3c0e1f1635c81ae9074f2deeff9977a2a8ef149d with the above patch, enabling the SYCL CPU and GPU backends. (I am not using the OpenCL driver from the oneAPI release. I am using Fedora 40's build of the Intel Compute Runtime, intel-compute-runtime-24.09.28717.17-1.fc40.x86_64. I don't expect that matters much here, but I can try a different version if it helps.)
With the normal build, ctest says:
Then I rebuilt with DNNL_ENABLE_PRIMITIVE_GPU_ISA set to NONE, and ctest said:
So, 93 new failures. 107 GPU tests did pass, though, so it seems like this should work. This is on a Gen9 GPU, specifically:
Since GEN9 is the lowest ISA specifically supported, this suggests that some of the generic OpenCL kernels are broken.