huawei-noah / bolt

Bolt is a deep learning library with high performance and heterogeneous flexibility.
https://huawei-noah.github.io/bolt/
MIT License

STORE_OUTPUT has a typo; deconvolution f2s2 produces wrong results on Adreno GPU #69

Closed: chillingche closed this issue 3 years ago

chillingche commented 3 years ago
./test_deconvolution_ocl 24 256 128 1 2 2 2 0

[DEBUG] thread 13883 OCLContext 0x61531c6278 constructor start
[DEBUG] thread 13883 try to dlopen libQUALCOMM_Adreno_660_map.so failed, dlopen failed: library "libQUALCOMM_Adreno_660_map.so" not found, create kernel from source code
[DEBUG] thread 13883 gcl_kernel_source 0xb40000714c3a1250 constructor
[DEBUG] thread 13883 OCLContext 0x61531c6278 constructor end
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_12 runInfo: ls <0 0 0> executeTime = 153.856000 us
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_22 runInfo: ls <0 0 0> executeTime = 130.816000 us
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_32 runInfo: ls <0 0 0> executeTime = 153.088000 us
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_42 runInfo: ls <0 0 0> executeTime = 122.880000 us
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_14 runInfo: ls <0 0 0> executeTime = 143.872000 us
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_24 runInfo: ls <0 0 0> executeTime = 102.144000 us
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_34 runInfo: ls <0 0 0> executeTime = 118.016000 us
[DEBUG] thread 13883 enqueue_fill_image runInfo: executeTime = 15.872000 us
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_trans_fltbuf_44 runInfo: executeTime = 5.888000 us
[DEBUG] thread 13883 DATATRANS>>> enqueue_write_buffer runInfo: executeTime = 129.024000 us
[DEBUG] thread 13883 KERNEL>>> unknow_mem_trans_om_nchw_to_nchwc4 runInfo: executeTime = 113.920000 us
[INFO] thread 13883 warm up gpu:
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_24 runInfo: ls <0 0 0> executeTime = 102.912000 us
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_24 runInfo: ls <0 0 0> executeTime = 100.864000 us
[DEBUG] thread 13883 KERNEL>>> unknow_deconv_gemm_f2s2_qc_iom_24 runInfo: ls <0 0 0> executeTime = 98.048000 us
[DEBUG] thread 13883 KERNEL>>> unknow_mem_trans_im_nchwc4_to_nchw runInfo: executeTime = 51.968000 us
[DEBUG] thread 13883 DATATRANS>>> enqueue_read_buffer runInfo: executeTime = 16.896000 us
[INFO] thread 13883 16bit,         Deonvolution,                                    (1 24 256 128)+(24 1 2 2)/(2 0)=(1 1 512 256),    TIME    0.098ms,        GFLOPS   65.504
abs(diff) >= 1.000000e+00f, number = 23
abs(diff) >= 1.000000e-01f, number = 822
abs(diff) >= 1.000000e-02f, number = 164
abs(diff) >= 1.000000e-03f, number = 1084
abs(diff) >= 1.000000e-04f, number = 85300
abs(diff) >= 1.000000e-05f, number = 3176
abs(diff) >= 0.000000e+00f, number = 40503
maxabs = 1.530273, a = 0.000000, b = 1.530273 @ 428
maxrel = 976.562500, a = -0.000244, b = 0.000244 @ 73386
[DEBUG] thread 13883 OCLContext 0x61531c6278 deconstructor start
[DEBUG] thread 13883 gcl_kernel_source 0xb40000714c3a1250 constructor
[DEBUG] thread 13883 OCLContext 0x61531c6278 deconstructor end
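
For readers checking the numbers in the log above, here is a minimal sketch of the two pieces of arithmetic it relies on: the transposed-convolution output size, and a bucketed absolute-difference count. The function names `deconv_out_size` and `bucket_abs_diff` are hypothetical and are not taken from bolt. The bucketing rule (each element is counted once, in the largest threshold it reaches) is an assumption about how the test tool reports its histogram; it is consistent with the log, since the seven counts sum to 131072 = 512 × 256, the number of output elements.

```python
def deconv_out_size(in_size, kernel, stride, pad):
    """Standard transposed-convolution output size (no dilation, no output padding)."""
    return (in_size - 1) * stride - 2 * pad + kernel


def bucket_abs_diff(a, b, thresholds=(1.0, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 0.0)):
    """Count each element once, in the first (largest) threshold its abs diff reaches.

    Assumption: this mirrors the exclusive buckets printed in the log;
    it is not bolt's actual implementation.
    """
    counts = [0] * len(thresholds)
    for x, y in zip(a, b):
        d = abs(x - y)
        for i, t in enumerate(thresholds):
            if d >= t:
                counts[i] += 1
                break
    return counts


# Shape from the log: (1 24 256 128) + (24 1 2 2) / (2 0) = (1 1 512 256)
out_h = deconv_out_size(256, kernel=2, stride=2, pad=0)
out_w = deconv_out_size(128, kernel=2, stride=2, pad=0)

# Histogram counts copied from the log, largest threshold first.
log_counts = [23, 822, 164, 1084, 85300, 3176, 40503]

print(out_h, out_w, sum(log_counts) == out_h * out_w)
```

Under this reading, 23 of the 131072 output values differ from the reference by at least 1.0, which is far outside what 16-bit rounding alone would explain.
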
yunfanxiao commented 3 years ago

Thank you very much for your feedback! We will fix this soon.