There seems to be a performance loss when a kernel function takes a different pointer type for its arguments. I defined two kernel functions that accomplish the same thing:
function1
    __kernel void global_bandwidth_vec2(const int heightA, const int widthA,
                                        __global const float *a, __global float *b) {
        const int idx = get_global_id(0);
        const int step = idx << 1;
        CL_ELEM_TYPE value = *((__global CL_ELEM_TYPE *)(a + step));
        *((__global CL_ELEM_TYPE *)(b + step)) = value;
    }
function2
    __kernel void global_bandwidth_vec2(const int heightA, const int widthA,
                                        __global const CL_ELEM_TYPE *a, __global CL_ELEM_TYPE *b) {
        const int idx = get_global_id(0);
        const int step = idx << 1;
        CL_ELEM_TYPE value = *((__global CL_ELEM_TYPE *)(a + step));
        *((__global CL_ELEM_TYPE *)(b + step)) = value;
    }
The two functions are identical except for the type of the pointer arguments: function1 uses float, while function2 uses the macro CL_ELEM_TYPE (defined through the build options passed to clBuildProgram).