Open tkphd opened 6 years ago
Inner loops on CUDA convolution code should run faster using a `#pragma unroll` statement.

- `#pragma unroll N` in CUDA
- `#pragma unroll N` in OpenCL
- `-Munroll` flag with pgcc
- `#pragma unroll N` for icc
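For the CUDA item, a minimal sketch of what this could look like. This is not the project's actual kernel: `RADIUS`, `MASK_W`, `d_mask`, `convolve2d`, and `convolve_host` are hypothetical names, and the clamp-to-edge boundary handling is an assumption made only to keep the example self-contained. The point is that the mask loops have compile-time trip counts, so `#pragma unroll` with no argument asks nvcc to unroll them fully. Whether that actually runs faster would need to be measured, since unrolling can also raise register pressure.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical mask half-width. A compile-time constant gives the inner
// loops a known trip count, which is what allows full unrolling.
constexpr int RADIUS = 2;
constexpr int MASK_W = 2 * RADIUS + 1;

// Convolution mask in constant memory (assumed layout: row-major, MASK_W x MASK_W).
__constant__ float d_mask[MASK_W * MASK_W];

__global__ void convolve2d(const float* in, float* out, int nx, int ny)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= nx || y >= ny) return;

    float sum = 0.0f;

    // Each pragma applies to the loop that immediately follows it.
    #pragma unroll
    for (int j = -RADIUS; j <= RADIUS; ++j) {
        #pragma unroll
        for (int i = -RADIUS; i <= RADIUS; ++i) {
            // Assumed boundary condition: clamp indices to the nearest edge.
            const int xi = min(max(x + i, 0), nx - 1);
            const int yj = min(max(y + j, 0), ny - 1);
            sum += d_mask[(j + RADIUS) * MASK_W + (i + RADIUS)] * in[yj * nx + xi];
        }
    }
    out[y * nx + x] = sum;
}

// Host wrapper: copies data to the device, launches the kernel, copies back.
// Error checking is omitted for brevity.
std::vector<float> convolve_host(const std::vector<float>& in,
                                 const std::vector<float>& mask,
                                 int nx, int ny)
{
    const size_t bytes = in.size() * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(d_mask, mask.data(), MASK_W * MASK_W * sizeof(float));

    const dim3 block(16, 16);
    const dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
    convolve2d<<<grid, block>>>(d_in, d_out, nx, ny);

    std::vector<float> out(in.size());
    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);  // blocks until the kernel finishes
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

Replacing `#pragma unroll` with `#pragma unroll N` would request a partial unroll by a factor of `N` instead, which may be the better choice if the mask size is only known at run time.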