Based on my understanding, the weights should be copied into the upper or bottom half of the new gconv kernel in an interleaved manner. However, I found that it is a sequential process in the optimized network. So I changed merge_gconv_kernel() in merge_gconv_kernel.cu as follows:
void merge_gconv_kernel(DATATYPE* dst_ptr,
                        const DATATYPE* src_ptr,
                        int volume,
                        int c_in_h_w,
                        int c_out,
                        int count)
{
  assert(c_out % count == 0);
  CUDA_KERNEL_LOOP(i, volume)
  {
    int mod = i % c_in_h_w;  // position within the kernel
    int div = i / c_in_h_w;  // index of the kernel this element belongs to
    // Original (sequential) mapping:
    // int dst_i = div * c_in_h_w * count + div / (c_out / count) * c_in_h_w + mod;
    // Changed (interleaved) mapping:
    int dst_i = div * c_in_h_w * count + div % count * c_in_h_w + mod;
    dst_ptr[dst_i] = src_ptr[i];
  }
}
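To make the difference between the two mappings easier to see, here is a small host-side C++ sketch that evaluates both index formulas on the CPU. It is not part of the repository: the function names dst_index_sequential, dst_index_interleaved and slot_per_kernel are hypothetical helpers introduced only for illustration. For each source kernel, slot_per_kernel reports which of the count slots of size c_in_h_w in the enlarged destination kernel receives the weights.

```cpp
#include <cassert>
#include <vector>

// Original formula: kernels are assigned to slots in consecutive blocks
// of (c_out / count) kernels each.
int dst_index_sequential(int i, int c_in_h_w, int c_out, int count) {
    assert(c_out % count == 0);
    int mod = i % c_in_h_w;  // position within the kernel
    int div = i / c_in_h_w;  // index of the kernel
    return div * c_in_h_w * count + div / (c_out / count) * c_in_h_w + mod;
}

// Changed formula from the post: the slot cycles with the kernel index.
int dst_index_interleaved(int i, int c_in_h_w, int count) {
    int mod = i % c_in_h_w;  // position within the kernel
    int div = i / c_in_h_w;  // index of the kernel
    return div * c_in_h_w * count + div % count * c_in_h_w + mod;
}

// Hypothetical helper: for every source kernel, return the slot
// (0 .. count-1) of the destination kernel that its weights land in.
std::vector<int> slot_per_kernel(bool interleaved, int c_in_h_w,
                                 int c_out, int count) {
    std::vector<int> slots;
    for (int div = 0; div < c_out; ++div) {
        int i = div * c_in_h_w;  // first element of kernel `div`
        int dst = interleaved
                      ? dst_index_interleaved(i, c_in_h_w, count)
                      : dst_index_sequential(i, c_in_h_w, c_out, count);
        // dst / c_in_h_w == div * count + slot, so the remainder is the slot.
        slots.push_back((dst / c_in_h_w) % count);
    }
    return slots;
}
```

For example, with c_in_h_w = 2, c_out = 4 and count = 2, slot_per_kernel(false, 2, 4, 2) gives {0, 0, 1, 1} (sequential), while slot_per_kernel(true, 2, 4, 2) gives {0, 1, 0, 1} (interleaved).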
Hope it helps.