Is the kernel size limited to 13x13? When I set the kernel size to 15x15, I get the error "CUDA error: an illegal memory access was encountered", but everything works when the kernel size is 13x13 or smaller. I also noticed that the speed drops noticeably as the kernel size increases, more so than with ordinary depthwise convolutions. Is that because the implementation is not efficient enough?