Implement "Memory Efficient Convolution" paper

mratsim commented 6 years ago

Memory Efficient Convolution (MEC) is an alternative to im2col for lowering the input tensor and the convolution kernel so that the convolution can be handed to BLAS as matrix multiplication.
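To make the lowering concrete, here is a minimal pure-Python sketch of how I understand it from the paper. It assumes a single channel, no padding and the same stride in both directions. The names `mec_conv2d` and `direct_conv2d` are made up for illustration and are not part of Arraymancer or any library. Each row of the lowered matrix holds one full-height vertical strip of the input, and every output row is then a product between a shifted column window of that matrix and the flattened kernel. In a real implementation those products would be GEMM calls.

```python
def mec_conv2d(image, kernel, stride=1):
    """Single-channel, unpadded 2D convolution (cross-correlation) via MEC-style lowering."""
    i_h, i_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    o_h = (i_h - k_h) // stride + 1
    o_w = (i_w - k_w) // stride + 1

    # Lowering: one row per output column. Each row is the full-height strip
    # of width k_w starting at column w * stride, flattened row-major.
    # Resulting shape: o_w x (i_h * k_w).
    lowered = [
        [image[r][w * stride + c] for r in range(i_h) for c in range(k_w)]
        for w in range(o_w)
    ]
    flat_kernel = [v for row in kernel for v in row]

    # Each output row h reuses the same lowered matrix: take a window of
    # k_h * k_w columns starting at h * stride * k_w and multiply it by the
    # flattened kernel (a matrix-vector product, written out by hand here).
    out = []
    for h in range(o_h):
        start = h * stride * k_w
        out.append([
            sum(a * b for a, b in zip(row[start:start + k_h * k_w], flat_kernel))
            for row in lowered
        ])
    return out


def direct_conv2d(image, kernel, stride=1):
    """Reference sliding-window implementation, used only to check the sketch."""
    k_h, k_w = len(kernel), len(kernel[0])
    o_h = (len(image) - k_h) // stride + 1
    o_w = (len(image[0]) - k_w) // stride + 1
    return [
        [
            sum(image[h * stride + r][w * stride + c] * kernel[r][c]
                for r in range(k_h) for c in range(k_w))
            for w in range(o_w)
        ]
        for h in range(o_h)
    ]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, 1]]
print(mec_conv2d(image, kernel) == direct_conv2d(image, kernel))
```

The multi-channel NHWC case should follow the same pattern with the channel dimension folded into each strip, but I have not written that out here.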

Compared to FFT and Winograd, it uses much less memory, and it is faster than Winograd on 5 out of 7 benchmarks because it launches several GEMMs in parallel and maximizes throughput.

It is especially suitable for mobile devices that are constrained in memory.
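For a rough sense of the savings relative to im2col, the sketch below counts the elements in each lowered matrix. The formulas are my reading of the paper for the unpadded case (im2col: o_h * o_w * k_h * k_w * i_c, MEC: o_w * i_h * k_w * i_c). The function name `lowered_elements` is hypothetical, and the numbers are element counts, not measured memory usage.

```python
def lowered_elements(i_h, i_w, k_h, k_w, i_c=1, stride=1):
    """Return (im2col, mec) element counts of the lowered matrix, no padding."""
    o_h = (i_h - k_h) // stride + 1
    o_w = (i_w - k_w) // stride + 1
    # im2col stores one k_h * k_w * i_c patch per output position.
    im2col = o_h * o_w * k_h * k_w * i_c
    # MEC stores one full-height strip of width k_w per output column.
    mec = o_w * i_h * k_w * i_c
    return im2col, mec


# 7x7 single-channel input, 3x3 kernel, stride 1
print(lowered_elements(7, 7, 3, 3))  # (225, 105)
```

The ratio is roughly k_h when the stride is 1 and shrinks as the stride grows, so the benefit depends on the layer shape.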

Note: It requires the NHWC layout (TensorFlow).

Paper: Cho et al., [1706.06873]

Related implementation in Lua (before cuDNN existed): CPU and GPU

yuhnjin commented 6 years ago

Hello, I am a novice at CUDA. Did you implement MEC? If you did, could I get the code to use as a reference? Thanks.

mratsim commented 6 years ago

Hello yuhnjin, and welcome here. Unfortunately, I haven't had the time.

You're welcome

mratsim / Arraymancer

Implement "Memory Efficient Convolution" paper #131