mratsim / Arraymancer

A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
https://mratsim.github.io/Arraymancer/
Apache License 2.0

Implement "Memory Efficient Convolution" paper #131

Open mratsim opened 6 years ago

mratsim commented 6 years ago

Memory Efficient Convolution (MEC) is an alternative to im2col for lowering the input tensor and the convolution kernel into matrices that can be fed to BLAS for matrix multiplication (GEMM).
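To illustrate the lowering idea, here is a minimal pure-Python sketch. It assumes a single image, a single channel, no padding, and one output channel, which is far simpler than the batched NHWC case in the paper. The function names (`direct_conv2d`, `mec_lower`, `mec_conv2d`) are made up for this example and are not part of Arraymancer. With one output channel, each "GEMM" degenerates into a matrix-vector product over a slice of the lowered matrix.

```python
def direct_conv2d(image, kernel, stride=1):
    """Reference: naive sliding-window cross-correlation, no padding."""
    i_h, i_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    o_h = (i_h - k_h) // stride + 1
    o_w = (i_w - k_w) // stride + 1
    return [[sum(image[y * stride + a][x * stride + b] * kernel[a][b]
                 for a in range(k_h) for b in range(k_w))
             for x in range(o_w)]
            for y in range(o_h)]


def mec_lower(image, k_w, stride=1):
    """Lower the image into an o_w x (i_h * k_w) matrix.

    Row x holds the full-height vertical strip covering image columns
    [x * stride, x * stride + k_w), flattened row by row.
    """
    i_h, i_w = len(image), len(image[0])
    o_w = (i_w - k_w) // stride + 1
    return [[image[r][x * stride + b] for r in range(i_h) for b in range(k_w)]
            for x in range(o_w)]


def mec_conv2d(image, kernel, stride=1):
    """Convolve by multiplying overlapping slices of the lowered matrix."""
    i_h = len(image)
    k_h, k_w = len(kernel), len(kernel[0])
    o_h = (i_h - k_h) // stride + 1
    lowered = mec_lower(image, k_w, stride)
    flat_kernel = [v for row in kernel for v in row]  # k_h * k_w values
    window = k_h * k_w
    out = []
    for y in range(o_h):
        # Output row y reads columns [start, start + window) of every
        # lowered row; consecutive output rows reuse overlapping columns.
        start = y * stride * k_w
        out.append([sum(p * q for p, q in zip(row[start:start + window],
                                              flat_kernel))
                    for row in lowered])
    return out


image = [[r * 7 + c for c in range(7)] for r in range(7)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
assert mec_conv2d(image, kernel) == direct_conv2d(image, kernel)
```

The slices for different output rows are independent of each other, so they can be dispatched in parallel.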

Compared to FFT and Winograd, it uses much less memory, and it is faster than Winograd on 5 out of 7 benchmarks because it launches several GEMMs in parallel and maximizes throughput.

It is especially suitable for mobile devices that are constrained in memory.
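As a rough illustration of the memory argument against im2col, the sketch below counts the elements of the lowered matrix for one image under each scheme, assuming no padding. The helper names are hypothetical, and the formulas are my reading of the two lowerings: im2col stores one `k_h * k_w * i_c` patch per output position, while MEC stores one `i_h * k_w * i_c` strip per output column.

```python
def output_size(i, k, stride=1):
    """Output extent along one axis for an unpadded convolution."""
    return (i - k) // stride + 1


def im2col_lowered_size(i_h, i_w, i_c, k_h, k_w, stride=1):
    """Elements in the im2col matrix: one flattened patch per output pixel."""
    o_h = output_size(i_h, k_h, stride)
    o_w = output_size(i_w, k_w, stride)
    return o_h * o_w * k_h * k_w * i_c


def mec_lowered_size(i_h, i_w, i_c, k_h, k_w, stride=1):
    """Elements in the MEC matrix: one full-height strip per output column."""
    o_w = output_size(i_w, k_w, stride)
    return o_w * i_h * k_w * i_c


# 7x7 single-channel input, 3x3 kernel, stride 1
print(im2col_lowered_size(7, 7, 1, 3, 3))  # 225
print(mec_lowered_size(7, 7, 1, 3, 3))     # 105
```

For this small case the MEC matrix is less than half the size of the im2col one, and the gap grows with the kernel height.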

Note: it requires the NHWC layout (as used by TensorFlow).
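A likely reason for the layout requirement (my assumption, not something stated in this thread) is that in NHWC the `k_w` pixels of a kernel-wide strip row, with all their channels, sit next to each other in memory, so they can be copied into the lowered matrix as one contiguous block. The small sketch below compares flat offsets in both layouts; the function names are illustrative only.

```python
def nhwc_offset(n, h, w, c, H, W, C):
    """Flat index of element (n, h, w, c) in a row-major NHWC buffer."""
    return ((n * H + h) * W + w) * C + c


def nchw_offset(n, c, h, w, C, H, W):
    """Flat index of element (n, c, h, w) in a row-major NCHW buffer."""
    return ((n * C + c) * H + h) * W + w


def strip_row_offsets(layout, n, h, w0, k_w, H, W, C):
    """Offsets of the k_w pixels (all channels) starting at column w0."""
    if layout == "NHWC":
        return [nhwc_offset(n, h, w, c, H, W, C)
                for w in range(w0, w0 + k_w) for c in range(C)]
    return [nchw_offset(n, c, h, w, C, H, W)
            for w in range(w0, w0 + k_w) for c in range(C)]


H, W, C = 4, 5, 3
print(strip_row_offsets("NHWC", 0, 1, 2, 2, H, W, C))  # [21, 22, 23, 24, 25, 26]
print(strip_row_offsets("NCHW", 0, 1, 2, 2, H, W, C))  # [7, 27, 47, 8, 28, 48]
```

In NHWC the six values form one contiguous run, whereas in NCHW they are scattered across channel planes.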

Paper: Cho et al., [arXiv:1706.06873]

Related implementation in Lua (before cuDNN existed): CPU and GPU

yuhnjin commented 6 years ago

Hello, I am a novice with CUDA. Did you implement MEC? If you did, could I get the code so I can refer to it? Thanks.

mratsim commented 6 years ago

Hello yuhnjin and welcome here. I didn't have the time unfortunately.

You're welcome