Hi, our team is focused on Torch/PyTorch performance optimization on Intel platforms, namely Xeon and Xeon Phi.
We provide the mklnn and mkltorch packages, analogous to cudnn/cutorch.
A distro repo that installs Torch with mklnn/mkltorch enabled by default is available at intel-torch.
Using mklnn is easy; simply add:

    require 'mklnn'
    model = mklnn.convert(model, 'mkl')
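For a fuller picture, here is a minimal end-to-end sketch (assuming mklnn is installed; the layer choices and sizes are purely illustrative):

    require 'nn'
    require 'mklnn'

    -- build an ordinary nn model on the CPU
    local model = nn.Sequential()
    model:add(nn.SpatialConvolution(3, 16, 3, 3))
    model:add(nn.ReLU())
    model:float()  -- assumption: the MKL kernels operate on float tensors

    -- swap supported layers for their MKL-backed counterparts
    model = mklnn.convert(model, 'mkl')

    -- run a forward pass as usual
    local input = torch.FloatTensor(1, 3, 32, 32):uniform()
    local output = model:forward(input)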
Overall, mklnn is substantially faster than stock nn on CPU.
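To check the speedup on your own machine, a simple timing sketch using the standard torch.Timer (again, the model and input sizes are arbitrary):

    require 'nn'
    require 'mklnn'

    local model = nn.Sequential():add(nn.SpatialConvolution(3, 64, 3, 3)):float()
    local input = torch.FloatTensor(32, 3, 224, 224):uniform()

    -- time the stock nn forward pass
    local timer = torch.Timer()
    for i = 1, 10 do model:forward(input) end
    print(('nn:    %.3f s'):format(timer:time().real))

    -- convert and time the MKL-backed forward pass
    model = mklnn.convert(model, 'mkl')
    timer:reset()
    for i = 1, 10 do model:forward(input) end
    print(('mklnn: %.3f s'):format(timer:time().real))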
The following features are under development for Torch:
- tensor operation optimization with OpenMP (see the thread-control sketch after this list)
- tensor operation optimization using the AVX-512 instruction set, targeting the latest generations of Xeon (Skylake) and Xeon Phi (Knights Mill)
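For context on the OpenMP item: stock Torch already exposes its OpenMP thread count through torch.setnumthreads/torch.getnumthreads, so you can experiment with thread scaling today. A quick sketch (this is the existing Torch API, not the new WIP work):

    require 'torch'

    -- OpenMP-backed tensor operations honor this thread count
    torch.setnumthreads(4)
    print('threads: ' .. torch.getnumthreads())

    local a = torch.rand(4096, 4096)
    local b = torch.rand(4096, 4096)

    local timer = torch.Timer()
    local c = torch.mm(a, b)
    print(('mm: %.3f s'):format(timer:time().real))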
Counterpart optimizations for PyTorch are also a work in progress.
Any feedback from your side is highly valuable to us!