clnn

OpenCL backend for the Torch nn neural network library.

Installation

Please see distro-cl for installation instructions.

What works

Parameterized Modules

nn.Linear

Basic Tensor methods

These mostly 'just work', since they are based on underlying tensor methods that are already implemented in cltorch. Tested with:

nn.Narrow

Miscellaneous modules

nn.Identity
nn.Dropout

Convolution layers

nn.SpatialConvolutionMM
nn.SpatialMaxPooling (including ceil mode)
nn.SpatialAveragePooling
nn.TemporalConvolution2: this is specific to clnn. It works on CPU and CUDA too, not just on OpenCL. It is API-compatible with TemporalConvolution, and faster than TemporalConvolution on both CUDA and OpenCL.
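
Since nn.TemporalConvolution2 is described as API-compatible with TemporalConvolution, a minimal sketch of using it might look like the following. It assumes that `require 'clnn'` is what makes the module available, that the constructor takes the same (inputFrameSize, outputFrameSize, kW, dW) arguments as nn.TemporalConvolution, and that batched 3D input is accepted. The sizes are arbitrary values chosen for illustration.

```lua
require 'nn'
require 'clnn'  -- assumption: loading clnn registers nn.TemporalConvolution2

-- Illustrative sizes, not taken from the clnn documentation
local inputFrameSize, outputFrameSize = 10, 4
local kW, dW = 3, 1

-- Same constructor arguments as nn.TemporalConvolution (assumed, per the
-- API-compatibility note above)
local conv = nn.TemporalConvolution2(inputFrameSize, outputFrameSize, kW, dW)

-- Batched input: nBatch x nInputFrame x inputFrameSize, run on the CPU here
local input = torch.Tensor(2, 8, inputFrameSize):uniform()
local output = conv:forward(input)

-- For a TemporalConvolution-compatible layer, the number of output frames
-- should be (nInputFrame - kW) / dW + 1
print(output:size())
```

Swapping `nn.TemporalConvolution2` for `nn.TemporalConvolution` in the constructor call should be the only change needed to compare the two.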

Transfer function layers

nn.Tanh
nn.Sigmoid
nn.ReLU
nn.ELU
nn.Exp
nn.Sqrt
nn.Square
nn.Abs
nn.LogSigmoid
nn.HardTanh
nn.LogSoftMax
nn.SoftMax (including spatial mode)

Table layers

These 'just work', since they are based on underlying torch operations, which are already implemented in cltorch. Tested with:

nn.CMulTable
nn.CAddTable

Criterions

nn.MSECriterion
nn.ClassNLLCriterion

Containers

Containers 'just work', since they just call standard operations on the contained modules. Tested with:

nn.Sequential
nngraph
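
As a rough illustration of how the modules listed above fit together, here is a minimal sketch that builds a small nn.Sequential network, moves it to OpenCL with `:cl()`, and runs one forward/backward pass. The layer sizes, batch size, targets and learning rate are arbitrary values chosen for the example, and it assumes an OpenCL-capable device is available.

```lua
require 'nn'
require 'clnn'

-- Small classifier built only from modules listed in this README
local net = nn.Sequential()
net:add(nn.Linear(10, 5))
net:add(nn.Tanh())
net:add(nn.Linear(5, 2))
net:add(nn.LogSoftMax())
net = net:cl()  -- convert the network to OpenCL tensors

local criterion = nn.ClassNLLCriterion():cl()

-- Batch of 4 random inputs with class targets in {1, 2}, moved to OpenCL
local input = torch.Tensor(4, 10):uniform():cl()
local target = torch.Tensor({1, 2, 1, 2}):cl()

-- One forward/backward pass followed by a plain gradient step
local output = net:forward(input)
local loss = criterion:forward(output, target)
local gradOutput = criterion:backward(output, target)
net:zeroGradParameters()
net:backward(input, gradOutput)
net:updateParameters(0.01)  -- illustrative learning rate

print(loss)
```

The only OpenCL-specific parts are the `:cl()` calls. The rest is standard nn usage, which is why containers need no special support.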

Trainers

In theory, trainers 'just work', since they just call standard torch methods on the network. The following are good first choices:

nn.StochasticGradient
optim.lbfgs
optim.adam

Timings

Soumith benchmark layers

Please see https://github.com/soumith/convnet-benchmarks#imagenet-winners-benchmarking

On a Titan X, OpenCL torch is about 3 times slower than CUDA torch:
- e.g. for VGG, cutorch takes 1100ms and cltorch takes 3400ms

Example networks

Andrej's char-rnn is OpenCL-enabled: simply add the option -opencl 1
Justin's neural-style has an OpenCL port in progress by Shubhanshu: napsternxg/neural-style

Porting guidelines

Porting guidelines for project maintainers are available in porting-guidelines.md.

Recent changes

2nd May:
- Re-applied:
  - 26th March:
    - add TemporalConvolution2: same API and usage as TemporalConvolution, but faster on GPUs
1st May:
- Re-applied:
  - 10th March:
    - @pawni (Nick Pawlowski) added SpatialUpSamplingNearest. Thank you Nick
  - 20th February:
    - @gloine (Jaehyung Lee) added support for non-batched input to ClassNLLCriterion. Thank you Jaehyung
30th April:
- rolled back to as-of 21st February, prior to lots of THNN changes in upstream Torch
- additionally, installation procedure is now to use a specific torch distro, for stability
1st Feb:
- merged/ported THNN phase 3. If you hit any weird build issues, please update both nn and clnn.
2nd January, 2016:
- merged/ported THNN architecture across, and the implementation of Abs, so the unit-tests pass again now
15th December:
- merged Sergey's SpatialAveragePadding and ceil kernels into master branch
29th November:
- added ELU
25th September:
- ported Sergey's not-yet-merged SpatialAveragePadding and ceil kernels, into clnn-avgpool branch
- ported latest version of SoftMax, i.e. essentially Jonghoon's "Update SoftMax to work in spatial mode"
23rd September:
- ported latest cunn implementation of SpatialMaxPooling across, i.e. approximately Sergey's "Deterministic max-pooling" PR
- this includes :ceil() implementation
22nd September:
- added non-batch implementation of LogSoftMax (previously only handled batched input)
- added SoftMax, for both batched and non-batched
20th September:
- added non-batch implementation for SpatialMaxPooling (previously only handled batched input), for contiguous pools

Older changes
