originrose / cortex

Machine learning in Clojure
Eclipse Public License 1.0
1.27k stars 111 forks

Selu Activation + High Level Tensor Operations #247

Closed: gigasquid closed this 6 years ago

gigasquid commented 6 years ago

This PR adds support for the SELU activation https://github.com/thinktopic/cortex/issues/181

This implementation takes a high-level, tensor-based approach and introduces tensor functions that wrap the unary and binary tensor operations to make them friendlier to use.

Note: this is very open to feedback. If preferred, the implementation can instead be added simply as a unary tensor operation :selu, without any of the other high-level tensor function additions.

The rationale is that it would be nice to implement things like new activations in one place using tensor abstractions, with a clean syntax that keeps the logic of the activation clear.

At first glance, I thought the SELU activation, which is:

(def SELU_ALPHA 1.6732632423543772848170429916717)
(def SELU_LAMBDA 1.0507009873554804934193349852946)

Activation: lambda * x for x > 0, and lambda * ((alpha * exp(x)) - alpha) for x <= 0
Gradient:   lambda for x > 0, and lambda * alpha * exp(x) for x <= 0

would be pretty easy to implement. It would be straightforward as a new unary operation, but it was more challenging to do from a higher-level tensor standpoint.
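
For reference, here is the piecewise formula as a minimal scalar sketch in plain Clojure. This is not part of the PR; it is only meant to make the definition concrete (it reuses the SELU_ALPHA and SELU_LAMBDA constants above):

(defn selu-scalar
  "Scalar reference for the SELU activation:
   lambda * x for x > 0 and lambda * ((alpha * exp(x)) - alpha) for x <= 0."
  [x]
  (if (pos? x)
    (* SELU_LAMBDA x)
    (* SELU_LAMBDA (- (* SELU_ALPHA (Math/exp x)) SELU_ALPHA))))

;; (selu-scalar 1.0)  => ~1.0507
;; (selu-scalar -1.0) => ~-1.1113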

I thought it would be good to see what it would take to support it. With these changes, the SELU activation looks like:

(def SELU_ALPHA 1.6732632423543772848170429916717)
(def SELU_LAMBDA 1.0507009873554804934193349852946)

(defn selu
  "lambda*x for x > 0 and lambda * ((alpha * exp(x)) - alpha) for x <=0"
  [input output]
  (where output
         (> (new-tensor input) input 0)
         ; lambda*x for x > 0
         (* (new-tensor input) input SELU_LAMBDA)
         ;  lambda * ((alpha * exp(x)) - alpha) for x <=0
         (-> (exp (new-tensor input) input)
             (* SELU_ALPHA)
             (- SELU_ALPHA)
             (* SELU_LAMBDA))))
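
The backward pass would follow the same piecewise split. As with the sketch above, this is just a plain-Clojure scalar reference for the gradient, not the tensor implementation:

(defn selu-gradient-scalar
  "Scalar reference for the SELU gradient:
   lambda for x > 0 and lambda * alpha * exp(x) for x <= 0."
  [x]
  (if (pos? x)
    SELU_LAMBDA
    (* SELU_LAMBDA SELU_ALPHA (Math/exp x))))

;; (selu-gradient-scalar 2.0)  => ~1.0507
;; (selu-gradient-scalar -1.0) => ~0.6468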

I also tested it against ReLU on the MNIST example network (without augmentation):

(defn initial-description
  [input-w input-h num-classes]
  [(layers/input input-w input-h 1 :id :data)
   (layers/convolutional 5 0 1 20)
   (layers/max-pooling 2 0 2)
   (layers/selu)
   (layers/convolutional 5 0 1 50)
   (layers/max-pooling 2 0 2)
   (layers/selu)
   (layers/linear 1000)
   (layers/dropout 0.4)
   (layers/linear num-classes)
   (layers/softmax :id :labels)])

SELU, 100 epochs: 0.977
ReLU, 100 epochs: 0.979

From the paper, it seems like SELU would be more effective than ReLU in a deeper network. It might also be more effective once SELU AlphaDropout is implemented, which would be a future PR.

Again, feedback is most welcome. This is just an approach that I thought would be interesting, but I'm not sure it fits in with your vision or your understanding of other trade-offs.