This PR adds support for the SELU activation (https://github.com/thinktopic/cortex/issues/181). The implementation takes a high-level, tensor-based approach and introduces some tensor functions that wrap the unary and binary tensor operations to make them friendlier to use.
Note: this is very open to feedback; if desired, the implementation could instead be added simply as a unary tensor operation `:selu`, without any of the other high-level tensor function additions.
The rationale is that it would be nice to implement things like new activations in one place, using tensor abstractions and a clean syntax, so that the logic of the activation is clear.
At first glance, I thought the SELU activation, which is:

lambda * x for x > 0, and lambda * ((alpha * exp(x)) - alpha) for x <= 0

with gradient:

lambda for x > 0, and lambda * alpha * exp(x) for x <= 0

would be pretty easy to implement. It would be straightforward as a new unary operation, but it was more challenging from a higher-level tensor standpoint.
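As a quick sanity check of the formula (not part of the PR, just a throwaway scalar reference in plain Clojure):

```clojure
(defn selu-scalar
  "Plain scalar version of the SELU formula, for sanity checking only."
  [x]
  (let [alpha  1.6732632423543772848170429916717
        lambda 1.0507009873554804934193349852946]
    (if (> x 0)
      (* lambda x)
      (* lambda (- (* alpha (Math/exp x)) alpha)))))

;; (selu-scalar 1.0)  ;=> ≈ 1.0507
;; (selu-scalar -1.0) ;=> ≈ -1.1113
```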
I thought it would be good to see what it would take to support it.
These are the changes:
- Added `>`, `>=`, `<`, `<=`, and `bit-xor` to the base tensor binary ops (on both the CPU and GPU side). I only really needed `>`, but went ahead and put in the rest for the future while I was there. (Side note: I got nvcc running by downgrading Xcode on my Mac.)
- Added high-level tensor wrappers for them in the `cortex.tensor.operations` namespace.
- Added a high-level tensor function `where` that acts like the TensorFlow `where`: it handles the `if` branches and combines them with a masking operation (a minimal sketch of the idea follows this list).
- Added the appropriate tests for everything.
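To make the `where` item above concrete, here is a minimal scalar sketch of the masking idea it relies on (plain Clojure; the actual implementation dispatches to the tensor ops rather than working on scalars):

```clojure
;; select-by-mask: out = mask * then + (1 - mask) * else,
;; where the mask is 1 when the condition holds and 0 otherwise
(defn where-sketch
  [mask then-val else-val]
  (+ (* mask then-val)
     (* (- 1 mask) else-val)))

;; (where-sketch 1 5.0 -2.0) ;=> 5.0   (condition true, keeps the then branch)
;; (where-sketch 0 5.0 -2.0) ;=> -2.0  (condition false, keeps the else branch)
```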
With these in place, the SELU activation looks like:
```clojure
(def SELU_ALPHA 1.6732632423543772848170429916717)
(def SELU_LAMBDA 1.0507009873554804934193349852946)

(defn selu
  "lambda*x for x > 0 and lambda * ((alpha * exp(x)) - alpha) for x <= 0"
  [input output]
  (where output
         (> (new-tensor input) input 0)
         ; lambda * x for x > 0
         (* (new-tensor input) input SELU_LAMBDA)
         ; lambda * ((alpha * exp(x)) - alpha) for x <= 0
         (-> (exp (new-tensor input) input)
             (* SELU_ALPHA)
             (- SELU_ALPHA)
             (* SELU_LAMBDA))))
```
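For completeness, the gradient formula above could be written with the same pattern. This is only a sketch (not included in this PR), written against the same wrapper functions used in `selu`; the one trick is that the `>` mask is already 1.0 exactly where the constant branch applies, so scaling a copy of the mask by lambda produces that branch:

```clojure
(defn selu-gradient
  "lambda for x > 0 and lambda * alpha * exp(x) for x <= 0"
  [input output]
  (where output
         ; mask: 1.0 where x > 0, 0.0 otherwise
         (> (new-tensor input) input 0)
         ; lambda for x > 0 (a second copy of the mask, scaled by lambda)
         (-> (> (new-tensor input) input 0)
             (* SELU_LAMBDA))
         ; lambda * alpha * exp(x) for x <= 0
         (-> (exp (new-tensor input) input)
             (* SELU_ALPHA)
             (* SELU_LAMBDA))))
```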
I also tested it vs ReLU on the MNIST example network (without augmentation): SELU, 100 epochs: 0.977; ReLU, 100 epochs: 0.979.
From the paper, it seems like it would be more effective than ReLU for a deeper network. It might also be more effective once SELU AlphaDropout is implemented, which would be a future PR.
Again, feedback is most welcome. This is just an approach that I thought would be interesting, but I'm not sure it fits with your vision or your understanding of other trade-offs.