pluskid / Mocha.jl

Deep Learning framework for Julia

Fix for #117, add Neurons.Exponential() #118

Closed (benmoran closed this 9 years ago)

benmoran commented 9 years ago

This corrects the GaussianKLLossLayer problem that @jeff-regier spotted. That in turn means the MNIST VAE example can no longer safely use ReLU activations (because they can produce zero outputs, which now cause log(0) in GaussianKLLossLayer), so I have also added Neurons.Exponential(), which is just the exp(x) activation function.
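(For readers following along: the element-wise math behind the new neuron is just exp in the forward pass, whose derivative equals its own output. The sketch below is illustrative only, with made-up function names, and is not the PR's actual Mocha integration code.)

```julia
# Illustrative element-wise sketch of an Exponential activation.
exponential_forward(x::Real) = exp(x)          # output is strictly positive

# d/dx exp(x) = exp(x), so the backward pass just scales the incoming
# gradient by the forward output.
exponential_backward(y::Real, grad_y::Real) = grad_y * y

# Unlike ReLU, exp can never output exactly 0, so a downstream log() in
# GaussianKLLossLayer stays finite.
map(exponential_forward, [-2.0, 0.0, 1.5])     # => [0.135..., 1.0, 4.48...]
```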

jeff-regier commented 9 years ago

Hi Ben, I was using exponential neurons in my fork of Mocha.jl, but exp(x) grows really fast (exponentially fast, even). With 32-bit floats, for inputs to the neuron as small as 89, exp(x) returns Inf32. Once that happens for even one input, at any iteration, Inf32 and NaN32 get propagated everywhere.
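(A quick way to see the overflow, illustrative only and not taken from either fork:)

```julia
# Float32 overflows once exp's argument exceeds log(3.4028235f38) ≈ 88.72.
exp(88f0)        # ≈ 1.65f38, still finite
exp(89f0)        # Inf32
0f0 * exp(89f0)  # NaN32 -- and NaNs then propagate through later iterations
```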

I've started using two new kinds of strictly positive neurons instead, for layers whose outputs represent standard deviations. The first, called EpsReLU, is like ReLU except that it returns epsilon = 1e-4 whenever ReLU would return something less than that. The second, called ExpReLU, returns 1 + x for x > 0 and exp(x) for x <= 0.
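(In element-wise terms the two neurons described above are roughly as follows; this is a sketch with hypothetical names, and the actual Mocha integration lives in the fork linked below.)

```julia
# EpsReLU: like ReLU, but floored at a small epsilon so the output is strictly positive.
eps_relu(x::Real, epsilon=1e-4) = max(x, epsilon)

# ExpReLU: exp(x) for x <= 0 (bounded in (0, 1]), linear 1 + x for x > 0.
# Continuous at 0 and only linear for large x, so it cannot overflow the way exp does.
exp_relu(x::Real) = x > 0 ? 1 + x : exp(x)

# Both map any real input to a strictly positive value suitable for a standard deviation.
map(eps_relu, [-3.0, 0.0, 2.0])   # => [1e-4, 1e-4, 2.0]
map(exp_relu, [-3.0, 0.0, 2.0])   # => [0.0498..., 1.0, 3.0]
```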

If you run into the same problem with exponential neurons, feel free to copy my code for ExpReLU and EpsReLU from https://github.com/jeff-regier/Mocha.jl/blob/master/src/neurons.jl

Or if there's a better way, please let me know!

benmoran commented 9 years ago

Thanks, I did think about that when I noticed your other neuron types. I experimented with adding an epsilon to ReLU as well, and I also considered a SoftReLU neuron, log(1 + exp(x)). But I was a bit worried about adding a lot of special-case activation functions to the core of Mocha.
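(Side note on the SoftReLU idea, usually called softplus: a naive log(1 + exp(x)) overflows for large x for the same reason plain exp does, but it can be rewritten in a numerically stable form. The sketch below is illustrative only and not code from this PR.)

```julia
# Naive softplus: log(1 + exp(x)) -- exp(x) already overflows Float32 near x ≈ 89.
softplus_naive(x::Real) = log1p(exp(x))

# Stable rewrite: log(1 + exp(x)) = max(x, 0) + log(1 + exp(-|x|)),
# so the exponential's argument is never positive and cannot overflow.
softplus(x::Real) = max(x, zero(x)) + log1p(exp(-abs(x)))

softplus(89f0)        # ≈ 89.0f0, finite
softplus_naive(89f0)  # Inf32
```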

In the end I went with the Exp neuron because it seemed to work for my example, it's very simple, and it matches that used in the original paper.

I have been wondering in general about Mocha (this is also a question for @pluskid and others): for less standard components like these, is it better to add them to the core package, or to make it easy to define them outside it in user code?

I suppose it's early days, and at this point there are still a lot of standard modules that could usefully be added to the library before we worry too much about bloat. The neurons especially don't add much complexity.

pluskid commented 9 years ago

I'm fine with adding parameters to neurons, adding more types, and so on, but I think it is better to include in the core package only well-established components (e.g. neurons that are in common use).

Concerning adding neuron and layer types outside the package: that is actually a goal. If you find it difficult to do so because some types are not exported, etc., we should be able to modify Mocha to make it easier!
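(To make the "outside the package" route concrete, a custom neuron would presumably subtype Mocha's ActivationFunction and add forward/backward methods for the backends it needs. The sketch below uses 2015-era Julia syntax and assumes the CPU-side signatures seen in src/neurons.jl, with forward and backward mutating a Blob's data in place; the type name is made up, and the exact extension points should be checked against the package rather than taken from this sketch.)

```julia
using Mocha

# Hypothetical user-defined strictly positive neuron, living outside the Mocha
# core. Assumes ActivationFunction, CPUBackend and Blob work as in src/neurons.jl.
type MyEpsReLU <: Mocha.ActivationFunction
  epsilon :: Float64
end
MyEpsReLU() = MyEpsReLU(1e-4)

# Forward: floor the activations at epsilon, in place.
function Mocha.forward(backend::Mocha.CPUBackend, neuron::MyEpsReLU, output::Mocha.Blob)
  for i in 1:length(output.data)
    output.data[i] = max(output.data[i], neuron.epsilon)
  end
end

# Backward: zero the gradient wherever the forward pass clamped the output.
function Mocha.backward(backend::Mocha.CPUBackend, neuron::MyEpsReLU,
                        output::Mocha.Blob, gradient::Mocha.Blob)
  for i in 1:length(output.data)
    if output.data[i] <= neuron.epsilon
      gradient.data[i] = 0
    end
  end
end
```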

jeff-regier commented 9 years ago

I really like the Neurons.ReLU(epsilon=1e-5) syntax; it's a nice way to specify a strictly positive neuron. Thank you both for all your work on Mocha.