shogun-toolbox / shogun

Shōgun
http://shogun-toolbox.org
BSD 3-Clause "New" or "Revised" License

Implement new initializations for neural nets as per MSR paper #2700

Open lisitsyn opened 9 years ago

lisitsyn commented 9 years ago

The paper is here

Good task for deep learning applicants.

sanuj commented 9 years ago

@lisitsyn I had a look at the paper. It's about parametric rectified linear units: it covers initialization of rectifier weights, training, and testing, and then compares against plain ReLU, GoogLeNet, etc. on ImageNet. From what I understood, I have to implement the weight initializations mentioned in the paper for Shogun. Can you tell me if we have a similar implementation in the neural nets code so I can have a look at it and produce something along those lines? I might take some time to finish this as I have mid-terms from next week :P

lisitsyn commented 9 years ago

Yeah, for example this is the initialization of the linear layer we have now: https://github.com/shogun-toolbox/shogun/blob/develop/src/shogun/neuralnets/NeuralLinearLayer.cpp#L63
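
As I read it, the linked `initialize_parameters` roughly fills every parameter from a zero-mean Gaussian with a caller-supplied sigma. A minimal standalone sketch of that scheme (illustrative only, not the actual Shogun code):

```cpp
// Sketch of the existing scheme: every parameter (weights and biases alike) is
// drawn from N(0, sigma) with a fixed, caller-supplied sigma.
#include <random>
#include <vector>

void gaussian_initialize(std::vector<double>& parameters, double sigma,
                         std::mt19937& rng)
{
    std::normal_distribution<double> gauss(0.0, sigma);
    for (auto& p : parameters)
        p = gauss(rng);
}
```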

lisitsyn commented 9 years ago

@sanuj don't bother with the initialization/learning of the parametric rectifier itself; it is more important to get the initial weights of the rectifiers right.

sanuj commented 9 years ago

@lisitsyn Sorry for the delay. The paper uses a zero-mean Gaussian distribution with variance 2/n, where n denotes the number of connections feeding a response y = Wx + b in layer L, and initializes b = 0 (the vector of biases). The derivation is for a conv layer.
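
A minimal standalone sketch of that initialization (names and signature are hypothetical, not Shogun API): weights are drawn from a zero-mean Gaussian with variance 2/n, where n is the fan-in, and biases start at zero.

```cpp
// Hypothetical sketch of the MSR ("He") initialization described above:
// weights ~ N(0, sqrt(2/n)), biases = 0, where n is the number of input
// connections per output unit (fan-in).
#include <cmath>
#include <random>
#include <vector>

void he_initialize(std::vector<double>& weights, std::vector<double>& biases,
                   std::size_t fan_in, std::mt19937& rng)
{
    // Variance 2/n  =>  standard deviation sqrt(2/n)
    std::normal_distribution<double> gauss(0.0, std::sqrt(2.0 / fan_in));

    for (auto& w : weights)
        w = gauss(rng);   // zero-mean Gaussian weights

    for (auto& b : biases)
        b = 0.0;          // biases start at zero, as in the paper
}
```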

lisitsyn commented 9 years ago

@sanuj yeah feel free to implement that then ;)

sanuj commented 9 years ago

Sure :) Shall I add it to the convolutional layer or somewhere else, and should I edit the initialize_parameters function or add a new one?

lisitsyn commented 9 years ago

@sanuj I'd suggest you add another initialization mode that is switched with an enum.
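
Something along these lines, perhaps (the enum and function names are purely illustrative, not existing Shogun API):

```cpp
// Hypothetical sketch of an enum-switched initialization mode.
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

enum class EInitializationMode
{
    GAUSSIAN_FIXED_SIGMA,  // current behaviour: N(0, sigma) with a user-given sigma
    HE_MSR                 // N(0, sqrt(2/fan_in)) as in the MSR paper
};

void initialize_weights(std::vector<double>& weights, double sigma,
                        std::size_t fan_in, EInitializationMode mode,
                        std::mt19937& rng)
{
    const double stddev = (mode == EInitializationMode::HE_MSR)
        ? std::sqrt(2.0 / fan_in)   // variance 2/n from the paper
        : sigma;                    // existing fixed-sigma behaviour

    std::normal_distribution<double> gauss(0.0, stddev);
    for (auto& w : weights)
        w = gauss(rng);
}
```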

sanuj commented 9 years ago

@lisitsyn in conv layer?

sanuj commented 9 years ago

@lisitsyn is it good to merge, or does anything else need to be done here?

arasuarun commented 8 years ago

@lisitsyn In the same paper, they describe PReLUs (Parametric Rectified Linear Units), which are a general form of ReLUs and are more effective than Leaky ReLUs. Are you guys planning on getting them into Shogun? If so, I'd like to have a go. Thanks. :)
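
For context, a minimal sketch of the PReLU forward pass (illustrative only, not Shogun API): the negative slope a is a learned parameter, so a = 0 gives plain ReLU and a fixed small a gives Leaky ReLU.

```cpp
// Hypothetical PReLU forward pass: f(y) = y for y > 0, f(y) = a*y otherwise.
#include <cstddef>
#include <vector>

void prelu_forward(const std::vector<double>& pre_activations,
                   std::vector<double>& activations, double a)
{
    activations.resize(pre_activations.size());
    for (std::size_t i = 0; i < pre_activations.size(); ++i)
    {
        const double y = pre_activations[i];
        activations[i] = (y > 0.0) ? y : a * y;
    }
}
```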

karlnapf commented 8 years ago

That is good stuff; I have heard about these.

BUT given the current state of the NN implementations in Shogun, it would be better to tune them for speed and run some comparisons with other libraries first. Reading over the code, there are tons of potential speed-ups.

arasuarun commented 8 years ago

Oh... okay, cool. I'll start looking at optimisation stuff like RMSProp, then. Thanks.

karlnapf commented 8 years ago

One thing to do would be to try using the linalg library for some of the operations done within the code; convolution is an example, and there might be others. Then some simple parallelism (OpenMP) might also help. And then, thinking about which operations could go on the GPU?
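
To illustrate the OpenMP angle, here is a sketch (not existing Shogun code) of a naive 1D valid convolution with the outer loop parallelized; the same kind of loop could also be expressed through linalg or pushed onto the GPU.

```cpp
// Illustrative OpenMP parallelization of a naive 1D valid convolution.
// Assumes signal.size() >= kernel.size(); compile with -fopenmp.
#include <cstddef>
#include <vector>

std::vector<double> convolve_valid(const std::vector<double>& signal,
                                   const std::vector<double>& kernel)
{
    std::vector<double> out(signal.size() - kernel.size() + 1, 0.0);

    // Each output element is independent, so the outer loop parallelizes trivially.
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(out.size()); ++i)
    {
        double acc = 0.0;
        for (std::size_t j = 0; j < kernel.size(); ++j)
            acc += signal[i + j] * kernel[j];
        out[i] = acc;
    }
    return out;
}
```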