zishansami102 / CNN-from-Scratch

A from-scratch implementation of a Convolutional Neural Network in Python using only numpy, validated on the CIFAR-10 & MNIST datasets
http://cnndigits.pythonanywhere.com/
GNU General Public License v3.0

Implement a different method for parameter initialization #1

Closed zishansami102 closed 6 years ago

zishansami102 commented 6 years ago

Description:

Currently, Xavier initialization is used in convnet.py. Write a new function implementing any other initialization method.
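
For context, a minimal sketch of what Xavier initialization for a channels-first filter bank typically looks like; names and shapes are illustrative, not necessarily the exact code in convnet.py:

```python
import numpy as np

def xavier_init(num_filters, depth, filter_size):
    # One common Xavier/Glorot form: Var(W) = 1 / fan_in, which keeps the
    # activation variance roughly constant from layer to layer.
    # (The "normalized" variant uses 2 / (fan_in + fan_out) instead.)
    fan_in = depth * filter_size * filter_size
    stddev = np.sqrt(1.0 / fan_in)
    return np.random.normal(0.0, stddev,
                            (num_filters, depth, filter_size, filter_size))
```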

deyachatterjee commented 6 years ago

Hi, I'm interested in working on this. Can you give me some specific pointers on how to proceed? I checked the Google group, but you said to comment here.

zishansami102 commented 6 years ago

Pointer: this links to the current initialization method. If you want to add a new method, make sure that you train and test it with the MNIST dataset.

zishansami102 commented 6 years ago

If you want me to be specific about the initialisation method: you can try MSRA (He) initialization, or the method described in this paper: LSUV PAPER IN ICLR2016
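
For anyone picking this up, a minimal sketch of He (MSRA) initialization; the 2/fan_in factor compensates for ReLU zeroing about half the activations. Names and shapes here are illustrative, not taken from convnet.py:

```python
import numpy as np

def he_init(num_filters, depth, filter_size):
    # He/MSRA: Var(W) = 2 / fan_in, suited to ReLU activations.
    fan_in = depth * filter_size * filter_size
    stddev = np.sqrt(2.0 / fan_in)
    return np.random.normal(0.0, stddev,
                            (num_filters, depth, filter_size, filter_size))
```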

zishansami102 commented 6 years ago

@deyachatterjee if you have started working on this issue or are going to, then please post about it in the Google group so that others can see who is currently working on what! There may be more than one contributor on this issue or any other.

the-ethan-hunt commented 6 years ago

Can we use batch normalization as a different parameter initialization?

zishansami102 commented 6 years ago

@the-ethan-hunt
I don't understand. Batch normalization is not a method to initialise parameters; it is used as a layer in deep learning to feed the next layer inputs with zero mean and unit variance. If I got you wrong, can you please elaborate?
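
To make the distinction concrete, a minimal sketch of a batch normalization forward pass, with hypothetical names; note it transforms activations on every forward pass rather than setting initial weights:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch to zero mean and unit variance,
    # then apply the learnable scale (gamma) and shift (beta).
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```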

deyachatterjee commented 6 years ago

I'll post in the Google group when I start working on it. My end sems are going on, so I might take a bit of time to start.

AdityaSoni19031997 commented 6 years ago

Some standard initializers:

- Constant([val]): initialize weights with a constant value.
- Normal([std, mean]): sample initial weights from the Gaussian distribution.
- Uniform([range, std, mean]): sample initial weights from the uniform distribution.
- Glorot(initializer[, gain, c01b]): Glorot weight initialization.
- GlorotNormal([gain, c01b]): Glorot with weights sampled from the normal distribution.
- GlorotUniform([gain, c01b]): Glorot with weights sampled from the uniform distribution.
- He(initializer[, gain, c01b]): He weight initialization.
- HeNormal([gain, c01b]): He initializer with weights sampled from the normal distribution.
- HeUniform([gain, c01b]): He initializer with weights sampled from the uniform distribution.
- Orthogonal([gain]): initialize weights as an orthogonal matrix.
- Sparse([sparsity, std]): initialize weights as a sparse matrix.
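
In plain numpy, the first few of these are one-liners; the shape below is illustrative:

```python
import numpy as np

shape = (8, 1, 5, 5)  # illustrative: 8 filters, depth 1, 5x5 kernels
constant = np.full(shape, 0.1)                    # Constant(val=0.1)
normal   = np.random.normal(0.0, 0.01, shape)     # Normal(std=0.01, mean=0)
uniform  = np.random.uniform(-0.05, 0.05, shape)  # Uniform(range=0.05)
```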

AdityaSoni19031997 commented 6 years ago

For LSUV initialisation, we need to know the details of the architecture beforehand; does that mean the architecture will always be fixed?

AdityaSoni19031997 commented 6 years ago

Sir, do we have channels-first or channels-last? I can't make it out from the code (it seems that it's channels-first).

zishansami102 commented 6 years ago

@AdityaSoni19031997
1) Yes, one of those methods. You do not need to copy-paste all that.
2) No, you can make it variable if you want and then call the function with the current architecture.
3) Yes, channels-first.

ghost commented 6 years ago

Can we initialize by assigning random weights on the scale of -1 to +1?

ghost commented 6 years ago

Or can we do a data-dependent initialization?

zishansami102 commented 6 years ago

@sudheer2910
Yes, we can. In fact, that's the second most naive way to initialize the weights (the first is to initialize them with zero :p). In the case of complex and deep networks, we need good methods with less chance of vanishing gradients during back-propagation.

That's why we have Xavier's Initialization method implemented over here.
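
For contrast, the naive scheme is a numpy one-liner (shape illustrative):

```python
import numpy as np

# Naive init: uniform in [-1, 1], independent of layer width (fan-in),
# which is why it tends to saturate or lose gradient in deeper nets.
weights = np.random.uniform(-1.0, 1.0, (8, 1, 5, 5))
```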

ghost commented 6 years ago

What about data-dependent initialization? Can we use that?

zishansami102 commented 6 years ago

Yes. Actually, LSUV is based on that, I think.
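
Roughly, the LSUV procedure (Mishkin & Matas, ICLR 2016) pre-initializes with orthonormal weights and then rescales each layer until its output variance over a real data batch is close to one. A minimal sketch of the rescaling step, where `forward_layer` is a hypothetical stand-in for the net's own forward pass:

```python
import numpy as np

def lsuv_scale(weights, forward_layer, data_batch, tol=0.05, max_iter=10):
    # Layer-sequential unit variance: starting from a pre-initialized
    # (ideally orthonormal) weight tensor, rescale it until the layer's
    # output variance over a real data batch is close to 1.
    for _ in range(max_iter):
        out = forward_layer(weights, data_batch)
        var = out.var()
        if abs(var - 1.0) < tol:
            break
        weights = weights / np.sqrt(var)
    return weights
```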

ghost commented 6 years ago

Actually, I read a paper in which they talk about initializing the weights, but I am not able to read the full paper to get more information: Weight Initialization of Deep Neural Networks (DNNs) using Data Statistics.

ghost commented 6 years ago

This is the link to it: https://www.researchgate.net/publication/320727084_Weight_Initialization_of_Deep_Neural_NetworksDNNs_using_Data_Statistics

zishansami102 commented 6 years ago

@sudheer2910 That looks cool to me. I am assigning that method to you. Implement it, and I will merge the PR.

AdityaSoni19031997 commented 6 years ago

I am interested in implementing the He one.

zishansami102 commented 6 years ago

@AdityaSoni19031997 Okay, your wish. It would be better if you chose LSUV, but it depends on you.

AdityaSoni19031997 commented 6 years ago

Let's see; I will try to send a PR for whichever turns out to be possible.

AdityaSoni19031997 commented 6 years ago

I have a neural net implemented from scratch (it takes data from a CSV file). Should I add that to this as well?

zishansami102 commented 6 years ago

No, never. Why? This is different. And I must remind you that you need to implement the initialization for the ConvNet, not for the simple neural net, so we don't need that.

AdityaSoni19031997 commented 6 years ago

@zishansami102

```python
import numpy as np

def initialise_param_lecun_normal(FILTER_SIZE, IMG_DEPTH, fan_in, scale=1.0,
                                  distribution='normal'):
    # LeCun normal initialization: weights drawn from N(0, scale / fan_in).
    if scale <= 0.:
        raise ValueError('`scale` must be a positive float. Got:', scale)

    distribution = distribution.lower()
    if distribution not in {'normal'}:
        raise ValueError('Invalid `distribution` argument: '
                         'expected "normal" but got', distribution)

    scale /= max(1., fan_in)
    stddev = np.sqrt(scale)
    shape = (IMG_DEPTH, FILTER_SIZE, FILTER_SIZE)
    return np.random.normal(0., stddev, shape)
```

The calculation for fan_in is left out (it is taken as an argument above), as I am not able to relate it to the code properly.

There is room to use this function for different initialisers by playing with the initial value of `scale`...
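
For instance, assuming `fan_in` is computed elsewhere, varying `scale` turns the sketch above into other classic schemes; hypothetical calls:

```python
# Hypothetical calls; fan_in = IMG_DEPTH * FILTER_SIZE * FILTER_SIZE is assumed.
fan_in = 1 * 5 * 5
lecun = initialise_param_lecun_normal(5, 1, fan_in, scale=1.0)  # Var = 1/fan_in (LeCun)
he    = initialise_param_lecun_normal(5, 1, fan_in, scale=2.0)  # Var = 2/fan_in (He)
```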

AdityaSoni19031997 commented 6 years ago

@zishansami102 Sorry for the late reply... my semester exams begin next week, so I was busy with that. Here's the link (it's just a number):

https://www.google.co.in/amp/s/www.researchgate.net/post/What_do_fan_in_and_fan_out_mean_in_deeplearningtoolbox_of_CNN/amp

AdityaSoni19031997 commented 6 years ago

Once we know that number, the piece of code can be modified slightly to fit various different init functions, and it will serve as a parent function...

I saw that Keras does it that way.

AdityaSoni19031997 commented 6 years ago

Are there any more specific init functions that need to be implemented?

AdityaSoni19031997 commented 6 years ago

Will run it today, sir, and let you know the results.

zishansami102 commented 6 years ago

Then in that case I think it should be:

fan_in = l1 * f * f
fan_out = l2 * f * f

where there are l2 filters, each with shape (l1, f, f). I could be wrong; it's just what is written there. Do your research and then implement.
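
In numpy terms, reading the filters as a channels-first bank of shape (l2, l1, f, f), that would be (values illustrative):

```python
import numpy as np

l1, l2, f = 1, 8, 5                 # illustrative: input depth, filter count, filter size
filters = np.zeros((l2, l1, f, f))  # l2 filters, each of shape (l1, f, f)
fan_in = l1 * f * f                 # connections feeding one output unit
fan_out = l2 * f * f                # connections leaving one input unit
```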

AdityaSoni19031997 commented 6 years ago

Ok

AdityaSoni19031997 commented 6 years ago

Things are broken, sir, in the function nanargmax. I hope it's not a Python version issue. @zishansami102

Traceback (most recent call last):
  File "C:\Users\Jai Shree Krishna\Downloads\CNN-from-Scratch-master\CNN-from-Scratch-master\MNIST\run.py", line 92, in <module>
    out = momentumGradDescent(batch, LEARNING_RATE, IMG_WIDTH, IMG_DEPTH, MU, filt1, filt2, bias1, bias2, theta3, bias3, cost, acc)
  File "C:\Users\Jai Shree Krishna\Downloads\CNN-from-Scratch-master\CNN-from-Scratch-master\MNIST\convnet.py", line 233, in momentumGradDescent
    [dfilt1_, dfilt2_, dbias1_, dbias2_, dtheta3_, dbias3_, curr_cost, acc_] = ConvNet(image, label, filt1, filt2, bias1, bias2, theta3, bias3)
  File "C:\Users\Jai Shree Krishna\Downloads\CNN-from-Scratch-master\CNN-from-Scratch-master\MNIST\convnet.py", line 122, in ConvNet
    (a,b) = nanargmax(conv2[jj,i:i+2,j:j+2]) ## Getting indexes of maximum value in the array
  File "C:\Users\Jai Shree Krishna\Downloads\CNN-from-Scratch-master\CNN-from-Scratch-master\MNIST\convnet.py", line 11, in nanargmax
    idx = np.argpartition(a, -nan_count-1, axis=None)
  File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 706, in argpartition
    return _wrapfunc(a, 'argpartition', kth, axis=axis, kind=kind, order=order)
  File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 57, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
ValueError: kth(=-1) out of bounds (4)

zishansami102 commented 6 years ago

@AdityaSoni19031997 Fixed now. Update your repo. It should work. I have removed Xavier initialization temporarily. And please don't call me Sir. I am also a student.
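
For the record, one robust way to get the 2-D index of the maximum while ignoring NaNs, avoiding the argpartition call entirely; a sketch, not necessarily the fix that was actually committed:

```python
import numpy as np

def nanargmax(a):
    # Flat index of the largest non-NaN entry, unraveled to (row, col).
    idx = np.nanargmax(a)
    return np.unravel_index(idx, a.shape)
```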

AdityaSoni19031997 commented 6 years ago

It's working, and I have tested it.

Will update Xavier later tonight.

AdityaSoni19031997 commented 6 years ago

What's with the KWoC tag?

zishansami102 commented 6 years ago

@AdityaSoni19031997 What about it?

zishansami102 commented 6 years ago

Lecun_Normal done. @AdityaSoni19031997 Good work. :)

AdityaSoni19031997 commented 6 years ago

Any other init?

AdityaSoni19031997 commented 6 years ago

@zishansami102 Why that Tag?

AdityaSoni19031997 commented 6 years ago

Well, we can add a lot of the other ones like He, Glorot, etc., if you are up for it.

zishansami102 commented 6 years ago

Try batch_Norm. Leave the init for others.

AdityaSoni19031997 commented 6 years ago

I wish I could have tried that as well, but I have my end sems from 5th Dec.

AdityaSoni19031997 commented 6 years ago

Do we have to submit details about our mentors even if we aren't from KGP?

zishansami102 commented 6 years ago

That's up to the KWoC coordinators to decide. Mail them.

AdityaSoni19031997 commented 6 years ago

Either way... that doesn't matter... Happy coding!

zishansami102 commented 6 years ago

@AdityaSoni19031997 Why have you not submitted for the midterm evaluation?

AdityaSoni19031997 commented 6 years ago

I don't belong to KGP.

zishansami102 commented 6 years ago

I don't think that matters.

AdityaSoni19031997 commented 6 years ago

What's the last date? I had tried, but they rejected my email ID.
