yacineMahdid / artificial-intelligence-and-machine-learning

A repository of artificial intelligence algorithm implementations, covering machine learning and deep learning algorithms as well as classical AI search algorithms

Refactor the deep-learning-from-scratch to live in python files #37

Open yacineMahdid opened 3 years ago

yacineMahdid commented 3 years ago

Currently most of the code lives in Jupyter notebooks; I should move most of it into .py scripts so that the code can be reused.

yacineMahdid commented 3 years ago

Will need to figure out how to structure the optimizer and how it takes a step; the way my optimization functions work right now might not be ideal.

After looking at an example of how PyTorch works, it seems that the way I structured it might work. I just need to have the gradient per weight and I'll be good to go.

In a nutshell, this is what we will be doing:

    with torch.no_grad():  # the manual update itself should not be tracked by autograd
        for param in model.parameters():
            param -= learning_rate * param.grad

But we can wrap this in a class-based format, like so:

    learning_rate = 1e-3
    optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
    [...]
        # Before the backward pass, use the optimizer object to zero all of the
        # gradients for the variables it will update (which are the learnable
        # weights of the model). This is because, by default, gradients are
        # accumulated in buffers (i.e., not overwritten) whenever .backward()
        # is called. Check out the docs of torch.autograd.backward for more details.
        optimizer.zero_grad()

        # Backward pass: compute the gradient of the loss with respect to the
        # model parameters.
        loss.backward()

        # Calling the step function on an Optimizer makes an update to its
        # parameters.
        optimizer.step()

This means that the optimizer will have access to the model parameters as well as their gradients. The one thing that is weird in PyTorch is that the loss has access to the model parameters.

I'll simplify this right now since I still don't have a dynamic graph solver implemented!
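
For reference, a minimal self-contained version of this PyTorch pattern could look like the snippet below (the toy data, model, and hyperparameters are placeholders, purely for illustration). The reason the loss can reach the model parameters is that the forward pass records the autograd graph, so calling backward() on the loss fills in param.grad for every parameter the optimizer holds a reference to.

    import torch

    # Toy data and model, purely for illustration.
    x = torch.randn(64, 10)
    y = torch.randn(64, 1)
    model = torch.nn.Sequential(
        torch.nn.Linear(10, 20),
        torch.nn.ReLU(),
        torch.nn.Linear(20, 1),
    )
    loss_fn = torch.nn.MSELoss()

    learning_rate = 1e-3
    optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

    for epoch in range(100):
        y_pred = model(x)          # forward pass records the autograd graph
        loss = loss_fn(y_pred, y)  # the loss is a node in that graph

        optimizer.zero_grad()      # clear gradients accumulated in each param.grad
        loss.backward()            # autograd fills param.grad for every parameter
        optimizer.step()           # update the parameters the optimizer references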

yacineMahdid commented 3 years ago

What I should have is something like this:

    optimizer = SGD(model.parameters(), optimizer_parameters...)
    [...]
    optimizer.zero_grad() # this will remove all the gradients accumulated
    optimizer.backward()  # since it already has access to the graph and to the gradients
    optimizer.step()      # do one gradient descent step

yacineMahdid commented 3 years ago

Little correction: we shouldn't have the optimizer doing the backward pass, since this only depends on the model and not on the optimizer!

We should be doing this instead:

    optimizer = SGD(model.parameters(), optimizer_parameters...)
    [...]
    optimizer.zero_grad() # this will remove all the gradients accumulated
    model.backward()      # the behavior of backward will be architecture specific
    optimizer.step()      # do one gradient descent step

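As a rough sketch of what that SGD class could look like in the from-scratch code (the Parameter container and its .value/.grad attributes are assumptions about the framework's interface, not existing code):

    import numpy as np

    class Parameter:
        """Hypothetical container pairing a weight array with its accumulated gradient."""

        def __init__(self, value):
            self.value = value
            self.grad = np.zeros_like(value)

    class SGD:
        """Holds references to the model's parameters and updates them in place."""

        def __init__(self, parameters, learning_rate=1e-3):
            self.parameters = list(parameters)
            self.learning_rate = learning_rate

        def zero_grad(self):
            # Remove all the gradients accumulated so far.
            for param in self.parameters:
                param.grad = np.zeros_like(param.value)

        def step(self):
            # One gradient descent step: w <- w - lr * dL/dw
            for param in self.parameters:
                param.value -= self.learning_rate * param.grad
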
yacineMahdid commented 3 years ago

We should do a full run with the optimizer + activation + framework, otherwise I'm running a bit blind if I try to code up all the optimizers first.
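
As a sketch of what such a full run could look like under the API above, reusing the Parameter/SGD sketch from the previous comment: Dense, ReLU, and Sequential are hypothetical placeholder names for whatever the from-scratch framework ends up exposing, and the MSE loss gradient is computed by hand.

    import numpy as np

    class Dense:
        """Hypothetical fully connected layer: out = x @ W + b."""

        def __init__(self, n_in, n_out):
            self.W = Parameter(np.random.randn(n_in, n_out) * 0.1)
            self.b = Parameter(np.zeros(n_out))

        def parameters(self):
            return [self.W, self.b]

        def forward(self, x):
            self.x = x  # cache the input for the backward pass
            return x @ self.W.value + self.b.value

        def backward(self, grad_out):
            self.W.grad += self.x.T @ grad_out
            self.b.grad += grad_out.sum(axis=0)
            return grad_out @ self.W.value.T

    class ReLU:
        """Hypothetical activation layer."""

        def parameters(self):
            return []

        def forward(self, x):
            self.mask = x > 0
            return x * self.mask

        def backward(self, grad_out):
            return grad_out * self.mask

    class Sequential:
        """Hypothetical container; backward is architecture specific (reverse layer order)."""

        def __init__(self, *layers):
            self.layers = layers

        def parameters(self):
            return [p for layer in self.layers for p in layer.parameters()]

        def forward(self, x):
            for layer in self.layers:
                x = layer.forward(x)
            return x

        def backward(self, grad_out):
            for layer in reversed(self.layers):
                grad_out = layer.backward(grad_out)

    # One full run on toy data: model + activation + optimizer together.
    X = np.random.randn(64, 10)
    y = np.random.randn(64, 1)
    model = Sequential(Dense(10, 20), ReLU(), Dense(20, 1))
    optimizer = SGD(model.parameters(), learning_rate=1e-2)

    for epoch in range(100):
        y_pred = model.forward(X)
        loss = np.mean((y_pred - y) ** 2)      # MSE loss
        grad_loss = 2 * (y_pred - y) / y.size  # dL/dy_pred

        optimizer.zero_grad()
        model.backward(grad_loss)
        optimizer.step()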