udacity / dlnd-issue-reports


First Neural Network not SGD anymore #279

Closed traveling-desi closed 7 years ago

traveling-desi commented 7 years ago

Hello!

There have been recent changes in the notebook for the first project: https://github.com/udacity/deep-learning/blob/master/first-neural-network/Your_first_neural_network.ipynb

In particular, the weight updates are:

    for X, y in zip(features, targets):
        #### Implement the forward pass here ####

        <snip>

        # Weight step (input to hidden)
        delta_weights_i_h += None
        # Weight step (hidden to output)
        delta_weights_h_o += None

    # TODO: Update the weights - Replace these values with your calculations.
    self.weights_hidden_to_output += None  # update hidden-to-output weights with gradient descent step
    self.weights_input_to_hidden += None   # update input-to-hidden weights with gradient descent step

The instructions still refer to this as:

You'll also be using a method known as Stochastic Gradient Descent (SGD) to train the network. The idea is that for each training pass, you grab a random sample of the data instead of using the whole data set. You use many more training passes than with normal gradient descent, but each pass is much faster. This ends up training the network more efficiently. You'll learn more about SGD later.

This is not SGD, but mini-batching. In SGD, we update the weights after each example, before moving on to the next one. In this notebook, we accumulate the changes and only update the weights once the whole batch has been processed. That is mini-batching (see the sketch below).

Reference: https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Iterative_method
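
For what it's worth, here is a minimal sketch of the difference I mean, using a dummy linear model and made-up names (gradients, sgd_epoch, minibatch_epoch are mine, not the notebook's):

    import numpy as np

    def gradients(weights, x, y):
        # Placeholder gradient for a dummy linear model, illustration only.
        return np.outer(x, x @ weights - y)

    def sgd_epoch(weights, features, targets, lr=0.01):
        # True SGD: apply a weight update after every single example.
        for x, y in zip(features, targets):
            weights -= lr * gradients(weights, x, y)
        return weights

    def minibatch_epoch(weights, features, targets, lr=0.01):
        # What the notebook template does: accumulate the steps over the
        # whole batch, then apply a single update at the end.
        delta_weights = np.zeros_like(weights)
        for x, y in zip(features, targets):
            delta_weights += gradients(weights, x, y)
        weights -= lr * delta_weights / len(features)
        return weights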

Please update the instructions if you agree with this.

mcleonard commented 7 years ago

It's a mix of both SGD and mini-batching. With normal SGD, you'd randomly grab one input at a time and train the network on each input. Mini-batch gradient descent takes a batch of non-random data and trains on that. Here we're using a mini-batch of random data, so it's still SGD, just with multiple data points per update.
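
Roughly, a self-contained sketch of that scheme with dummy data and a made-up train_on_batch helper (the actual notebook uses its NeuralNetwork class and its own batch size and learning rate; the values here are assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    features = rng.normal(size=(1000, 3))   # dummy data, illustration only
    targets = rng.normal(size=(1000, 1))
    weights = np.zeros((3, 1))

    def train_on_batch(weights, X, y, lr=0.01):
        # One accumulated (mini-batch) update, as in the notebook template.
        delta = np.zeros_like(weights)
        for x_i, y_i in zip(X, y):
            delta += np.outer(x_i, x_i @ weights - y_i)  # placeholder gradient
        return weights - lr * delta / len(X)

    # Each training pass grabs a *random* batch (the SGD part) and then does
    # one accumulated weight update on it (the mini-batch part).
    for _ in range(100):
        batch = rng.choice(len(features), size=128, replace=False)
        weights = train_on_batch(weights, features[batch], targets[batch])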