jorgenkg / python-neural-network

This is an efficient implementation of a fully connected neural network in NumPy. The network can be trained by a variety of learning algorithms: backpropagation, resilient backpropagation and scaled conjugate gradient learning. The network has been developed with PyPy in mind.
BSD 2-Clause "Simplified" License
297 stars · 98 forks

There are so many problems #2

Closed shenyann closed 8 years ago

shenyann commented 8 years ago

1. Are you sure you are doing batch training?
2. Look at your backpropagation: the inputs you store are actually the outputs after each activation, but you treat them as the raw inputs again. That is wrong.
3. Your delta is actually dL/dx. If you use the L2 loss, the delta is y(1-y)*(t-y); if you use softmax at the output, the delta is out - targets.
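For illustration, here is a minimal NumPy sketch of the mini-batch idea in point 1; the names X, T, and minibatches are assumptions for this example, not code from this repository:

```python
import numpy as np

def minibatches(X, T, batch_size):
    """Yield (inputs, targets) mini-batches instead of the whole dataset at once."""
    order = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        chosen = order[start:start + batch_size]
        yield X[chosen], T[chosen]

# One epoch of mini-batch training then looks like:
# for X_batch, T_batch in minibatches(X, T, batch_size=32):
#     ... forward pass, backpropagation and a weight update on this mini-batch only ...
```

Each weight update then sees only batch_size samples, and the per-batch losses can be averaged over the epoch.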

jorgenkg commented 8 years ago

Hi @ys1045097987,

  1. Yes, this is batch training implemented through matrix operations. You can verify this by observing the value of the batch variable.
  2. If you pay close attention to the generator on code line 122, you can see that we are offsetting the recorded inputs by one. Therefore, the first value processed from the list is actually the input signal to the last layer of the network, as it should be. It is difficult to spot, but it enables an efficient implementation.

3. Delta

The error of an output neuron j in layer l is the partial derivative of the cost function with respect to the input signal to that neuron. The derivative of the quadratic cost function with respect to the output is -(target - output) in the output layer. This follows from the definition of the quadratic cost function, 0.5 * SUM(target_j - output_j)^2, and the definition of the error of the output layer: delta_j := ∂Cost/∂output * activation_prime(output). Therefore, when using the quadratic cost function, the delta becomes -(target - output) * activation_prime(output), as implemented.
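To make the sign convention concrete, here is a minimal NumPy sketch of this output-layer delta for a sigmoid output; the function and variable names are assumptions for illustration, not the repository's actual code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime_from_output(y):
    # the sigmoid derivative expressed through its own output: sigma'(z) = y * (1 - y)
    return y * (1.0 - y)

def output_delta(outputs, targets):
    # quadratic cost: C = 0.5 * sum((target - output)**2)
    # dC/d(output) = -(target - output)
    # delta = dC/d(output) * activation'(net input) = -(target - output) * y * (1 - y)
    return -(targets - outputs) * sigmoid_prime_from_output(outputs)

# With a softmax output layer and a cross-entropy cost, the delta would instead
# reduce to (outputs - targets), as mentioned earlier in the thread.
```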

Please follow up if you feel something has been overlooked.

shenyann commented 8 years ago

1. Yes, the batch variable changes, but every time you send the whole dataset into the net rather than a batch. That is equivalent to doing a forward pass epochs * len(data) / batch_size times, each time over a full epoch. When you forward-propagate one batch at a time, you should average the loss once the epoch is over.
2. Yes, code line 122 actually is the input signal tanh(wx + b) into the last layer, then sigmoid(tanh(wx + b)), then the L2 loss is calculated, right? But the sigmoid activation has no parameters, so: gradient_w = dCost/doutput * doutput/dinput * dinput/dW = -(target - output) * activation(sigmoid, derivative=True) * inputs.
3. Look at your update function. The list it returns contains the input x, tanh(wx + b), and sigmoid(tanh(wx + b)). So when you backpropagate, the first inputs entry will be tanh(wx + b), but you also call activation(inputs, derivative=True). That value tanh(wx + b) has already been calculated, so you do not need to compute tanh(x) again inside your activation function; you only need 1 - y^2 = dy/dx.
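A minimal sketch of point 3, assuming the forward pass has already stored y = tanh(wx + b): the derivative can then be read directly from that stored value instead of applying tanh again (the names below are illustrative, not this repository's code):

```python
import numpy as np

def tanh_prime_from_output(y):
    # y is the stored, already-activated output tanh(z);
    # d tanh(z)/dz = 1 - tanh(z)**2 = 1 - y**2, so no second tanh evaluation is needed
    return 1.0 - y**2

# Chain rule for the weight gradient of the output layer (shapes: batch x features):
#   grad_W = dCost/d(output) * d(output)/d(net input) * d(net input)/dW
# which, with stored layer inputs `inputs` and output-layer error `delta`, becomes:
#   delta  = -(targets - outputs) * activation_derivative_from_output(outputs)
#   grad_W = inputs.T.dot(delta)
```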

jorgenkg commented 8 years ago

Thank you very much for the feedback. I will look into it first thing in the morning, and pass some kudos your way in the next commit :)

shenyann commented 8 years ago

Hello, Jørgen Grimnes. I think the new code is working well. But I think there would be a better way to write the code in order to: 1. implement the derivative of softmax, 2. support multiple loss criteria, 3. implement batch forward propagation, 4. allow different hidden layer sizes across multiple layers. Then maybe we can try some ways to implement forward and backward propagation in convnets. Can I make some contributions to improve the code?
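On the softmax point, here is a minimal, numerically stable sketch of what such an addition might look like; the names are assumptions for this example and are not tied to this repository's API:

```python
import numpy as np

def softmax(z):
    # subtract the row-wise maximum for numerical stability before exponentiating
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def softmax_cross_entropy_delta(outputs, targets):
    # paired with a cross-entropy cost, dC/d(net input) collapses to outputs - targets,
    # so the full softmax Jacobian never has to be formed explicitly
    return outputs - targets
```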


jorgenkg commented 8 years ago

Hi again ys!

I really appreciated the feedback, and it made me delve into the calculus again! You are of course welcome to contribute; I appreciate your interest :+1:

  1. As currently implemented, the network does not use Softmax for the output layer. Rather, the user can choose which activation / squashing function to apply. Moreover, softmax hasn't even been included in the library of activation functions.
  2. I'm not sure what you mean by your second point.
  3. This is certainly doable!
  4. This is of course easily doable :+1:

Exploring convnets sounds compelling, but I'm personally shifting my focus towards spiking neural networks, specifically the Continuous Time Recurrent Neural Network (CTRNN) formulated by Randall Beer.

Nevertheless, feel free to fork this project!