jpsember opened 7 years ago
This will also let us process the (N * k) training samples in one go, instead of iterating through them as we were doing.
The training samples appear as a matrix on the left side of the function, and the scores similarly appear as outputs on the right. The matrices in between represent the edges between the layers, and the values of the layers (i.e., the 'nodes' within the layers) are implicit in the matrix multiplications that occur as the calculation proceeds from left to right. (Some operations, like ReLU, cannot be represented as matrices, since they aren't linear; but they can be thought of as 'pseudo-matrices' here.)
So, in our spiral neural network example:
- X, the 1xD matrix holding (a single) training sample, represents the input layer.
- The W1 matrix and the ReLU operation following it represent the edges connecting the input layer and the hidden layer.
- The value of `X.W1.ReLU` is the (implicit) representation of the nodes of the hidden layer.
- The W2 matrix represents the edges connecting the hidden layer to the output layer.
- The S matrix, which contains the calculated scores for each label, represents the nodes of the output layer.
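As a sketch of the layers above (the concrete sizes N, D, H, K below are assumptions for illustration, not values from the actual network), the whole forward pass is just two matrix products with a ReLU in between:

```python
import numpy as np

# Assumed sizes for illustration: N samples, D input features,
# H hidden units, K output classes.
N, D, H, K = 6, 4, 100, 3

rng = np.random.default_rng(0)
X = rng.standard_normal((N, D))   # input layer: one row per training sample
W1 = rng.standard_normal((D, H))  # edges: input layer -> hidden layer
W2 = rng.standard_normal((H, K))  # edges: hidden layer -> output layer

hidden = np.maximum(0, X @ W1)    # ReLU(X . W1): the implicit hidden-layer nodes
S = hidden @ W2                   # scores: the output-layer nodes

print(S.shape)  # (6, 3): one row of K scores per sample
```

Because X has one row per sample, the same code processes a whole batch at once rather than iterating sample by sample.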
Can we experiment with adding a second hidden layer to improve the performance of the network? We should reduce the value of H, since it probably doesn't need to be 100 in this case (especially since, by dimension analysis, the weight matrix connecting the two hidden layers must have size H x H).
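By that dimension analysis, the second hidden layer only inserts one more H x H matrix into the chain. A minimal sketch (the reduced H and the name W3 are assumptions for illustration):

```python
import numpy as np

N, D, H, K = 6, 4, 20, 3  # H reduced from 100, as suggested above

rng = np.random.default_rng(1)
X = rng.standard_normal((N, D))
W1 = rng.standard_normal((D, H))  # edges: input -> hidden 1
W2 = rng.standard_normal((H, H))  # edges: hidden 1 -> hidden 2 (the H x H matrix)
W3 = rng.standard_normal((H, K))  # edges: hidden 2 -> output

h1 = np.maximum(0, X @ W1)
h2 = np.maximum(0, h1 @ W2)
S = h2 @ W3
print(S.shape)  # (6, 3): output shape is unchanged by the extra layer
```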
Opening another issue for this... worth checking out.
I think it's calculating correctly:
```
py> rake
Samples:
[[ 2. 6. 2. 1.]
[ 0. 7. 6. 6.]
[ 0. 6. 6. 1.]
[ 5. 2. 6. 7.]
[ 1. 6. 3. 7.]
[ 5. 4. 6. 3.]]
Types:
[[3]
[2]
[3]
[3]
[3]
[1]]
SVM loss:
[[ 10.]
[ 3.]
[ 12.]
[ 0.]
[ 0.]
[ 5.]]
py>
```
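For reference, the per-sample SVM (hinge) loss the run above is printing can be computed in vectorized form roughly like this (the function name, the delta of 1.0, and the score values below are assumptions for illustration, not taken from the actual run):

```python
import numpy as np

def svm_loss_per_sample(scores, labels, delta=1.0):
    """Multiclass SVM (hinge) loss for each row of scores.

    scores: (N, K) matrix of class scores
    labels: (N,) array of correct class indices
    """
    N = scores.shape[0]
    correct = scores[np.arange(N), labels][:, None]    # (N, 1) correct-class scores
    margins = np.maximum(0, scores - correct + delta)  # hinge margins vs. every class
    margins[np.arange(N), labels] = 0                  # don't count the correct class
    return margins.sum(axis=1)                         # one loss value per sample

# Illustrative scores (not the ones from the run above)
scores = np.array([[3.0, 1.0, 2.5],
                   [1.0, 4.0, 3.5]])
labels = np.array([0, 1])
print(svm_loss_per_sample(scores, labels))  # [0.5 0.5]
```

A loss of 0 (as in two rows of the output above) just means every wrong class scored at least delta below the correct one.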
Up to now, we've been using our 'func' graph to represent the neural network. This is good for educational purposes, but we should now figure out how backpropagation (i.e., calculating gradients) can be done more efficiently using numpy (matrix representations and operations).
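For the two-layer case, the vectorized backward pass mirrors the forward matrix products in reverse. A sketch under assumed sizes, with a stand-in upstream gradient dS in place of a real loss gradient:

```python
import numpy as np

# Forward pass: S = ReLU(X @ W1) @ W2, with illustrative sizes.
N, D, H, K = 6, 4, 10, 3
rng = np.random.default_rng(2)
X = rng.standard_normal((N, D))
W1 = rng.standard_normal((D, H))
W2 = rng.standard_normal((H, K))

hidden = np.maximum(0, X @ W1)
S = hidden @ W2

# dS is the gradient of the loss w.r.t. the scores; here we use the
# gradient of mean(S) as a stand-in for a real loss gradient.
dS = np.ones_like(S) / S.size

# Backward pass: each matrix product contributes a transposed product.
dW2 = hidden.T @ dS       # (H, K) gradient for the output-layer edges
dhidden = dS @ W2.T       # (N, H) gradient flowing into the hidden nodes
dhidden[hidden <= 0] = 0  # ReLU gate: gradient passes only where hidden > 0
dW1 = X.T @ dhidden       # (D, H) gradient for the input-layer edges

print(dW1.shape, dW2.shape)  # (4, 10) (10, 3)
```

The whole batch's gradients come out of three matrix products, with no per-sample loop.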