hershic / cs231n

Stanford CS231N

Implement LayerFullyConnected #18

Closed · hershal closed this 7 years ago

hershal commented 7 years ago

This PR consists of the complete implementation of LayerFullyConnected with vectorized gradient calculations for the inputs, weights, and bias. Previously I implemented a very basic version of LayerFullyConnected without gradient calculations for testing, but I realized I needed to make an API change: I transposed the weights NDArray so that it is oriented the same way as the input data. The original weights were oriented as (num_outputs, num_inputs), where num_inputs is the number of input pixels and num_outputs is the number of classifications. I originally used (num_outputs, num_inputs) because the vectorized math works out a bit cleaner, but that orientation is inconsistent with the input structures and most of the examples, and it seems to confuse more than it simplifies.
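For context, here is a minimal numpy sketch of why the original orientation reads cleanly for a single image; the sizes and variable names are illustrative only, not the repo's actual identifiers.

```python
import numpy as np

# Illustrative sizes only: D = num_inputs (pixels per image), C = num_outputs (classes).
D, C = 3072, 10
W_old = np.random.randn(C, D) * 1e-3   # original orientation: (num_outputs, num_inputs)
x = np.random.randn(D)                 # one flattened input image

# With the original orientation, a single image maps directly via s = Wx,
# which is the form most linear-classifier notes use:
scores = W_old.dot(x)                  # shape (C,)
```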

I transposed the LayerFullyConnected weights to the shape (num_inputs, num_outputs), which makes it easier to reason about the input and output dimensions. However, this changes the orientation of the scores that the classifiers receive: they used to receive scores in the shape (classification_scores, batch_size), where classification_scores is the num_outputs from LayerFullyConnected and batch_size is the number of images in a batch. I updated the tests and classifiers to use the new orientation (batch_size, classification_scores), which is consistent with the shape of the imported data set.
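As a rough sketch of the new orientation and the vectorized gradients described above (the shapes and names here are assumptions for illustration, not the actual LayerFullyConnected API):

```python
import numpy as np

N, D, C = 64, 3072, 10              # batch_size, num_inputs, num_outputs (assumed sizes)
x = np.random.randn(N, D)           # input batch, oriented like the imported data set
W = np.random.randn(D, C) * 1e-3    # new orientation: (num_inputs, num_outputs)
b = np.zeros(C)

# Forward pass: scores come out as (batch_size, classification_scores).
scores = x.dot(W) + b               # shape (N, C)

# Backward pass, given an upstream gradient dscores of shape (N, C):
dscores = np.random.randn(N, C)
dx = dscores.dot(W.T)               # (N, D) gradient w.r.t. the inputs
dW = x.T.dot(dscores)               # (D, C) gradient w.r.t. the weights
db = dscores.sum(axis=0)            # (C,)  gradient w.r.t. the bias
```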

I fully realize that I could have saved myself the trouble by just transposing the scores matrix in the classifiers and the forward_vectorized code in LayerFullyConnected, keeping the classifier and layer code exactly the same. However, I feel that would add cruft if I ever come back to this design and look at it again.

I also added the gradient calculation to the naive implementation of the SVM classifier, so the performance test should be a bit more accurate now. In addition, I changed the LayerFullyConnected constructor to take a tuple describing the desired transformation instead of two separate integers.
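A sketch of what the constructor change might look like; the class body and argument names below are placeholders, not the exact signatures in this PR.

```python
import numpy as np

class LayerFullyConnected:
    def __init__(self, shape):
        # `shape` is a (num_inputs, num_outputs) tuple describing the desired
        # transformation, replacing two separate integer arguments.
        num_inputs, num_outputs = shape
        self.weights = np.random.randn(num_inputs, num_outputs) * 1e-3
        self.bias = np.zeros(num_outputs)

# Example: a layer mapping 3072 input pixels to 10 classification scores.
layer = LayerFullyConnected((3072, 10))
```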

This PR also resolves #2.