This PR contains the complete implementation of `LayerFullyConnected`, with vectorized gradient calculations for the inputs, weights, and bias. I had previously implemented a very basic version of `LayerFullyConnected` without gradient calculations for testing, but I realized I needed to make an API change: I transposed the weights `NDArray` so that it is oriented the same way as the input data. The original weights were oriented as `(num_outputs, num_inputs)`, where `num_inputs` is the number of input pixels and `num_outputs` is the number of classifications. I originally used `(num_outputs, num_inputs)` because the vectorized math works out a bit cleaner, but it is inconsistent with the input structures and most of the examples, and seems to confuse more than it simplifies.
I transposed the `LayerFullyConnected` weights to the shape `(num_inputs, num_outputs)`, which makes it easier to reason about the input and output dimensions. However, this changes the orientation of the scores that the classifiers receive: they used to receive scores in the shape `(classification_scores, batch_size)`, where `classification_scores` is the `num_outputs` from `LayerFullyConnected` and `batch_size` is the number of images in a batch. I updated the tests and classifiers to use the new orientation `(batch_size, classification_scores)`, which is consistent with the shape of the imported data set.
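To make the new orientation concrete, here is a minimal NumPy sketch of the forward pass and the vectorized gradients under the `(num_inputs, num_outputs)` layout. The variable names, shapes, and the NumPy framing are illustrative only, not the project's actual API.

```python
import numpy as np

# Illustrative shapes only; the real values come from the data set.
batch_size, num_inputs, num_outputs = 4, 3072, 10

x = np.random.randn(batch_size, num_inputs)    # input images, one row per image
w = np.random.randn(num_inputs, num_outputs)   # weights in the new (num_inputs, num_outputs) orientation
b = np.random.randn(num_outputs)               # one bias per classification

# Forward pass: scores come out as (batch_size, classification_scores),
# matching the orientation of the imported data set.
scores = x @ w + b                             # (batch_size, num_outputs)

# Vectorized gradients, given an upstream gradient of the same shape as scores.
d_scores = np.random.randn(batch_size, num_outputs)
d_x = d_scores @ w.T                           # gradient w.r.t. the inputs:  (batch_size, num_inputs)
d_w = x.T @ d_scores                           # gradient w.r.t. the weights: (num_inputs, num_outputs)
d_b = d_scores.sum(axis=0)                     # gradient w.r.t. the bias:    (num_outputs,)
```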
I fully realize that I could have saved myself the trouble by simply transposing the scores matrix in the classifiers and in the `forward_vectorized` code in `LayerFullyConnected`, keeping the classifier and layer code exactly the same otherwise. However, I feel that would add cruft if I ever come back to this design and look at it again.
I also added the gradient calculation to the naive implementation of the SVM classifier, so the performance test should be a bit more accurate now, and I changed the `LayerFullyConnected` constructor to take a tuple describing the desired transformation instead of two integers.
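For context, this is roughly what a naive (explicit-loop) multiclass SVM hinge-loss gradient looks like with the new score orientation; the function name, signature, and NumPy framing are my own illustration, not the classifier's actual code.

```python
import numpy as np

def svm_loss_naive(w, x, y, delta=1.0):
    """Naive multiclass SVM hinge loss and gradient, computed with explicit loops.

    w: (num_inputs, num_outputs) weights, x: (batch_size, num_inputs) inputs,
    y: (batch_size,) integer class labels. Illustrative sketch only.
    """
    batch_size = x.shape[0]
    num_outputs = w.shape[1]
    loss = 0.0
    d_w = np.zeros_like(w)

    for i in range(batch_size):
        scores = x[i] @ w                  # (num_outputs,) scores for one image
        correct_score = scores[y[i]]
        for j in range(num_outputs):
            if j == y[i]:
                continue
            margin = scores[j] - correct_score + delta
            if margin > 0:
                loss += margin
                d_w[:, j] += x[i]          # push the violating class's weights toward the input
                d_w[:, y[i]] -= x[i]       # pull the correct class's weights away
    return loss / batch_size, d_w / batch_size
```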
This PR also resolves #2.