This PR marks the first time we have a somewhat-functioning, ad-hoc two-layer network. Getting it working revealed multiple deficiencies in the code:
- The Softmax classifier was incorrect when given multiple points (images) in a batch. The implementation had two bugs, and there were holes in the tests that failed to reveal them: first, the gradients were not scaled by the number of input points; second, the gradients at the indices of the correct classifications were computed incorrectly (the computation needed an `axis=1`).
- The L2 Regularizer was not properly scaled. Scaling the loss term by 1/2 cancels the factor of 2 that differentiation introduces, so the regularizer contributes a clean `reg * W` term to the gradient. All of these fixes are sketched in the code after this list.
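The fixes are easiest to see in code. Below is a minimal sketch of a batched Softmax loss and gradient with the 1/2-scaled L2 term; the function and variable names are illustrative, not the PR's actual code:

```python
import numpy as np

def softmax_loss_and_grad(X, W, labels, reg):
    """Softmax loss and gradient over a batch, with L2 regularization.

    X: (N, D) inputs, W: (D, C) weights, labels: (N,) correct class
    indices, reg: regularization strength. Names are hypothetical.
    """
    N = X.shape[0]
    scores = X @ W  # (N, C)

    # Per-point softmax: the max/sum must run over the class axis
    # (axis=1), not over the whole batch.
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)

    # Data loss averaged over the batch, plus the 1/2-scaled L2 term.
    data_loss = -np.mean(np.log(probs[np.arange(N), labels]))
    loss = data_loss + 0.5 * reg * np.sum(W * W)

    # dL/dscores: subtract 1 at each point's correct class, then divide
    # by N -- the missing batch scaling was the first bug.
    dscores = probs.copy()
    dscores[np.arange(N), labels] -= 1.0
    dscores /= N

    # The 1/2 in the loss cancels the 2 from differentiating the L2
    # term, so the regularizer contributes exactly reg * W here.
    dW = X.T @ dscores + reg * W
    return loss, dW
```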
There are other additions here:
- I added `set_batch_labels()` on top of the classifiers so that they conform to the `forward()`, `backward()` API (see the sketch after this list).
- Documentation Cleanups: I indicated Side-Effects where they occur and updated the Outputs to reflect the Softmax fix above.
- Test Cleanups: The Softmax tests now use test fixtures, which allow for easier testing and less duplication.
- Two-Layer Network Tests: This was the original goal of the PR before it ballooned once I found the bugs above, but the PR does include a backprop test of a two-layer network with the `forward()`/`backward()` API.
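To make that API concrete, here is a minimal, self-contained sketch of a two-layer chain built on it. Only `forward()`, `backward()`, and `set_batch_labels()` come from this PR; the class names and wiring below are hypothetical:

```python
import numpy as np

class Affine:
    """Minimal affine layer conforming to the forward()/backward() API."""
    def __init__(self, d_in, d_out):
        self.W = 0.01 * np.random.randn(d_in, d_out)
        self.b = np.zeros(d_out)

    def forward(self, x):
        self.x = x                    # side-effect: cache input for backward()
        return x @ self.W + self.b

    def backward(self, dout):
        self.dW = self.x.T @ dout     # side-effect: store parameter gradients
        self.db = dout.sum(axis=0)
        return dout @ self.W.T        # gradient w.r.t. this layer's input

class SoftmaxClassifier:
    """Classifier on the same API; labels arrive via set_batch_labels()
    because forward(scores) carries no label argument."""
    def set_batch_labels(self, y):
        self.y = y

    def forward(self, scores):
        shifted = scores - scores.max(axis=1, keepdims=True)
        self.probs = np.exp(shifted)
        self.probs /= self.probs.sum(axis=1, keepdims=True)
        n = scores.shape[0]
        return -np.mean(np.log(self.probs[np.arange(n), self.y]))

    def backward(self):
        n = self.probs.shape[0]
        dscores = self.probs.copy()
        dscores[np.arange(n), self.y] -= 1.0
        return dscores / n            # scaled by the batch size

# Two-layer chain: affine -> ReLU -> affine -> softmax.
X = np.random.randn(5, 4)
y = np.array([0, 2, 1, 2, 0])
l1, l2, clf = Affine(4, 8), Affine(8, 3), SoftmaxClassifier()

clf.set_batch_labels(y)
h = np.maximum(l1.forward(X), 0.0)    # ReLU on the hidden layer
loss = clf.forward(l2.forward(h))

dh = l2.backward(clf.backward())
_ = l1.backward(dh * (h > 0))         # ReLU mask on the way back
```

A backprop test of this shape can then compare the stored `dW`/`db` values against finite-difference gradients of the loss.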
This resolves #10.