I might be reading it incorrectly, but it looks like you don't apply the activation function to the final output layer? (should that be applied, in this context?)
You don't need to do that in this context. If you are predicting probabilities, you could apply a softmax to layer_2, as Trask shows in the later chapters.
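For what it's worth, here is a rough sketch of what that could look like in NumPy. Only `layer_2` comes from the code being discussed; the input, the weight names and shapes, and the `relu` hidden activation are assumptions made up for illustration, so adapt them to your own network.

```python
import numpy as np

def relu(x):
    # Hidden-layer activation (assumed here; use whatever your network uses)
    return (x > 0) * x

def softmax(x):
    # Subtract the row-wise max before exponentiating for numerical stability
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

# Hypothetical shapes: 3 inputs, 4 hidden units, 2 output classes
rng = np.random.default_rng(0)
layer_0 = rng.random((1, 3))
weights_0_1 = rng.random((3, 4)) - 0.5
weights_1_2 = rng.random((4, 2)) - 0.5

layer_1 = relu(layer_0.dot(weights_0_1))
layer_2 = layer_1.dot(weights_1_2)   # raw output, no activation applied
probs = softmax(layer_2)             # only needed if you want probabilities
```

If the target is a plain real-valued number (regression), leaving `layer_2` as the raw output is fine; the softmax only makes sense when the outputs should be non-negative and sum to 1 across classes.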