Open mratsim opened 6 years ago
@mratsim I'm curious about adding momentum to SGD (largely to avoid doing any actual work in my own Nim projects, ha). Would you want to do it the same way as PyTorch/TensorFlow? Both libraries provide a single "SGD" optimizer with a momentum parameter: momentum > 0 applies momentum, and momentum == 0 falls back to plain SGD. They also take a nesterov boolean, where true applies Nesterov momentum instead of classical momentum. Or do you envision a different implementation?
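For reference, here is a minimal sketch of what that single-optimizer approach could look like in plain Nim (not Arraymancer's tensor types or actual API; the `SgdState` / `step` names and `seq[float]` parameters are just placeholders for illustration): momentum == 0 gives plain SGD, momentum > 0 adds the velocity term, and nesterov = true switches to the look-ahead update.

```nim
# Hypothetical sketch only: a single SGD optimizer with optional classical or
# Nesterov momentum, following the PyTorch-style update rule.
type SgdState = object
  lr, momentum: float
  nesterov: bool
  velocity: seq[float]   # one entry per parameter, initialised to zero

proc step(s: var SgdState, params: var seq[float], grads: seq[float]) =
  for i in 0 ..< params.len:
    if s.momentum == 0.0:
      # plain SGD: p <- p - lr * g
      params[i] -= s.lr * grads[i]
    else:
      # v <- momentum * v + g
      s.velocity[i] = s.momentum * s.velocity[i] + grads[i]
      if s.nesterov:
        # Nesterov: look ahead along the updated velocity
        params[i] -= s.lr * (grads[i] + s.momentum * s.velocity[i])
      else:
        params[i] -= s.lr * s.velocity[i]

when isMainModule:
  var st = SgdState(lr: 0.1, momentum: 0.9, nesterov: false,
                    velocity: @[0.0, 0.0])
  var params = @[1.0, -2.0]
  let grads = @[0.5, -0.25]
  st.step(params, grads)
  echo params
```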
Currently only stochastic gradient descent is supported; at the very minimum it would be nice to support: