Refactor the learning module

We can learn either from recorded experience or from a model.

With experience, we need a list of states, actions and rewards. If we also have a model of behaviour, we can apply Sarsa and vectorize it.
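A minimal sketch of what vectorized Sarsa targets could look like, not the module's actual code. It assumes a tabular `q_values` array stands in for the model, that an episode ends in a terminal state, and that NumPy is available; `sarsa_targets` and all argument names are hypothetical.

```python
import numpy as np

def sarsa_targets(q_values, states, actions, rewards, gamma=0.9):
    """Compute the Sarsa target for every step of one episode at once.

    q_values: array of shape (n_states, n_actions), a hypothetical tabular model
    states:   s_0 .. s_T      (length T + 1, the last state is assumed terminal)
    actions:  a_0 .. a_{T-1}  (length T)
    rewards:  r_0 .. r_{T-1}  (length T), r_t follows taking a_t in s_t
    """
    states = np.asarray(states)
    actions = np.asarray(actions)
    targets = np.asarray(rewards, dtype=float).copy()

    # Every step except the last bootstraps from Q(s_{t+1}, a_{t+1}).
    # The last step leads to the terminal state, so its target is the reward alone.
    next_q = q_values[states[1:-1], actions[1:]]
    targets[:-1] += gamma * next_q
    return targets

# Toy episode: 0 -> 1 -> 2 (terminal), made-up values for illustration only.
q_values = np.array([[0.0, 1.0], [2.0, 3.0], [0.0, 0.0]])
targets = sarsa_targets(q_values, states=[0, 1, 2], actions=[1, 0],
                        rewards=[1.0, 5.0], gamma=0.5)
```

The fancy-indexing line replaces a Python loop over time steps, which is the part that makes the calculation vectorized.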

With a list of states and a model, we can use the model to generate the actions and rewards for each state, and then calculate the targets from those.
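One way the model-based case could be sketched, under several assumptions that the issue does not state: the model is deterministic and stored as lookup arrays (`next_state`, `reward`, `terminal`), the value estimates are a tabular `q_values` array, and the bootstrap uses the greedy (max) action value. All names here are hypothetical.

```python
import numpy as np

def model_targets(q_values, next_state, reward, terminal, states, gamma=0.9):
    """Compute targets for every action in each of the given states.

    next_state[s, a]: state reached by taking action a in state s (assumed model)
    reward[s, a]:     reward for that transition (assumed model)
    terminal[s]:      True if s is terminal, so nothing is bootstrapped from it
    Returns an array of shape (len(states), n_actions).
    """
    states = np.asarray(states)
    next_s = next_state[states]   # (n, n_actions) successor states
    r = reward[states]            # (n, n_actions) rewards

    # Greedy bootstrap value of each successor; this choice is an assumption.
    bootstrap = q_values[next_s].max(axis=-1)
    bootstrap = np.where(terminal[next_s], 0.0, bootstrap)
    return r + gamma * bootstrap

# Toy model with 3 states and 2 actions; state 2 is terminal.
next_state = np.array([[0, 1], [0, 2], [2, 2]])
reward = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
terminal = np.array([False, False, True])
q_values = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]])

model_based = model_targets(q_values, next_state, reward, terminal,
                            states=[0, 1], gamma=0.5)
```

Because the model supplies every action's outcome, no recorded actions or rewards are needed, only the list of states.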
