The line world example works.
It is a little worrying that convergence doesn't always happen, though; sometimes training diverges, and I am not sure why, given how simple the model is.
However, let's time-box that investigation for now. Going forward:
Tidy up the example so that the code isn't all in the main block.
Once that is done, note that 1,000 training iterations currently take 32+ seconds, while our value function tests show that 1,000 training iterations of the nnet alone take under 3 seconds. That suggests we are spending roughly 30 seconds on reinforcement learning target generation. So try vectorizing those calculations and see if we can do any better.
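As a rough sketch of what that vectorization might look like: if target generation is currently looping over transitions one at a time, the per-step bootstrapped targets can be computed in a single batched NumPy expression instead. This assumes TD(0)-style targets (r + gamma * V(s') for non-terminal steps); the function name, the `gamma` value, and the array layout are all hypothetical, not taken from the actual code.

```python
import numpy as np

def td_targets(rewards, next_values, terminal, gamma=0.9):
    """Compute TD(0) targets for a whole batch at once.

    rewards:     (N,) rewards observed at each step
    next_values: (N,) current value estimates for the successor states
    terminal:    (N,) bool, True where the episode ended
    Returns r + gamma * V(s'), with the bootstrap term zeroed on terminal steps.
    """
    rewards = np.asarray(rewards, dtype=float)
    next_values = np.asarray(next_values, dtype=float)
    terminal = np.asarray(terminal, dtype=bool)
    # ~terminal multiplies the bootstrap term by 0 on terminal transitions
    return rewards + gamma * next_values * ~terminal

# Tiny illustrative batch (made-up numbers)
rewards = np.array([0.0, 0.0, 1.0])
next_values = np.array([0.5, 0.8, 0.0])
terminal = np.array([False, False, True])
targets = td_targets(rewards, next_values, terminal)
```

If the 30-second overhead really is in a Python-level loop over individual transitions, replacing it with one batched call like this is usually the first thing to try before profiling further.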