Policy gradient

Implements the REINFORCE policy gradient approach for QWOP and, as a baseline, for CartPole from OpenAI Gym. It still needs a little restructuring, but I want to get these resources in now while I work on implementing other methods. I don't yet have a clear picture of the final structure all of this will need.
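
For reference, here is a rough sketch of the core REINFORCE update. It is not the code in this PR: it uses only the Python standard library, a state-independent softmax policy, and a hypothetical two-action toy task in place of QWOP or CartPole. All function names (`discounted_returns`, `reinforce_step`, `train_toy_bandit`) and the hyperparameters are assumptions made for illustration.

```python
import math
import random


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, actions, returns, learning_rate=0.1):
    """One REINFORCE update for a state-independent softmax policy.

    For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi(k),
    so each step contributes G_t * (1[k == a_t] - pi(k)) to the gradient.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for action, g in zip(actions, returns):
        for k in range(len(logits)):
            indicator = 1.0 if k == action else 0.0
            grad[k] += g * (indicator - probs[k])
    # Gradient ascent on expected return, averaged over the episode length.
    return [w + learning_rate * dw / len(actions) for w, dw in zip(logits, grad)]


def train_toy_bandit(episodes=500, seed=0):
    """Hypothetical stand-in task: action 1 pays reward 1, action 0 pays 0."""
    rng = random.Random(seed)
    logits = [0.0, 0.0]
    for _ in range(episodes):
        probs = softmax(logits)
        # Sample a 5-step episode from the current policy.
        actions = [0 if rng.random() < probs[0] else 1 for _ in range(5)]
        rewards = [float(a) for a in actions]
        logits = reinforce_step(logits, actions, discounted_returns(rewards))
    return softmax(logits)
```

Calling `train_toy_bandit()` returns the final action probabilities, which should end up favoring action 1. The sketch uses no baseline or variance reduction, so the gradient estimates are noisy.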

As for results, the approach eventually does OK on CartPole, but I haven't seen it do anything reasonable on QWOP after several hours of training.

mws262 / qwop-controls

Policy gradient #36