mws262 / qwop-controls


Policy gradient #36

Closed mws262 closed 5 years ago

mws262 commented 5 years ago

Implements the REINFORCE policy gradient method for QWOP and for the baseline cartpole environment from OpenAI Gym. The code still needs a little restructuring, but I want to get these resources in now while I work on implementing other methods. I don't yet have a clear picture of the final structure all of this will need.

As for results, the approach eventually does OK on cartpole, but I have not seen it do anything reasonable on QWOP after several hours of training.
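
For readers unfamiliar with REINFORCE, here is a minimal sketch of the core update in plain Python. It is not the code in this PR: the function names (`discounted_returns`, `softmax`, `reinforce_update`), the state-independent softmax policy, and the `gamma` and `lr` values are all assumptions chosen to keep the illustration small. It also omits a baseline, which is one common reason the plain method can be slow to learn on harder tasks.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def softmax(prefs):
    """Turn action preferences into probabilities (shifted for stability)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(theta, episode, gamma=0.99, lr=0.1):
    """Apply one REINFORCE gradient-ascent step.

    Hypothetical setup: `theta` holds one preference per action and the
    policy ignores the state. `episode` is a list of (action, reward) pairs.
    """
    rewards = [r for _, r in episode]
    returns = discounted_returns(rewards, gamma)
    probs = softmax(theta)  # the policy is held fixed within the episode
    grad = [0.0] * len(theta)
    for (action, _), g in zip(episode, returns):
        for a in range(len(theta)):
            # d/d theta_a of log pi(action) = 1[a == action] - pi(a)
            indicator = 1.0 if a == action else 0.0
            grad[a] += g * (indicator - probs[a])
    return [t + lr * d for t, d in zip(theta, grad)]


# Usage: a one-step episode in which action 1 earned reward 1.0.
# The update raises the preference for action 1 and lowers it for action 0.
new_theta = reinforce_update([0.0, 0.0], [(1, 1.0)])
```

In a real setup the preferences would be replaced by a network that maps states to action probabilities, and the same return-weighted log-probability gradient would be accumulated over each sampled episode.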