Minimal implementation of a stochastic policy gradient algorithm in Keras
This policy gradient (PG) agent appears to start winning more often after about 8000 episodes. The score graph is below.