nikoliazekter closed this issue 7 years ago
We are tracking this here: https://github.com/reinforceio/tensorforce/issues/64
TRPO has a numerical issue that makes it crash occasionally; PPO and VPG work well. I will change the quickstart to PPO for now.
Changed the quickstart to PPO.
Learning finished. Total episodes: 3000. Average reward of last 100 episodes: 17.76.
That doesn't look right.