Closed schrum2 closed 5 years ago
I made some changes to enjoy.py, and it seems like this version of PPO is actually about as good as the TensorFlow version. I defined the total return as simply the sum of all the rewards (no discounting) and ended up with a total return between 3000 and 9000 in the tests I ran. If I add discounting, I'll probably get the appropriate return of around 6000.
Importantly, the place where Sonic got stuck in Green Hill Zone Act 1 was the first big loop, which we know the TensorFlow version of PPO also has a problem with. Perhaps if I had trained longer, the PyTorch version could have gotten through that too.
I think I'll add discounting to the return before closing this.
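For reference, here is a minimal sketch of the difference between the undiscounted return (a plain sum of rewards) and a discounted one. The function name `total_return`, the `gamma` parameter, and the sample rewards are illustrative assumptions and are not taken from enjoy.py.

```python
def total_return(rewards, gamma=1.0):
    """Return sum(gamma**t * r_t) over one episode.

    gamma=1.0 gives the plain undiscounted sum of rewards.
    Hypothetical helper for illustration; not code from enjoy.py.
    """
    total = 0.0
    # Walk backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Made-up rewards for one short episode.
rewards = [1.0, 2.0, 3.0]
undiscounted = total_return(rewards)           # 1 + 2 + 3
discounted = total_return(rewards, gamma=0.5)  # 1 + 0.5*2 + 0.25*3
print(undiscounted, discounted)
```

The value of gamma has to match whatever the training code uses for the two numbers to be comparable.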
Closing this issue based on commit 88d7edddc0249ab20c5f6e9d771049a2c8e3efbe
PyTorch PPO is good!
I let the PyTorch version of PPO train overnight, and although the reported training rewards definitely went up over time, eventually achieving over 5,000 in Green Hill Zone Act 1, the performance when I observe the trained model doesn't seem quite that good.
Note that the trained model can be observed with the following command:
python enjoy.py --load-dir save/ppo --env-name "SonicTheHedgehog-Genesis"
(I just committed the model I trained overnight.) Note that you may need to change the "device" in the enjoy code from gpu to cpu. In any case, enjoy evaluates the agent multiple times without any random exploration, and therefore performance is quite consistent. Although Sonic does move to the right, he also often gets stuck or dies.
So what is the point of this issue?
TODO:
Add print statements to enjoy.py that report the accumulated reward in a form comparable to the reward values printed periodically during training, and see what they look like. If the enjoy values are much lower, as I suspect, then we need to try resuming training of an existing model (load it and train further), check what reward values the resumed model receives, and compare its behavior during training with its behavior when evaluated with enjoy.
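A rough sketch of the kind of per-episode reward accumulation this TODO describes is shown below. `ToyEnv`, `evaluate`, and the constant policy are hypothetical stand-ins so the example runs on its own; the real enjoy.py loop, environment, and policy call will look different.

```python
class ToyEnv:
    """Hypothetical stand-in for the game environment.

    Pays a reward of 1.0 per step and ends after `length` steps.
    """

    def __init__(self, length=5):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        return self.t, 1.0, done, {}


def evaluate(env, policy, episodes=3):
    """Run `episodes` episodes and print the accumulated reward of each."""
    returns = []
    for episode in range(episodes):
        obs = env.reset()
        done = False
        episode_return = 0.0
        while not done:
            obs, reward, done, _ = env.step(policy(obs))
            episode_return += reward  # undiscounted running sum
        print(f"episode {episode}: return {episode_return}")
        returns.append(episode_return)
    return returns


# Assumed policy: always choose action 0.
episode_returns = evaluate(ToyEnv(), policy=lambda obs: 0)
```

Printing one number per finished episode makes the output easy to compare against the reward statistics logged during training, provided both are computed the same way.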