nikhilbarhate99 / TD3-PyTorch-BipedalWalker-v2

Twin Delayed DDPG (TD3) PyTorch solution for Roboschool and Box2d environments
MIT License

Solved Bipedal Walker Environment #3

Closed johannmeyer closed 5 years ago

johannmeyer commented 5 years ago

I noticed that your code only checks whether the average of the last 10 episodes is above 300, but the leaderboard page requires the average over the last 100 episodes to be above 300. Did you test it with such a large averaging window? I ask because, judging from the figures in https://github.com/nikhilbarhate99/TD3-PyTorch-BipedalWalker-v2/issues/2, the reward signal does not look stable enough to sustain such a high average. I have been trying to solve the same environment with DDPG; the agent masters it but makes a few mistakes in between, which makes it hard to get the 100-episode average above 300.
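For reference, the leaderboard-style check can be sketched like this. This is an illustrative snippet, not code from the repo; the names (`record_episode`, `SOLVED_THRESHOLD`) are hypothetical, and the window size is the 100 episodes the leaderboard requires rather than the 10 used in the repo:

```python
from collections import deque

SOLVED_THRESHOLD = 300  # leaderboard solve threshold for BipedalWalker
WINDOW = 100            # leaderboard averages the last 100 episodes, not 10

recent_rewards = deque(maxlen=WINDOW)  # old rewards drop off automatically

def record_episode(episode_reward):
    """Append one episode's total reward; return True once solved."""
    recent_rewards.append(episode_reward)
    if len(recent_rewards) < WINDOW:
        return False  # not enough episodes yet for a fair 100-episode average
    return sum(recent_rewards) / WINDOW > SOLVED_THRESHOLD
```

With a window this wide, a handful of failed episodes can drag the mean below 300 even when the policy is mostly competent, which is exactly the difficulty described above.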

Also, did you ever experience the agent forgetting what it had learned after training the model for longer?

nikhilbarhate99 commented 5 years ago

I think I confused the number of episodes with the number of updates; I'm still not sure, since different algorithms use different numbers of episodes per update. No, I have not tested it with a large averaging window.

I suspect the instability is because all value-based methods tend to be unstable. TD3 should reduce this instability, but it also depends on the environment and the reward signal, which could be the problem in this env. The variance can also be reduced by decaying the exploration noise in later stages (not implemented in this repo), and deleting older experiences should help with the forgetting problem. That said, I would suggest testing your algorithm on a different environment (I did not face this problem on the LunarLander env).