Closed schrum2 closed 5 years ago
You've definitely done some work on this. List the algorithms you've tried, indicating which ones work and which ones don't
I have tried every single algorithm in this tidy repository, and only the following do not work:
```
Traceback (most recent call last):
  File "PPO_pendulum.py", line 139, in <module>
    lr_actor=0.0001, lr_value=0.0002, gamma=0.9, clip_range=0.2)
  File "PPO_pendulum.py", line 87, in __init__
    self.neglogp = actor.get_neglogp(self.OBS, self.ACT, reuse=True)
AttributeError: 'ActorNetwork' object has no attribute 'get_neglogp'
```
Upon correcting that line to actor.get_logp, which corresponds to a method that actually exists in the ActorNetwork class, I got it to render. However, it made no progress on learning at all; in fact, after about a hundred episodes, its only behavior was to swing the pendulum violently in one direction. I did not mark it as a working file, because I am unsure whether the error is in the code or in the algorithm.
The files that do work are in the folder titled "Agents." They are A2C_cartpole.py, DDPG_pendulum.py, DQN_cartpole.py, HER_coin.py (does not render), PG_cartpole.py, and SAC_pendulum.py.
Re-opening this because it sounds like actor.get_logp is the negation of actor.get_neglogp, and this sign flip might cause the lack of learning. You need to put a negative sign in front of the method call to make this the correct value.
Maybe I misunderstood, but check this possibility, and then if it still doesn't work (or, I guess, even if it starts working) you can close this issue.
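To make the sign relationship concrete, here is a minimal NumPy sketch of what I mean (assuming a diagonal Gaussian policy, which is typical for PPO on continuous control; this is not the repo's actual TensorFlow code): neglogp should just be -logp.

```python
import numpy as np

def gaussian_logp(act, mean, log_std):
    # Log-probability of `act` under N(mean, exp(log_std)^2),
    # summed over action dimensions.
    std = np.exp(log_std)
    return np.sum(
        -0.5 * ((act - mean) / std) ** 2 - log_std - 0.5 * np.log(2 * np.pi),
        axis=-1,
    )

act = np.array([[0.3]])
mean = np.array([[0.1]])
log_std = np.array([[-0.5]])

logp = gaussian_logp(act, mean, log_std)
neglogp = -logp  # the quantity PPO_pendulum.py stores as self.neglogp
```

So if the class only exposes get_logp, negating its return value should recover the intended neglogp.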
I changed line 87 to self.neglogp = -actor.get_logp(self.OBS, self.ACT, reuse=True)
and still received consistently poor results. I know the developer did something similar at line 90: actor_loss = -tf.reduce_mean((tf.minimum(ratio, clip_ratio)) * self.ADV)
, so the issue has got to be something more than that.
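For reference, my understanding of the clipped surrogate from the PPO paper is that both the raw and the clipped ratio are multiplied by the advantage *before* taking the minimum. A plain NumPy sketch of that objective (my own sketch, not the repo's TensorFlow code; names like ratio and clip_range mirror the snippet above):

```python
import numpy as np

def ppo_actor_loss(logp, old_logp, adv, clip_range=0.2):
    # Probability ratio pi_new(a|s) / pi_old(a|s), from log-probabilities.
    ratio = np.exp(logp - old_logp)
    # Clipped surrogate: scale both the raw and the clipped ratio by the
    # advantage, take the elementwise minimum, then negate so that
    # minimizing the loss maximizes the surrogate objective.
    clipped = np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range)
    return -np.mean(np.minimum(ratio * adv, clipped * adv))

logp = np.array([-0.4, -1.2])
old_logp = np.array([-0.5, -1.0])
adv = np.array([1.0, -0.5])
loss = ppo_actor_loss(logp, old_logp, adv)
```

If line 90 really takes tf.minimum(ratio, clip_ratio) first and only then multiplies by self.ADV, that agrees with this form only when the advantage is positive; for negative advantages the two orderings differ, which might be worth checking as a cause of the poor learning.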
The DQN agent worked for CartPole, so why not try other agents from: https://github.com/sarcturus00/Tidy-Reinforcement-learning