Closed schrum2 closed 5 years ago
You've definitely done some work on this. List the algorithms you've tried, indicating which ones work and which ones don't
I have tried every single algorithm in this tidy repository, and only the following do not work:
```
Traceback (most recent call last):
  File "PPO_pendulum.py", line 139, in <module>
    lr_actor=0.0001, lr_value=0.0002, gamma=0.9, clip_range=0.2)
  File "PPO_pendulum.py", line 87, in __init__
    self.neglogp = actor.get_neglogp(self.OBS, self.ACT, reuse=True)
AttributeError: 'ActorNetwork' object has no attribute 'get_neglogp'
```
Upon correcting that line to actor.get_logp, which corresponds to a method that actually exists in the ActorNetwork class, I got it to render. However, it made no progress on learning at all; in fact, after about a hundred episodes, its only behavior was to swing the pendulum violently in one direction. I did not mark it as a working file, because I am unsure whether the error is in the code or in the algorithm.
The files that do work are in the folder titled "Agents." They are A2C_cartpole.py, DDPG_pendulum.py, DQN_cartpole.py, HER_coin.py (does not render), PG_cartpole.py, and SAC_pendulum.py.
Re-opening this because it sounds like actor.get_logp is the negation of actor.get_neglogp, and this sign flip might cause the lack of learning. You need to put a negative sign in front of the method call to make this the correct value.
Maybe I misunderstood, but check this possibility, and then if it still doesn't work (or, I guess, even if it starts working) you can close this issue.
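To make the sign relationship concrete, here is a minimal NumPy sketch of what I mean (assuming a diagonal Gaussian policy, which is typical for PPO on continuous control; this is not the repo's actual TensorFlow code): neglogp should just be -logp.

```python
import numpy as np

def gaussian_logp(act, mean, log_std):
    # Log-probability of `act` under N(mean, exp(log_std)^2),
    # summed over action dimensions.
    std = np.exp(log_std)
    return np.sum(
        -0.5 * ((act - mean) / std) ** 2 - log_std - 0.5 * np.log(2 * np.pi),
        axis=-1,
    )

act = np.array([[0.3]])
mean = np.array([[0.1]])
log_std = np.array([[-0.5]])

logp = gaussian_logp(act, mean, log_std)
neglogp = -logp  # the quantity PPO_pendulum.py stores as self.neglogp
```

So if the class only exposes get_logp, negating its return value should recover the intended neglogp.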
I changed line 87 to self.neglogp = -actor.get_logp(self.OBS, self.ACT, reuse=True)
and still received consistently poor results. I know the developer did something similar at line 90: actor_loss = -tf.reduce_mean((tf.minimum(ratio, clip_ratio)) * self.ADV)
, so the issue has got to be something more than that.
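For reference, my understanding of the clipped surrogate from the PPO paper is that both the raw and the clipped ratio are multiplied by the advantage *before* taking the minimum. A plain NumPy sketch of that objective (my own sketch, not the repo's TensorFlow code; names like ratio and clip_range mirror the snippet above):

```python
import numpy as np

def ppo_actor_loss(logp, old_logp, adv, clip_range=0.2):
    # Probability ratio pi_new(a|s) / pi_old(a|s), from log-probabilities.
    ratio = np.exp(logp - old_logp)
    # Clipped surrogate: scale both the raw and the clipped ratio by the
    # advantage, take the elementwise minimum, then negate so that
    # minimizing the loss maximizes the surrogate objective.
    clipped = np.clip(ratio, 1.0 - clip_range, 1.0 + clip_range)
    return -np.mean(np.minimum(ratio * adv, clipped * adv))

logp = np.array([-0.4, -1.2])
old_logp = np.array([-0.5, -1.0])
adv = np.array([1.0, -0.5])
loss = ppo_actor_loss(logp, old_logp, adv)
```

If line 90 really takes tf.minimum(ratio, clip_ratio) first and only then multiplies by self.ADV, that agrees with this form only when the advantage is positive; for negative advantages the two orderings differ, which might be worth checking as a cause of the poor learning.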
The DQN agent worked for CartPole, so why not try other agents from: https://github.com/sarcturus00/Tidy-Reinforcement-learning