nazaruka / gym-http-api

NSGA2-based Sonic agent + experimental code
MIT License

Try other code from Tidy RL #5

Closed schrum2 closed 5 years ago

schrum2 commented 5 years ago

The DQN agent worked for CartPole, so why not try other agents from: https://github.com/sarcturus00/Tidy-Reinforcement-learning

schrum2 commented 5 years ago

You've definitely done some work on this. List the algorithms you've tried, indicating which ones work and which ones don't.

nazaruka commented 5 years ago

I have tried every single algorithm in this tidy repository, and only the following do not work:

The files that do work are in the folder titled "Agents." They are A2C_cartpole.py, DDPG_pendulum.py, DQN_cartpole.py, HER_coin.py (does not render), PG_cartpole.py, and SAC_pendulum.py.

schrum2 commented 5 years ago

Re-opening this because it sounds like actor.get_logp is the negation of actor.get_neglogp, and this might be causing the lack of learning. You need to put a negative sign in front of the method call to make this the correct value.

Maybe I misunderstood, but check this issue, and then if it still doesn't work (or I guess even if it starts working) you can close this issue.
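For anyone following along, the sign relationship being discussed is just neglogp(x) = -logp(x). Here is a minimal sketch using a 1-D Gaussian policy density; the function names are illustrative, not the actual methods in the Tidy repo:

```python
import math

def gaussian_logp(x, mu, sigma):
    # Log-density of a 1-D Gaussian: log p(x | mu, sigma)
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def gaussian_neglogp(x, mu, sigma):
    # Negative log-probability is simply the negation of logp,
    # so converting one to the other is a single sign flip.
    return -gaussian_logp(x, mu, sigma)
```

If the loss was written in terms of neglogp but the actor only exposes get_logp, the fix is exactly the sign flip above; using logp where neglogp is expected silently reverses the gradient direction.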

nazaruka commented 5 years ago

I changed line 87 to self.neglogp = -actor.get_logp(self.OBS, self.ACT, reuse=True) and still received consistently poor results. I know the developer did something similar at line 90: actor_loss = -tf.reduce_mean((tf.minimum(ratio, clip_ratio)) * self.ADV), so the issue has got to be much more than that.
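One thing worth double-checking here: in the canonical PPO clipped-surrogate objective, each term is multiplied by the advantage before the minimum is taken, whereas the line quoted above applies tf.minimum to the ratios first and multiplies by the advantage afterward; the two differ whenever the advantage is negative. A minimal NumPy sketch of the standard form, with names and shapes as assumptions rather than the repo's actual API:

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, adv, clip_eps=0.2):
    # Probability ratio pi_new / pi_old, computed from log-probabilities
    ratio = np.exp(logp_new - logp_old)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Standard PPO objective: pessimistic minimum of the two surrogates,
    # each already scaled by the advantage; negated so it can be minimized.
    return -np.mean(np.minimum(ratio * adv, clipped * adv))
```

With identical old and new log-probabilities the ratio is 1 and the loss reduces to -mean(adv), which is a quick sanity check when debugging a non-learning agent.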