dhfromkorea opened this issue 7 years ago
@dhfromkorea I ran the agent several times and it usually would get close to that score. If it's not getting above 100 then there's something quite seriously wrong. May I ask precisely what command you are using to run the agent and what commit you are on?
@steveKapturowski I am running on HEAD of master branch w/ python2.7 tensorflow(cpu) 1.2.1
The command is: python2 main.py MontezumaRevenge-v0 --load_config config/dqn-cts.yaml -n 32
Can you try adding the following options: --q_target_update_steps=30000 --max_global_steps=160000000 --epsilon_annealing_steps=500000 --replay_size=500000 --clip_norm_type=ignore
The first 4 I'm suggesting mainly for consistency with my experiments; I suspect the norm clipping may be what's really killing performance
Hello Steve, I'm SangJin; I'm working with dhfromkorea. The result still doesn't look reproducible.
cmd:

```
python2 main.py MontezumaRevenge-v0 --load_config config/dqn-cts.yaml -n 12 \
  --q_target_update_steps=30000 \
  --max_global_steps=160000000 \
  --epsilon_annealing_steps=500000 \
  --replay_size=500000 \
  --clip_norm_type=ignore \
  --restore_checkpoint
```
git: master / bcc9b2a tensorflow-gpu==1.2.1
```
[2017-07-29 12:54:36,278] T2 / STEP 70243145 / REWARD 0.0 / Q_MAX 1.5947 / EPS 0.1000
[2017-07-29 12:54:36] INFO [MainThread:284] ID: 2 -- RUNNING AVG: 9 ± 90 -- BEST: 400
[2017-07-29 12:54:44] INFO [MainThread:279] T4 / STEP 70246725 / REWARD 0.0 / Q_MAX 1.8437 / EPS 0.0100
[2017-07-29 12:54:44] INFO [MainThread:284] ID: 4 -- RUNNING AVG: 14 ± 98 -- BEST: 400
[2017-07-29 12:55:03] INFO [MainThread:279] T3 / STEP 70256320 / REWARD 0.0 / Q_MAX 1.5227 / EPS 0.2000
[2017-07-29 12:55:03] INFO [MainThread:284] ID: 3 -- RUNNING AVG: 28 ± 179 -- BEST: 400
```
If you are interested, we could give you access to the server running the agent, maybe we could find out what's wrong together.
Best Regards,
Hi @sangjin-park, I'd be happy to try to debug what's going on in the server but first could you try running on the commit 39e695696488df83bf6d08a1eb7df0ff4ebd109c and tell me if there's any difference?
Hi, I tried 452d57 and it looks OK.
Thanks!
I'm going to check the diff between commit 452d57 and master to see what went wrong and get a fix out asap
@sangjin-park I was checking out commit 452d5735551c672e2ce44740514b105cb045075e and noticed something funny: the ordering of the context window is backwards, which I would expect to hurt performance: https://github.com/steveKapturowski/tensorflow-rl/blob/452d5735551c672e2ce44740514b105cb045075e/utils/fast_cts.pyx#L305-L308 Compare it with the ordering in commit 39e695696488df83bf6d08a1eb7df0ff4ebd109c: https://github.com/steveKapturowski/tensorflow-rl/blob/39e695696488df83bf6d08a1eb7df0ff4ebd109c/utils/fast_cts.pyx#L305-L308
Did you produce your OpenAI gym evaluation from the former commit?
My branch's window order is the former one.
```python
context[0] = obs[i, j-1] if j > 0 else 0
context[1] = obs[i-1, j] if i > 0 else 0
context[2] = obs[i-1, j-1] if i > 0 and j > 0 else 0
context[3] = obs[i-1, j+1] if i > 0 and j < self.width-1 else 0
```
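To make the ordering concrete, here is the snippet above restated as a small plain-Python helper (`pixel_context` is a hypothetical name for illustration, not a function from the repo). The four entries are the left, up, up-left, and up-right neighbours of pixel `(i, j)`, with out-of-frame positions defaulting to 0:

```python
import numpy as np

def pixel_context(obs, i, j, width):
    """Causal context for pixel (i, j) in the CTS density model:
    [left, up, up-left, up-right], with 0 for out-of-frame neighbours."""
    context = [0, 0, 0, 0]
    context[0] = obs[i, j-1] if j > 0 else 0
    context[1] = obs[i-1, j] if i > 0 else 0
    context[2] = obs[i-1, j-1] if i > 0 and j > 0 else 0
    context[3] = obs[i-1, j+1] if i > 0 and j < width - 1 else 0
    return context

# Example on a 3x3 frame: the top-left pixel has an all-zero context,
# while an interior pixel sees its four causal neighbours.
obs = np.arange(9).reshape(3, 3)
print(pixel_context(obs, 0, 0, 3))  # [0, 0, 0, 0]
print(pixel_context(obs, 1, 1, 3))  # [3, 1, 0, 2]
```

Since the CTS model conditions each pixel's distribution on this context, reversing the order of the entries changes which neighbour each model slot is conditioned on, which is why a flipped ordering between commits could plausibly affect exploration performance.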
Hi Steve, I am trying to reproduce the ~3600 score you achieved on Montezuma's Revenge with your dqn-cts model (as shown in the gif in the README).
After 30M steps, the model does not seem to be learning. It very occasionally gets the key (+100 points), and that's all. I ran your code as-is and did not modify a single line.
1) Could I ask if you can reproduce 3600 "on average" with your dqn-cts?
2) Also, would you say I should try hyperparameter settings other than the ones you set as defaults?
I look forward to your advice.
Best wishes,