jincongho closed this issue 5 years ago.
I've just tested running IMPALA on a single computer with Ray 0.6.4 on BreakoutNoFrameskip-v4, and it doesn't converge there either. So the problem does not seem to be with cluster communication between multiple servers on Ray 0.6.4.
Not sure if this affects other algorithms.
p.s.: ran for 500,000 timesteps
It seems to be due to https://github.com/ray-project/ray/pull/3967 (cc @bjg2 )
I guess passing vtrace unit tests and the small-scale tests isn't enough to ensure correctness. I'll take a look at this to see if an easy fix can be identified.
Hm, I can confirm the bug is isolated to the changes in vtrace.py. Reverting that file and patching up multi_from_logits to call from_logits fixes the regression. This is quite odd, since vtrace.py still passes its original unit tests and I don't see anything particularly suspicious.
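For the record, the interim patch was roughly along these lines (a sketch only, assuming a single Discrete action component and the pre-#3967 from_logits signature in ray/rllib/agents/impala/vtrace.py; not the code that was actually committed):

```python
# Sketch of the interim patch (not the merged code): with vtrace.py reverted,
# route the single-action case of multi_from_logits back through the original
# from_logits implementation defined in the same module.

def multi_from_logits(behaviour_policy_logits, target_policy_logits, actions,
                      discounts, rewards, values, bootstrap_value, **kwargs):
    # Only a single (Discrete) action component is expected here.
    assert len(behaviour_policy_logits) == 1
    return from_logits(
        behaviour_policy_logits=behaviour_policy_logits[0],
        target_policy_logits=target_policy_logits[0],
        actions=actions[0],
        discounts=discounts,
        rewards=rewards,
        values=values,
        bootstrap_value=bootstrap_value,
        **kwargs)
```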
I don't have time to dig into this deeper at the moment, so the timely thing to do is probably revert the change for now, and we can reapply it once the real fix is found.
Hey guys, sorry about the issue, and you did the right thing by reverting the commit. We did train PongDeterministic-v4 with our changes, though I'm not sure whether we changed the convergence rate on that one. @stefanpantic and I will take a look at why IMPALA behaves differently with our changes on BreakoutNoFrameskip-v4.
The attached fix solves the issue: the function's arguments were permuted. It learns much faster now, at the same speed as before the MultiDiscrete changes.
The interesting thing is that training was still possible, just slower and less deterministic. We trained Pong a few times in RLlib, and we actually carried the issue over from our previous repo, where it had been in the code for half a year. :'( And we still managed to train something with the bug. Fascinating.
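As a general illustration of why such a bug can go unnoticed (toy code, not the actual vtrace change): when the swapped arguments have the same shape and type, the permuted call raises no error and only shows up as degraded learning, whereas keyword arguments make the call order-independent.

```python
# Toy example (not RLlib code): permuting same-shaped positional arguments
# raises no error, it just silently changes the result.

def log_rho(target_log_prob, behaviour_log_prob):
    # Importance-sampling log ratio: log pi(a|s) - log mu(a|s).
    return target_log_prob - behaviour_log_prob

target_lp, behaviour_lp = -0.5, -2.5

print(log_rho(target_lp, behaviour_lp))    #  2.0  (intended)
print(log_rho(behaviour_lp, target_lp))    # -2.0  (silently wrong, still "trains")

# Passing by keyword is robust to this kind of permutation:
print(log_rho(target_log_prob=target_lp, behaviour_log_prob=behaviour_lp))
```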
I was able to get to 500 reward within that time on current master. Will test this branch soon.
On Tue, Mar 12, 2019, 3:25 PM Jin Cong Ho (Martin) notifications@github.com wrote:
Not sure if this is helpful, but this is BreakoutNoFrameskip-v4 using IMPALA with vtrace for 50M timesteps (200M frames) on Ray 0.6.3. The performance seems stuck around 80.
[image: breakout reward curve]
Merged. Thanks for the quick fix!
This is my graph for Breakout on Ray 0.7.0.dev1. What could be the reason it's not converging? Which yaml or script did you run?
What are the hyperparameters you're using? I was using the atari-impala yaml in the tuned examples folder.
On Fri, Mar 15, 2019, 10:35 AM Jin Cong Ho (Martin) <notifications@github.com> wrote:
This is my graph on Ray 0.7.0.dev1. What could be the reason it's not converging?
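For reference, the atari-impala yaml in the tuned examples folder can be launched from Python roughly like this (a sketch only, assuming the tune.run API; the values below are illustrative and the authoritative hyperparameters are in the yaml itself):

```python
# Rough sketch of launching IMPALA on Breakout via Tune (values illustrative;
# the real hyperparameters live in the atari-impala tuned example yaml).
import ray
from ray import tune

ray.init()
tune.run(
    "IMPALA",
    stop={"timesteps_total": 50_000_000},
    config={
        "env": "BreakoutNoFrameskip-v4",
        "clip_rewards": True,
        "num_workers": 32,          # scale to the cluster at hand
        "num_envs_per_worker": 5,
        "sample_batch_size": 50,    # rollout fragment length in 0.6.x/0.7.x
        "train_batch_size": 500,
        "lr_schedule": [[0, 0.0005], [20_000_000, 1e-12]],
    },
)
```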
Sorry, false alarm. It was my problem: I had created a class wrapper around the gym.make object, so atari_wrapper could not properly preprocess the frames.
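For anyone hitting the same thing: a generic way to add custom behaviour without hiding the underlying Atari env from downstream frame preprocessing is to subclass gym.Wrapper rather than wrapping the gym.make object in an arbitrary class. A sketch with an illustrative wrapper name (not the code from this issue):

```python
# Sketch: gym.Wrapper keeps observation_space, spec and .unwrapped intact,
# so downstream preprocessing still sees a normal Atari env.
import gym


class EpisodeCounter(gym.Wrapper):
    """Counts episodes while passing everything else through unchanged."""

    def __init__(self, env):
        super().__init__(env)
        self.episodes = 0

    def reset(self, **kwargs):
        self.episodes += 1
        return self.env.reset(**kwargs)


env = EpisodeCounter(gym.make("BreakoutNoFrameskip-v4"))
```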
System information
Describe the problem
I have a cluster consisting of a p2 head and a c4 worker on AWS. IMPALA can't converge when I train on the cluster with Ray==0.6.4. I originally ran on a smaller computer with Ray==0.6.3, so I tried the old Ray version on the cluster, and IMPALA converged again. This was tested on BreakoutNoFrameskip-v4, PongNoFrameskip-v4, and AtlantisNoFrameskip-v4.
Source code / logs