jincongho closed this issue 5 years ago.
I've just tested running IMPALA on a single computer with Ray 0.6.4 on BreakoutNoFrameskip-v4, and it doesn't converge there either. So the problem does not seem to be with cluster communication between multiple servers on Ray 0.6.4.
Not sure if this affects other algorithms.
p.s.: ran for 500,000 timesteps
It seems to be due to https://github.com/ray-project/ray/pull/3967 (cc @bjg2 )
I guess passing vtrace unit tests and the small-scale tests isn't enough to ensure correctness. I'll take a look at this to see if an easy fix can be identified.
Hm, I can confirm the bug is isolated to the changes in vtrace.py. Reverting that file and patching up multi_from_logits to call from_logits fixes the regression. This is quite odd, since vtrace.py still passes its original unit tests and I don't see anything particularly suspicious.
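For the record, the interim patch was roughly along these lines (a sketch only, assuming a single Discrete action component and the pre-#3967 from_logits signature in ray/rllib/agents/impala/vtrace.py; not the code that was actually committed):

```python
# Sketch of the interim patch (not the merged code): with vtrace.py reverted,
# route the single-action case of multi_from_logits back through the original
# from_logits implementation defined in the same module.

def multi_from_logits(behaviour_policy_logits, target_policy_logits, actions,
                      discounts, rewards, values, bootstrap_value, **kwargs):
    # Only a single (Discrete) action component is expected here.
    assert len(behaviour_policy_logits) == 1
    return from_logits(
        behaviour_policy_logits=behaviour_policy_logits[0],
        target_policy_logits=target_policy_logits[0],
        actions=actions[0],
        discounts=discounts,
        rewards=rewards,
        values=values,
        bootstrap_value=bootstrap_value,
        **kwargs)
```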
I don't have time to dig into this deeper at the moment, so the timely thing to do is probably revert the change for now, and we can reapply it once the real fix is found.
Hey guys, sorry about the issue, and you did the right thing by reverting the commit. We did train PongDeterministic-v4 with our changes, though I'm not sure whether we changed the convergence rate on that one. @stefanpantic and I will take a look at why IMPALA behaves differently with our changes on BreakoutNoFrameskip-v4.
The attached fix solves the issue: the function's arguments were permuted. It learns much faster now, at the same speed as before the MultiDiscrete changes.
The interesting thing is that training was still possible, just slower and less deterministic. We trained Pong a few times in RLlib, and we actually carried the issue over from our previous repo, where it had been in the code for half a year. :'( And we still managed to train something with the bug. Fascinating.
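As a general illustration of why such a bug can go unnoticed (toy code, not the actual vtrace change): when the swapped arguments have the same shape and type, the permuted call raises no error and only shows up as degraded learning, whereas keyword arguments make the call order-independent.

```python
# Toy example (not RLlib code): permuting same-shaped positional arguments
# raises no error, it just silently changes the result.

def log_rho(target_log_prob, behaviour_log_prob):
    # Importance-sampling log ratio: log pi(a|s) - log mu(a|s).
    return target_log_prob - behaviour_log_prob

target_lp, behaviour_lp = -0.5, -2.5

print(log_rho(target_lp, behaviour_lp))    #  2.0  (intended)
print(log_rho(behaviour_lp, target_lp))    # -2.0  (silently wrong, still "trains")

# Passing by keyword is robust to this kind of permutation:
print(log_rho(target_log_prob=target_lp, behaviour_log_prob=behaviour_lp))
```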
I was able to get to 500 reward within that time on current master. Will test this branch soon.
On Tue, Mar 12, 2019, 3:25 PM Jin Cong Ho (Martin) notifications@github.com wrote:
Not sure if this is helpful, but this is BreakoutNoFrameskip-v4 using IMPALA with vtrace for 50M timesteps (200M frames) on Ray 0.6.3. The performance seems stuck around 80.
[image: breakout reward curve]
Merged. Thanks for the quick fix!
This is my graph for Breakout on Ray 0.7.0.dev1. What could be the reason it's not converging? Which yaml or script did you run?
What are the hyperparameters you're using? I was using the atari-impala yaml in the tuned examples folder.
On Fri, Mar 15, 2019, 10:35 AM Jin Cong Ho (Martin) <notifications@github.com> wrote:
This is my graph on Ray 0.7.0.dev1. What could be the reason it's not converging?
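For reference, the atari-impala yaml in the tuned examples folder can be launched from Python roughly like this (a sketch only, assuming the tune.run API; the values below are illustrative and the authoritative hyperparameters are in the yaml itself):

```python
# Rough sketch of launching IMPALA on Breakout via Tune (values illustrative;
# the real hyperparameters live in the atari-impala tuned example yaml).
import ray
from ray import tune

ray.init()
tune.run(
    "IMPALA",
    stop={"timesteps_total": 50_000_000},
    config={
        "env": "BreakoutNoFrameskip-v4",
        "clip_rewards": True,
        "num_workers": 32,          # scale to the cluster at hand
        "num_envs_per_worker": 5,
        "sample_batch_size": 50,    # rollout fragment length in 0.6.x/0.7.x
        "train_batch_size": 500,
        "lr_schedule": [[0, 0.0005], [20_000_000, 1e-12]],
    },
)
```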
Sorry, false alarm. It was my problem: I had created a class wrapper around the gym.make object, so atari_wrapper could not properly preprocess the frames.
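For anyone hitting the same thing: a generic way to add custom behaviour without hiding the underlying Atari env from downstream frame preprocessing is to subclass gym.Wrapper rather than wrapping the gym.make object in an arbitrary class. A sketch with an illustrative wrapper name (not the code from this issue):

```python
# Sketch: gym.Wrapper keeps observation_space, spec and .unwrapped intact,
# so downstream preprocessing still sees a normal Atari env.
import gym


class EpisodeCounter(gym.Wrapper):
    """Counts episodes while passing everything else through unchanged."""

    def __init__(self, env):
        super().__init__(env)
        self.episodes = 0

    def reset(self, **kwargs):
        self.episodes += 1
        return self.env.reset(**kwargs)


env = EpisodeCounter(gym.make("BreakoutNoFrameskip-v4"))
```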
System information
Describe the problem
I have a cluster consisting of a p2 head and a c4 worker on AWS. IMPALA can't converge when I train on the cluster with Ray==0.6.4. I originally ran on a smaller computer with Ray==0.6.3, so I tried the old Ray version on the cluster, and IMPALA converged again. This was tested on BreakoutNoFrameskip-v4, PongNoFrameskip-v4, and AtlantisNoFrameskip-v4.
Source code / logs