ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[rllib] Need More Tuned Examples for Continuous Action Tasks with Multi-GPU Settings. #5757

Closed haje01 closed 1 year ago

haje01 commented 5 years ago


Hi. I am trying to find a distributed (and hopefully faster) training setup for continuous action tasks like Humanoid or HalfCheetah. Since I couldn't find one in the tuned examples, I tried it myself. My first idea was to change the existing tuned APPO example (halfcheetah-appo.yaml) into a multi-GPU setting:

humanoid-appo:
    env: RoboschoolHumanoid-v1  # <-- Changed from HalfCheetah-v2
    run: APPO
    stop:
        time_total_s: 10800   
    config:
        vtrace: True
        gamma: 0.99
        lambda: 0.95
        sample_batch_size: 512
        train_batch_size: 4096
        num_workers: 64  # <-- Changed from 16
        num_gpus: 4  # <-- Changed from 1
        broadcast_interval: 1
        max_sample_requests_in_flight_per_worker: 1
        num_data_loader_buffers: 16  # <-- Changed from 1
        num_envs_per_worker: 32
        minibatch_buffer_size: 16
        num_sgd_iter: 32
        clip_param: 0.2
        lr_schedule: [
            [0, 0.0005],
            [150000000, 0.000001],
        ]
        vf_loss_coeff: 0.5
        entropy_coeff: 0.01
        grad_clip: 0.5
        batch_mode: truncate_episodes
        use_kl_loss: True
        kl_coeff: 1.0
        kl_target: 0.04       
        observation_filter: MeanStdFilter
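
(For reference, I launch the file above roughly like this, which should be equivalent to running it with the RLlib CLI as "rllib train -f humanoid-appo.yaml"; the file name is just whatever I saved the config as.)

# Rough launch sketch for the YAML above; the file name is hypothetical.
import yaml

import ray
from ray import tune

with open("humanoid-appo.yaml") as f:
    experiments = yaml.safe_load(f)

exp = experiments["humanoid-appo"]
# The YAML keeps `env` at the experiment level, so fold it into the config dict.
config = dict(exp["config"], env=exp["env"])

ray.init()
tune.run(exp["run"], config=config, stop=exp["stop"])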

But the training raised the following error:

2019-09-23 10:08:43,772 ERROR trial_runner.py:552 -- Error processing event.
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 498, i
n _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 
347, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/worker.py", line 2332, in get
    raise value
ray.exceptions.RayTaskError: ray_APPO:train() (pid=61284, host=ip-172-31-31-205)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/trainer_template.py",
 line 90, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 363, in __init__
    Trainable.__init__(self, config, logger_creator)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/tune/trainable.py", line 99, in __init__
    self._setup(copy.deepcopy(self.config))
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 483, in _setup
    self._init(self.config, self.env_creator)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/trainer_template.py", line 111, in _init
    self.optimizer = make_policy_optimizer(self.workers, config)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/impala/impala.py", line 134, in make_aggregators_and_optimizer
    **config["optimizer"])
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/optimizers/async_samples_optimizer.py", line 74, in __init__
    _fake_gpus=_fake_gpus)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/optimizers/aso_multi_gpu_learner.py", line 109, in __init__
    self.policy.copy))
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/optimizers/multi_gpu_impl.py", line 70, in __init__
    self._shared_loss = build_graph(self.loss_inputs)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/policy/dynamic_tf_policy.py", line 233, in copy
    loss = instance._do_loss_init(input_dict)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/policy/dynamic_tf_policy.py", line 349, in _do_loss_init
    loss = self._loss_fn(self, self.model, self._dist_class, train_batch)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/ppo/appo_policy.py", line 252, in build_appo_surrogate_loss
    target_model_out, _ = policy.target_model.from_batch(train_batch)
AttributeError: 'AsyncPPOTFPolicy' object has no attribute 'target_model'

What's the problem? By the way, I wish we had more tuned examples for distributed continuous action tasks.

Thank you.
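
Follow-up after staring at the traceback a bit more: the failure is in build_appo_surrogate_loss, which is called while the multi-GPU optimizer re-builds the loss on a copy of the policy (dynamic_tf_policy.py's copy()), and that copy apparently never gets the target_model attribute that the original policy has. This is only my guess; here is a toy, non-RLlib sketch of the pattern I suspect (all names made up for illustration):

# Toy illustration only, not RLlib code: an attribute attached to the original
# policy after init is missing on a copy that only re-runs the loss build.
class ToyPolicy:
    def __init__(self, loss_fn):
        self._loss_fn = loss_fn

    def copy(self):
        # Re-runs the loss build on a fresh instance, but does NOT re-run the
        # after-init step below, so `target_model` is missing on the clone.
        clone = ToyPolicy(self._loss_fn)
        return self._loss_fn(clone)

def after_init(policy):
    policy.target_model = "target network"  # only ever attached to the original

def loss_fn(policy):
    return policy.target_model  # AttributeError when called on the copy

original = ToyPolicy(loss_fn)
after_init(original)
loss_fn(original)   # fine on the original (this is the single-GPU code path)
original.copy()     # AttributeError: 'ToyPolicy' object has no attribute 'target_model'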

ericl commented 5 years ago

cc @michaelzhiluo can you open source your examples?

ArturNiederfahrenhorst commented 4 years ago

Experiencing a similar problem with an updated AsyncPPOTFPolicy. My policy behaves as expected with 'num_gpus: 1' but throws the same error at me with 'num_gpus: 2'.
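
In case it helps to be concrete, the only thing I toggle is the GPU count; everything else in the config stays identical (minimal sketch, key name as in the standard APPO config):

config["num_gpus"] = 1    # trains as expected
# config["num_gpus"] = 2  # -> AttributeError: 'AsyncPPOTFPolicy' object has no attribute 'target_model'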

adriendoerig commented 4 years ago

I have the same problem with AsyncPPOTFPolicy. My policy behaves as expected with 'num_gpus: 1' but throws the same error at me with 'num_gpus: 2'.

Is there any known way to fix this?

ArturNiederfahrenhorst commented 4 years ago

> I have the same problem with AsyncPPOTFPolicy. My policy behaves as expected with 'num_gpus: 1' but throws the same error at me with 'num_gpus: 2'.
>
> Is there any known way to fix this?

I have not worked on that issue, sorry. Does the execution plan automatically create a multi-GPU Learner thread if you hand it more than one GPU resource?

stale[bot] commented 3 years ago

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

You can always ask for help on our discussion forum or Ray's public Slack channel.