ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[rllib] Need More Tuned Examples for Continuous Action Tasks with Multi-GPU Settings. #5757

Closed haje01 closed 1 year ago

haje01 commented 5 years ago


Hi. I am trying to find a distributed (and hopefully faster) training setup for continuous action tasks like Humanoid or HalfCheetah. Since I couldn't find one in the tuned examples, I tried it myself. My first idea was to change the existing tuned APPO example (halfcheetah-appo.yaml) into a multi-GPU setting:

humanoid-appo:
    env: RoboschoolHumanoid-v1  # <-- Changed from HalfCheetah-v2
    run: APPO
    stop:
        time_total_s: 10800   
    config:
        vtrace: True
        gamma: 0.99
        lambda: 0.95
        sample_batch_size: 512
        train_batch_size: 4096
        num_workers: 64  # <-- Changed from 16
        num_gpus: 4  # <-- Changed from 1
        broadcast_interval: 1
        max_sample_requests_in_flight_per_worker: 1
        num_data_loader_buffers: 16  # <-- Changed from 1
        num_envs_per_worker: 32
        minibatch_buffer_size: 16
        num_sgd_iter: 32
        clip_param: 0.2
        lr_schedule: [
            [0, 0.0005],
            [150000000, 0.000001],
        ]
        vf_loss_coeff: 0.5
        entropy_coeff: 0.01
        grad_clip: 0.5
        batch_mode: truncate_episodes
        use_kl_loss: True
        kl_coeff: 1.0
        kl_target: 0.04       
        observation_filter: MeanStdFilter
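
(For reference, I launch the file above roughly like this, which should be equivalent to running it with the RLlib CLI as "rllib train -f humanoid-appo.yaml"; the file name is just whatever I saved the config as.)

# Rough launch sketch for the YAML above; the file name is hypothetical.
import yaml

import ray
from ray import tune

with open("humanoid-appo.yaml") as f:
    experiments = yaml.safe_load(f)

exp = experiments["humanoid-appo"]
# The YAML keeps `env` at the experiment level, so fold it into the config dict.
config = dict(exp["config"], env=exp["env"])

ray.init()
tune.run(exp["run"], config=config, stop=exp["stop"])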

But the training raised the following error:

2019-09-23 10:08:43,772 ERROR trial_runner.py:552 -- Error processing event.
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 498, i
n _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 
347, in fetch_result
    result = ray.get(trial_future[0])
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/worker.py", line 2332, in get
    raise value
ray.exceptions.RayTaskError: ray_APPO:train() (pid=61284, host=ip-172-31-31-205)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/trainer_template.py",
 line 90, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 363, in __init__
    Trainable.__init__(self, config, logger_creator)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/tune/trainable.py", line 99, in __init__
    self._setup(copy.deepcopy(self.config))
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 483, in _setup
    self._init(self.config, self.env_creator)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/trainer_template.py", line 111, in _init
    self.optimizer = make_policy_optimizer(self.workers, config)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/impala/impala.py", line 134, in make_aggregators_and_optimizer
    **config["optimizer"])
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/optimizers/async_samples_optimizer.py", line 74, in __init__
    _fake_gpus=_fake_gpus)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/optimizers/aso_multi_gpu_learner.py", line 109, in __init__
    self.policy.copy))
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/optimizers/multi_gpu_impl.py", line 70, in __init__
    self._shared_loss = build_graph(self.loss_inputs)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/policy/dynamic_tf_policy.py", line 233, in copy
    loss = instance._do_loss_init(input_dict)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/policy/dynamic_tf_policy.py", line 349, in _do_loss_init
    loss = self._loss_fn(self, self.model, self._dist_class, train_batch)
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/ppo/appo_policy.py", line 252, in build_appo_surrogate_loss
    target_model_out, _ = policy.target_model.from_batch(train_batch)
AttributeError: 'AsyncPPOTFPolicy' object has no attribute 'target_model'

What's the problem? By the way, I wish we had more tuned examples for distributed continuous action tasks.

Thank you.
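
Follow-up after staring at the traceback a bit more: the failure is in build_appo_surrogate_loss, which is called while the multi-GPU optimizer re-builds the loss on a copy of the policy (dynamic_tf_policy.py's copy()), and that copy apparently never gets the target_model attribute that the original policy has. This is only my guess; here is a toy, non-RLlib sketch of the pattern I suspect (all names made up for illustration):

# Toy illustration only, not RLlib code: an attribute attached to the original
# policy after init is missing on a copy that only re-runs the loss build.
class ToyPolicy:
    def __init__(self, loss_fn):
        self._loss_fn = loss_fn

    def copy(self):
        # Re-runs the loss build on a fresh instance, but does NOT re-run the
        # after-init step below, so `target_model` is missing on the clone.
        clone = ToyPolicy(self._loss_fn)
        return self._loss_fn(clone)

def after_init(policy):
    policy.target_model = "target network"  # only ever attached to the original

def loss_fn(policy):
    return policy.target_model  # AttributeError when called on the copy

original = ToyPolicy(loss_fn)
after_init(original)
loss_fn(original)   # fine on the original (this is the single-GPU code path)
original.copy()     # AttributeError: 'ToyPolicy' object has no attribute 'target_model'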

ericl commented 5 years ago

cc @michaelzhiluo can you open source your examples?

ArturNiederfahrenhorst commented 4 years ago

Experiencing a similar problem with an updated AsyncPPOTFPolicy. My policy behaves as expected with 'num_gpus: 1' but throws the same error at me with 'num_gpus: 2'.
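
In case it helps to be concrete, the only thing I toggle is the GPU count; everything else in the config stays identical (minimal sketch, key name as in the standard APPO config):

config["num_gpus"] = 1    # trains as expected
# config["num_gpus"] = 2  # -> AttributeError: 'AsyncPPOTFPolicy' object has no attribute 'target_model'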

adriendoerig commented 4 years ago

I have the same problem with AsyncPPOTFPolicy. My policy behaves as expected with 'num_gpus: 1' but throws the same error at me with 'num_gpus: 2'.

Is there any known way to fix this?

ArturNiederfahrenhorst commented 4 years ago

> I have the same problem with AsyncPPOTFPolicy. My policy behaves as expected with 'num_gpus: 1' but throws the same error at me with 'num_gpus: 2'.
>
> Is there any known way to fix this?

I have not worked on that issue, sorry. Does the execution plan automatically create a multi-GPU Learner thread if you hand it more than one GPU resource?

stale[bot] commented 3 years ago

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

You can always ask for help on our discussion forum or Ray's public Slack channel.