ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
33.35k stars 5.65k forks source link

[rllib] Recurrent Torch models do not work with A2C/A3C #8560

Closed Arthur-Null closed 3 years ago

Arthur-Null commented 4 years ago

What is the problem?

ray-0.9.0.dev0 python 3.7 pytorch 1.5

Traceback (most recent call last): File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 467, in _process_trial result = self.trial_executor.fetch_result(trial) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 430, in fetch_result result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/worker.py", line 1522, in get raise value.as_instanceof_cause() ray.exceptions.RayTaskError(AttributeError): ray::A2C.train() (pid=238206, ip=10.190.174.127) File "python/ray/_raylet.pyx", line 460, in ray._raylet.execute_task File "python/ray/_raylet.pyx", line 414, in ray._raylet.execute_task.function_executor File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 500, in train raise e File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 486, in train result = Trainable.train(self) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/tune/trainable.py", line 260, in train result = self._train() File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 132, in _train return self._train_exec_impl() File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 170, in _train_exec_i res = next(self.train_exec_impl) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/util/iter.py", line 682, in next return next(self.built_iterator) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/util/iter.py", line 695, in apply_foreach for item in it: File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/util/iter.py", line 765, in apply_filter for item in it: File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/util/iter.py", line 765, in apply_filter for item in it: File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/util/iter.py", line 695, in apply_foreach for item in it: File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/util/iter.py", line 798, in apply_flatten for item in it: File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/util/iter.py", line 750, in add_wait_hooks item = next(it) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/util/iter.py", line 695, in apply_foreach for item in it: File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/util/iter.py", line 695, in apply_foreach for item in it: File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/execution/rollout_ops.py", line 70, in sampler yield workers.local_worker().sample() File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 521, in sample batches = [self.input_reader.next()] File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 56, in next batches = [self.get_data()] File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 101, in get_data item = next(self.rollout_provider) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 359, in _env_runner callbacks, soft_horizon, no_done_at_end, observation_fn) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 521, in _process_observati outputs.append(episode.batch_builder.build_and_reset(episode)) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/sample_batch_builder.py", line 205, in build self.postprocess_batch_so_far(episode) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/evaluation/sample_batch_builder.py", line 153, in postp pre_batch, other_batches, episode) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/policy/torch_policy_template.py", line 157, in postproc other_agent_batches, episode) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/a3c/a3c_torch_policy.py", line 46, in add_advant last_r = policy._value(sample_batch[SampleBatch.NEXT_OBS][-1]) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/agents/a3c/a3c_torch_policy.py", line 78, in value = self.model({"obs": torch.Tensor([obs]).to(self.device)}, [], [1]) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/models/modelv2.py", line 177, in call res = self.forward(restored, state or [], seq_lens) File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/models/torch/recurrent_net.py", line 69, in forward input_dict["obs_flat"].float(), seq_lens, framework="torch"), File "/data/yucfan/anaconda3/lib/python3.7/site-packages/ray/rllib/policy/rnn_sequencing.py", line 141, in add_time_dimens max_seq_len = padded_batch_size // seq_lens.shape[0] AttributeError: 'list' object has no attribute 'shape'

Reproduction (REQUIRED)


   import ray
   from ray import tune
   from ray.tune.registry import register_env
   from ray.rllib.examples.env.repeat_after_me_env import RepeatAfterMeEnv
   from ray.rllib.examples.env.repeat_initial_obs_env import RepeatInitialObsEnv
   from ray.rllib.examples.models.rnn_model import RNNModel, TorchRNNModel
   from ray.rllib.models import ModelCatalog
   from ray.rllib.utils.test_utils import check_learning_achieved

   parser = argparse.ArgumentParser()
   parser.add_argument("--run", type=str, default="A3C")
   parser.add_argument("--env", type=str, default="RepeatAfterMeEnv")
   parser.add_argument("--num-cpus", type=int, default=0)
   parser.add_argument("--as-test", action="store_true")
   parser.add_argument("--torch", action="store_true")
   parser.add_argument("--stop-reward", type=float, default=90)
   parser.add_argument("--stop-iters", type=int, default=100)
   parser.add_argument("--stop-timesteps", type=int, default=100000)

   if __name__ == "__main__":
       args = parser.parse_args()

       ray.init(num_cpus=args.num_cpus or None)

       ModelCatalog.register_custom_model(
           "rnn", TorchRNNModel if args.torch else RNNModel)
       register_env("RepeatAfterMeEnv", lambda c: RepeatAfterMeEnv(c))
       register_env("RepeatInitialObsEnv", lambda _: RepeatInitialObsEnv())

       config = {
           "env": args.env,
           "env_config": {
               "repeat_delay": 2,
           },
           "gamma": 0.9,
           "num_workers": 0,
           "num_envs_per_worker": 20,
           "entropy_coeff": 0.001,
           "vf_loss_coeff": 1e-5,
           "model": {
               "custom_model": "rnn",
               "max_seq_len": 20,
           },
           "use_pytorch": args.torch,
       }

       stop = {
           "training_iteration": args.stop_iters,
           "timesteps_total": args.stop_timesteps,
           "episode_reward_mean": args.stop_reward,
       }

       results = tune.run(args.run, config=config, stop=stop)

       if args.as_test:
           check_learning_achieved(results, args.stop_reward)
       ray.shutdown()

This is the example code provided in ray/rllib/example/custom_rnn_model.py run python custom_rnn_model.py --run=A3C --torch If we cannot run your script, we cannot fix your issue.

ericl commented 4 years ago

cc @sven1977

pmacalpine commented 4 years ago

I'm also seeing this bug which might be related to issue #7693 I opened but later closed as things had changed so much in rllib that my steps to recreate the bug were no longer valid.

I'm also curious why an empty list is being passed as the state to a potentially recurrent model -- I don't think a recurrent model should ever receive an empty state.

stale[bot] commented 3 years ago

Hi, I'm a bot from the Ray team :)

To help human contributors to focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the 14 days, the issue will be closed!

You can always ask for help on our discussion forum or Ray's public slack channel.

stale[bot] commented 3 years ago

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!