ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[rllib][cql] Documentation incorrectly indicates rnn and lstm support #15277

Closed by mvindiola1 3 years ago

mvindiola1 commented 3 years ago

What is the problem?

Ray: Nightly

The CQL documentation states that CQL supports "RNN, LSTM auto-wrapping, and autoreg", but the CQL trainer is a customization of SAC, which does not support these features.

Reproduction (REQUIRED)

  1. Create an offline data file:
     /path/to/ray/rllib/train.py --run=PPO --env=Pendulum-v0 \
       --stop='{"episodes_total": 100}' \
       --config='{"output": "/tmp/test_cql", "batch_mode": "complete_episodes", "model": {"use_lstm": true}}'
  2. Run CQL with the data file:
     /path/to/ray/rllib/train.py --run=CQL --env=Pendulum-v0 \
       --stop='{"episodes_total": 100}' \
       --config='{"input": "/tmp/test_cql", "framework": "torch", "batch_mode": "complete_episodes", "model": {"use_lstm": true}}'
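The two `--config` payloads in the steps above share `batch_mode` and the `use_lstm` model setting and differ only in the input/output path and the framework. As a minimal sketch using only the Python standard library, the JSON strings could be generated from dicts so the shared settings stay in sync. The helper names (`ppo_config`, `cql_config`, `to_cli_arg`) are illustrative assumptions and are not part of RLlib.

```python
import json

# Settings shared by both runs in the reproduction steps.
SHARED = {
    "batch_mode": "complete_episodes",
    "model": {"use_lstm": True},
}


def ppo_config(output_dir):
    """Config for step 1: PPO writes offline data to output_dir."""
    return {"output": output_dir, **SHARED}


def cql_config(input_dir):
    """Config for step 2: CQL reads the offline data, using the torch framework."""
    return {"input": input_dir, "framework": "torch", **SHARED}


def to_cli_arg(config):
    """Serialize a config dict to the JSON string passed to --config."""
    return json.dumps(config)


if __name__ == "__main__":
    # Print the two --config arguments used in the reproduction steps.
    print("--config='%s'" % to_cli_arg(ppo_config("/tmp/test_cql")))
    print("--config='%s'" % to_cli_arg(cql_config("/tmp/test_cql")))
```

Removing `"use_lstm"` from `SHARED` in this sketch would drop the LSTM setting from both runs at once, which is one way to check whether the error is tied to that option.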

Error:

(pid=3649625) 2021-04-13 14:30:35,046   INFO trainer.py:703 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=3649625) 2021-04-13 14:30:35,053   ERROR worker.py:395 -- Exception raised in creation task: The actor died because of an error raised in its creation task, ray::CQL.__init__() (pid=3649625, ip=192.168.1.216)
(pid=3649625)   File "python/ray/_raylet.pyx", line 505, in ray._raylet.execute_task
(pid=3649625)   File "python/ray/_raylet.pyx", line 449, in ray._raylet.execute_task.function_executor
(pid=3649625)   File "/path/to/ray/_private/function_manager.py", line 566, in actor_method_executor
(pid=3649625)     return method(__ray_actor, *args, **kwargs)
(pid=3649625)   File "/path/to/ray/rllib/agents/trainer_template.py", line 122, in __init__
(pid=3649625)     Trainer.__init__(self, config, env, logger_creator)
(pid=3649625)   File "/path/to/ray/rllib/agents/trainer.py", line 523, in __init__
(pid=3649625)     super().__init__(config, logger_creator)
(pid=3649625)   File "/path/to/ray/tune/trainable.py", line 98, in __init__
(pid=3649625)     self.setup(copy.deepcopy(self.config))
(pid=3649625)   File "/path/to/ray/rllib/agents/trainer.py", line 714, in setup
(pid=3649625)     self._init(self.config, self.env_creator)
(pid=3649625)   File "/path/to/ray/rllib/agents/trainer_template.py", line 154, in _init
(pid=3649625)     num_workers=self.config["num_workers"])
(pid=3649625)   File "/path/to/ray/rllib/agents/trainer.py", line 796, in _make_workers
(pid=3649625)     logdir=self.logdir)
(pid=3649625)   File "/path/to/ray/rllib/evaluation/worker_set.py", line 98, in __init__
(pid=3649625)     spaces=spaces,
(pid=3649625)   File "/path/to/ray/rllib/evaluation/worker_set.py", line 357, in _make_worker
(pid=3649625)     spaces=spaces,
(pid=3649625)   File "/path/to/ray/rllib/evaluation/rollout_worker.py", line 517, in __init__
(pid=3649625)     policy_dict, policy_config)
(pid=3649625)   File "/path/to/ray/rllib/evaluation/rollout_worker.py", line 1158, in _build_policy_map
(pid=3649625)     policy_map[name] = cls(obs_space, act_space, merged_conf)
(pid=3649625)   File "/path/to/ray/rllib/policy/policy_template.py", line 224, in __init__
(pid=3649625)     self, obs_space, action_space, config)
(pid=3649625)   File "/path/to/ray/rllib/agents/sac/sac_torch_policy.py", line 77, in build_sac_model_and_action_dist
(pid=3649625)     model = build_sac_model(policy, obs_space, action_space, config)
(pid=3649625)   File "/path/to/ray/rllib/agents/sac/sac_tf_policy.py", line 84, in build_sac_model
(pid=3649625)     target_entropy=config["target_entropy"])
(pid=3649625)   File "/path/to/ray/rllib/models/catalog.py", line 581, in get_model_v2
(pid=3649625)     name, **model_kwargs)
(pid=3649625) TypeError: __init__() got an unexpected keyword argument 'policy_model_config'
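The last frame of the traceback shows an extra keyword argument (`policy_model_config`) being forwarded to a constructor that does not declare it. The following is a generic Python sketch of that failure mode, not RLlib's actual code: `WrappedModel` and `build_model` are hypothetical stand-ins, and the sketch makes no claim about which RLlib class rejects the argument.

```python
class WrappedModel:
    """Hypothetical model class that accepts only the standard arguments."""

    def __init__(self, obs_space, action_space, num_outputs, model_config, name):
        self.name = name


def build_model(model_cls, name, **model_kwargs):
    """Hypothetical factory that forwards extra keyword arguments unchanged."""
    return model_cls("obs", "act", 2, {}, name, **model_kwargs)


def try_build():
    """Return the TypeError message raised by the mismatched call, or None."""
    try:
        build_model(WrappedModel, "sac_model", policy_model_config={})
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(try_build())
```

The call fails because `WrappedModel.__init__` has no `policy_model_config` parameter and no `**kwargs` catch-all, so Python raises a `TypeError` naming the unexpected keyword.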


mvindiola1 commented 3 years ago

This is fixed in the current documentation.