ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[rllib] torch based agent fails at instantiation #15752

Closed JD-ETH closed 3 years ago

JD-ETH commented 3 years ago

What is the problem?

Instantiating a PyTorch-based agent fails.

Python: 3.7, Ray: 1.3, PyTorch: 1.7.1, CUDA: 11.1, OS: Linux 18.04

Reproduction (REQUIRED)

import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

config = {
    "framework": "torch",
    "num_gpus": 1,
    "num_workers": 1,
}
agent = PPOTrainer(config=config, env="CartPole-v1")

Error message and trace:

(pid=15870)   File "python/ray/_raylet.pyx", line 505, in ray._raylet.execute_task
(pid=15870)   File "python/ray/_raylet.pyx", line 449, in ray._raylet.execute_task.function_executor
(pid=15870)   File "/home/jd/anaconda3/envs/rl/lib/python3.7/site-packages/ray/_private/function_manager.py", line 556, in actor_method_executor
(pid=15870)     return method(__ray_actor, *args, **kwargs)
(pid=15870)   File "/home/jd/anaconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 121, in __init__
(pid=15870)     Trainer.__init__(self, config, env, logger_creator)
(pid=15870)   File "/home/jd/anaconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 516, in __init__
(pid=15870)     super().__init__(config, logger_creator)
(pid=15870)   File "/home/jd/anaconda3/envs/rl/lib/python3.7/site-packages/ray/tune/trainable.py", line 98, in __init__
(pid=15870)     self.setup(copy.deepcopy(self.config))
(pid=15870)   File "/home/jd/anaconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 707, in setup
(pid=15870)     self._init(self.config, self.env_creator)
(pid=15870)   File "/home/jd/anaconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 153, in _init
(pid=15870)     num_workers=self.config["num_workers"])
(pid=15870)   File "/home/jd/anaconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 789, in _make_workers
(pid=15870)     logdir=self.logdir)
(pid=15870)   File "/home/jd/anaconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 98, in __init__
(pid=15870)     spaces=spaces,
(pid=15870)   File "/home/jd/anaconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 357, in _make_worker
(pid=15870)     spaces=spaces,
(pid=15870)   File "/home/jd/anaconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 517, in __init__
(pid=15870)     policy_dict, policy_config)
(pid=15870)   File "/home/jd/anaconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1158, in _build_policy_map
(pid=15870)     policy_map[name] = cls(obs_space, act_space, merged_conf)
(pid=15870)   File "/home/jd/anaconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/policy/policy_template.py", line 268, in __init__
(pid=15870)     stats_fn=stats_fn,
(pid=15870)   File "/home/jd/anaconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/policy/policy.py", line 631, in _initialize_loss_from_dummy_batch
(pid=15870)     postprocessed_batch = self.postprocess_trajectory(self._dummy_batch)
(pid=15870)   File "/home/jd/anaconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/policy/policy_template.py", line 291, in postprocess_trajectory
(pid=15870)     other_agent_batches, episode)
(pid=15870)   File "/home/jd/anaconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/evaluation/postprocessing.py", line 135, in compute_gae_for_sample_batch
(pid=15870)     use_critic=policy.config.get("use_critic", True))
(pid=15870)   File "/home/jd/anaconda3/envs/rl/lib/python3.7/site-packages/ray/rllib/evaluation/postprocessing.py", line 53, in compute_advantages
(pid=15870)     np.array([last_r])])
(pid=15870)   File "/home/jd/anaconda3/envs/rl/lib/python3.7/site-packages/torch/tensor.py", line 630, in __array__
(pid=15870)     return self.numpy()
(pid=15870) TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
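
The failure happens when np.array() tries to convert last_r, which on the torch backend with "num_gpus": 1 is a CUDA tensor, into a NumPy array. A minimal sketch of the underlying error outside RLlib, assuming torch is installed and a CUDA device is available (the values are made up for illustration):

# Minimal illustration of the underlying TypeError, independent of RLlib.
# Assumes torch is installed and a CUDA device is available.
import numpy as np
import torch

if torch.cuda.is_available():
    last_r = torch.tensor(0.5, device="cuda")  # stand-in for a GPU value-function output
    try:
        np.concatenate([np.zeros(3), np.array([last_r])])  # same pattern as postprocessing.py line 53
    except TypeError as e:
        print(e)  # can't convert cuda:0 device type tensor to numpy ...
    # Copying the tensor to host memory first avoids the error:
    np.concatenate([np.zeros(3), np.array([last_r.cpu()])])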

Current Mitigation:

To enable training for now, I added a dirty hack in postprocessing.py:

# At the top of ray/rllib/evaluation/postprocessing.py
from ray.rllib.utils.framework import try_import_torch
torch, nn = try_import_torch()

And before line 50 I added:

    if torch.is_tensor(last_r):
        last_r = last_r.cpu()

Simply calling last_r.cpu() unconditionally doesn't work, because the code sees a mixture of torch tensors and plain floats, hence the torch.is_tensor guard.
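
As a small sketch of why the guard is needed (made-up values, not RLlib code): last_r is sometimes a plain Python float and sometimes a torch tensor, and only tensors have a .cpu() method.

# Why the torch.is_tensor guard is needed: last_r can be either a plain float
# or a torch tensor, and only tensors have a .cpu() method.
import torch

for last_r in [0.0, torch.tensor(0.5)]:  # illustrative values, not RLlib output
    try:
        last_r = last_r.cpu()
    except AttributeError as e:
        print(e)  # 'float' object has no attribute 'cpu'
    if torch.is_tensor(last_r):
        last_r = last_r.cpu()  # the guarded version handles both cases
    print(float(last_r))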

timurlenk07 commented 3 years ago

I think it is the same bug as #14523; it is fixed on master, but for some reason it was not included in the new release.

sven1977 commented 3 years ago

Closing this issue. Already fixed as per https://github.com/ray-project/ray/pull/15014