ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io

[Bug] RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm) #21921

Closed (avacaondata closed this issue 2 years ago)

avacaondata commented 2 years ago

Search before asking

Ray Component

Ray Tune, RLlib

What happened + What you expected to happen

When trying an RLlib experiment following these guidelines: https://www.tensortrade.org/en/latest/examples/train_and_evaluate_using_ray.html with this config:

```python
tune.run(
    run_or_experiment="PPO",  # We'll be using the built-in PPO agent in RLlib
    name="MyExperiment1",
    metric='episode_reward_mean',
    mode='max',
    resources_per_trial={"cpu": 8, "gpu": 1},
    stop={
        "training_iteration": 100  # Run 100 training iterations for each hyperparameter combination
    },
    config={
        "env": "MyTrainingEnv",
        "env_config": config_train,  # The dictionary we built before
        "log_level": "WARNING",
        "framework": "torch",
        "_fake_gpus": False,
        "ignore_worker_failures": True,
        "num_workers": 1,  # One worker per agent. You can increase this, but it will run fewer parallel trainings.
        "num_envs_per_worker": 1,
        "num_gpus": 1,  # Use one GPU for training
        "clip_rewards": True,
        "lr": LEARNING_RATE,  # Hyperparameter grid search defined above
        "gamma": GAMMA,  # This can have a big impact on the result and needs to be properly tuned (range is 0 to 1)
        "lambda": LAMBDA,
        "observation_filter": "MeanStdFilter",
        "model": {
            "fcnet_hiddens": FC_SIZE,  # Hyperparameter grid search defined above
            # "use_attention": True,
            # "attention_use_n_prev_actions": 120,
            # "attention_use_n_prev_rewards": 120
        },
        "sgd_minibatch_size": MINIBATCH_SIZE,  # Hyperparameter grid search defined above
        "evaluation_interval": 1,  # Run evaluation on every iteration
        "evaluation_config": {
            "env_config": config_eval,  # The dictionary we built before (only the overriding keys to use in evaluation)
            "explore": False,  # We don't want to explore during evaluation. All actions have to be repeatable.
        },
    },
    num_samples=1,  # Have one sample for each hyperparameter combination. You can have more to average out randomness.
    keep_checkpoints_num=3,  # Keep the last 3 checkpoints
    checkpoint_freq=1,  # Checkpoint on each iteration (slower, but you can pick the checkpoint to restore later more finely)
    local_dir=r"D:\ray_results",
)
```

I encountered the following error:

Traceback (most recent call last):
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\tune\trial_runner.py", line 886, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\tune\ray_trial_executor.py", line 675, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\worker.py", line 1760, in get
    raise value
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::PPOTrainer.__init__() (pid=12840, ip=127.0.0.1, repr=PPOTrainer)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\util\tracing\tracing_helper.py", line 451, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\agents\trainer.py", line 948, in _init
    raise NotImplementedError
NotImplementedError

During handling of the above exception, another exception occurred:

ray::PPOTrainer.__init__() (pid=12840, ip=127.0.0.1, repr=PPOTrainer)
  File "python\ray\_raylet.pyx", line 633, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 674, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 640, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 644, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 593, in ray._raylet.execute_task.function_executor
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\_private\function_manager.py", line 648, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\util\tracing\tracing_helper.py", line 451, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\agents\trainer.py", line 741, in __init__
    super().__init__(config, logger_creator, remote_checkpoint_dir,
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\tune\trainable.py", line 124, in __init__
    self.setup(copy.deepcopy(self.config))
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\util\tracing\tracing_helper.py", line 451, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\agents\trainer.py", line 846, in setup
    self.workers = self._make_workers(
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\util\tracing\tracing_helper.py", line 451, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\agents\trainer.py", line 1971, in _make_workers
    return WorkerSet(
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 123, in __init__
    self._local_worker = self._make_worker(
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\evaluation\worker_set.py", line 499, in _make_worker
    worker = cls(
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 586, in __init__
    self._build_policy_map(
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1569, in _build_policy_map
    self.policy_map.create_policy(name, orig_cls, obs_space, act_space,
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\policy\policy_map.py", line 143, in create_policy
    self[policy_id] = class_(observation_space, action_space,
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\agents\ppo\ppo_torch_policy.py", line 50, in __init__
    self._initialize_loss_from_dummy_batch()
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\policy\policy.py", line 832, in _initialize_loss_from_dummy_batch
    self.compute_actions_from_input_dict(
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\policy\torch_policy.py", line 294, in compute_actions_from_input_dict
    return self._compute_action_helper(input_dict, state_batches,
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\utils\threading.py", line 21, in wrapper
    return func(self, *a, **k)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\policy\torch_policy.py", line 934, in _compute_action_helper
    dist_inputs, state_out = self.model(input_dict, state_batches,
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\models\modelv2.py", line 243, in __call__
    res = self.forward(restored, state or [], seq_lens)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\models\torch\complex_input_net.py", line 193, in forward
    nn_out, _ = self.flatten[i](SampleBatch({
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\models\modelv2.py", line 243, in __call__
    res = self.forward(restored, state or [], seq_lens)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\models\torch\fcnet.py", line 124, in forward
    self._features = self._hidden_layers(self._last_flat_in)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\torch\nn\modules\container.py", line 141, in forward
    input = module(input)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\ray\rllib\models\torch\misc.py", line 160, in forward
    return self._model(x)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\torch\nn\modules\container.py", line 141, in forward
    input = module(input)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\torch\nn\modules\linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "C:\Users\Usuario\anaconda3\envs\cryptorl\lib\site-packages\torch\nn\functional.py", line 1849, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)

I would expect tensors to be placed on the same device.
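For context, the message itself is a generic PyTorch device-mismatch error, not something RLlib-specific. A minimal standalone sketch (unrelated to RLlib internals) that reproduces the same message: a CPU input tensor fed into a linear layer whose parameters live on `cuda:0`.

```python
import torch
import torch.nn as nn

if torch.cuda.is_available():
    layer = nn.Linear(4, 2).to("cuda:0")   # weights and bias on the GPU
    x = torch.randn(1, 4)                  # input left on the CPU
    try:
        layer(x)                           # F.linear mixes cpu and cuda:0 tensors
    except RuntimeError as e:
        print(e)  # "Expected all tensors to be on the same device ..."
```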

Versions / Dependencies

OS: Windows 10
Ray: 2.0.0.dev0
Python: 3.8
Torch: 1.10.1
CUDA: 11.4

Reproduction script

https://www.tensortrade.org/en/latest/examples/train_and_evaluate_using_ray.html

Anything else

No response

Are you willing to submit a PR?

easysoft2k15 commented 2 years ago

@alexvaca0, I'm running into the same problem. Did you manage to solve it somehow? Thank you!

sadimoodi commented 2 years ago

Same problem here, any solution yet?

easysoft2k15 commented 2 years ago

I'm not sure yet, but I think the reason I get this error is that my (custom) environment uses a multi-dimensional observation space:

`self.observation_space = Box(-1.0, 1.0, (8, 7))`

Moving to stable-baselines3, I discovered that most of the algorithms used in RL support only flattened spaces (the library provides a check_env utility function).

I modified my environment accordingly, and in stable-baselines3 it now works just fine.

I suspect that if I test it on Ray it will work just as well, but I have not tested it yet.

I hope this helps.
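For anyone else hitting this, a minimal sketch of what that flattening can look like inside a custom environment. The names, bounds, and dtype below are illustrative, not taken from the original env:

```python
import numpy as np
from gym import spaces

# Instead of a 2-D space such as Box(-1.0, 1.0, (8, 7)), declare the flat equivalent ...
observation_space = spaces.Box(low=-1.0, high=1.0, shape=(8 * 7,), dtype=np.float32)

# ... and flatten the observation before returning it from reset()/step():
def flatten_obs(obs_2d: np.ndarray) -> np.ndarray:
    return obs_2d.astype(np.float32).reshape(-1)
```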

sadimoodi commented 2 years ago

> I'm not sure yet, but I think the reason I get this error is that my (custom) environment uses a multi-dimensional observation space:
>
> `self.observation_space = Box(-1.0, 1.0, (8, 7))`
>
> Moving to stable-baselines3, I discovered that most of the algorithms used in RL support only flattened spaces (the library provides a check_env utility function).
>
> I modified my environment accordingly, and in stable-baselines3 it now works just fine.
>
> I suspect that if I test it on Ray it will work just as well, but I have not tested it yet.
>
> I hope this helps.

@easysoft2k15 you are partially right. In my experiments, Ray does not work with multi-dimensional observation spaces unless you use "conv_filters", as per the documentation (screenshot omitted). The bug we see here is due to Torch moving tensors between GPU and CPU, which causes the issue when you train on CPU and GPU together; when I disabled the GPU and trained only on the CPU, everything went well.
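For readers who want to keep a non-flat, image-like observation rather than flatten it, a rough sketch of what the `conv_filters` model setting looks like. The filter values below are purely illustrative and will not work for an (8, 7) space as-is; each entry is [out_channels, kernel, stride] and must be chosen to match your observation shape (see the RLlib model documentation):

```python
# Illustrative only: filter specs must be sized so the conv stack
# reduces your specific observation shape as the model docs describe.
model_config = {
    "conv_filters": [
        [16, [4, 4], 2],
        [32, [4, 4], 2],
        [256, [11, 11], 1],
    ],
}
```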

evo11x commented 2 years ago

Thanks! Flattening the observation fixed the problem.

smorad commented 2 years ago

Yeah, it seems that if you have any observations like `gym.spaces.Box(shape=(2, 1))`, you will get this error. Making it `gym.spaces.Box(shape=(2,))` fixes the problem. IMO this is a very confusing bug for a common use case. Why does the observation space mess with the underlying torch.device? @sven1977 maybe we should assert that all spaces are flattened, or something?

bhavithran1 commented 2 years ago

This problem was solved for me by `pip install ray[default,tune,rllib,serve]==1.9.2`

Hope it helps!

michaelfeil commented 2 years ago

> This problem was solved for me by `pip install ray[default,tune,rllib,serve]==1.9.2`
>
> Hope it helps!

This works; it seems the built-in ModelV2 had problems with non-flat observations. In my case I got the same error because I forgot to define custom_model in the trainer config (so it fell back to the built-in model). There are a couple of solutions: define a custom model that either flattens the input space or can handle your multi-dimensional observations, or write a gym.Wrapper that flattens the observations (a rough sketch below).
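A minimal sketch of the wrapper approach, assuming a reasonably recent gym version (the wrapper and space helpers used here are gym's, not RLlib's):

```python
import gym

class FlattenObs(gym.ObservationWrapper):
    """Flatten a multi-dimensional (or nested) observation into a 1-D vector."""

    def __init__(self, env):
        super().__init__(env)
        self.observation_space = gym.spaces.flatten_space(env.observation_space)

    def observation(self, obs):
        return gym.spaces.flatten(self.env.observation_space, obs)

# Recent gym releases also ship an equivalent built-in:
#   env = gym.wrappers.FlattenObservation(env)
```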

dan-1d commented 2 years ago

This is a real bug in ray/rllib/models/torch/complex_input_net.py, and has been fixed in the master branch. I independently made the same changes as this commit, and they fixed my problem.

https://github.com/ray-project/ray/commit/a598458c464b88535e711ef7ef55f88e25c1820f

The problem in ComplexInputNetwork was that the Torch sub-modules for "one-hot" and "flatten" were not all being registered, so their parameters were not moved to the GPU.
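The linked commit contains the actual fix; purely as a simplified illustration (not RLlib code) of why an unregistered sub-module ends up with CPU parameters after `.to("cuda:0")`:

```python
import torch
import torch.nn as nn

class Unregistered(nn.Module):
    def __init__(self):
        super().__init__()
        # Stored in a plain Python list: NOT registered, so .to(device) won't move it.
        self.flatten = [nn.Linear(8, 4)]

class Registered(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers the sub-module, so .to(device) moves its parameters.
        self.flatten = nn.ModuleList([nn.Linear(8, 4)])

if torch.cuda.is_available():
    bad = Unregistered().to("cuda:0")
    good = Registered().to("cuda:0")
    print(bad.flatten[0].weight.device)   # cpu  -> mixed-device error at forward time
    print(good.flatten[0].weight.device)  # cuda:0
```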

timf34 commented 2 years ago

> The bug we see here is due to Torch moving tensors between GPU and CPU, which causes the issue when you train on CPU and GPU together; when I disabled the GPU and trained only on the CPU, everything went well.

Not using the GPU won't work for me, unfortunately (I need it for speed). Is there any fix for this that still lets me use the GPU? Do I need to flatten the observations, and if so, how do I do that? Do I flatten them before they're fed to the network, and how does that work when I am using CNNs?