ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[RLLib] RuntimeError: Expected scalars to be on CPU, got cuda:0 instead #34159

Closed · DenysAshikhin closed this issue 1 year ago

DenysAshikhin commented 1 year ago

What happened + What you expected to happen

Hi all,

I am trying to load a previously trained model to continue training it, but I get the following error:

Failure # 1 (occurred at 2023-03-31_14-54-08)
ray::PPO.train() (pid=5616, ip=127.0.0.1, repr=PPO)
  File "python\ray\_raylet.pyx", line 875, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 879, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 819, in ray._raylet.execute_task.function_executor
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\_private\function_manager.py", line 674, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\tune\trainable\trainable.py", line 384, in train
    raise skipped from exception_cause(skipped)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\tune\trainable\trainable.py", line 381, in train
    result = self.step()
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 794, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\algorithms\algorithm.py", line 2810, in _run_one_training_iteration
    results = self.training_step()
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\algorithms\ppo\ppo.py", line 420, in training_step
    train_results = train_one_step(self, train_batch)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\execution\train_ops.py", line 52, in train_one_step
    info = do_minibatch_sgd(
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\utils\sgd.py", line 129, in do_minibatch_sgd
    local_worker.learn_on_batch(
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1029, in learn_on_batch
    info_out[pid] = policy.learn_on_batch(batch)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\utils\threading.py", line 24, in wrapper
    return func(self, *a, **k)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 663, in learn_on_batch
    self.apply_gradients(_directStepOptimizerSingleton)
  File "C:\personal\ai\ray_venv\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 880, in apply_gradients
    opt.step()
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\optimizer.py", line 33, in _use_grad
    ret = func(self, *args, **kwargs)
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\adam.py", line 141, in step
    adam(
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\adam.py", line 281, in adam
    func(params,
  File "C:\personal\ai\ray_venv\lib\site-packages\torch\optim\adam.py", line 449, in _multi_tensor_adam
    torch._foreach_addcmul_(device_exp_avg_sqs, device_grads, device_grads, 1 - beta2)
RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.

Relevant code:

tune.run("PPO",
         resume='AUTO',
         # param_space=config,
         config=ppo_config.to_dict(),
         name=name, keep_checkpoints_num=None, checkpoint_score_attr="episode_reward_mean",
         max_failures=1,
         # restore="C:\\Users\\denys\\ray_results\\mediumbrawl-attention-256Att-128MLP-L2\\PPOTrainer_RandomEnv_1e882_00000_0_2022-06-02_15-13-44\\checkpoint_000028\\checkpoint-28",
         checkpoint_freq=5, checkpoint_at_end=True)

Versions / Dependencies

OS: Win11, Python: 3.10, Ray: latest nightly Windows wheel

Reproduction script

n/a

Issue Severity

High: It blocks me from completing my task.

xwjiang2010 commented 1 year ago

Can you share your script?

perduta commented 1 year ago

@xwjiang2010 I think I'm facing the same issue: after PBT/PB2 perturbs one of my trials and loads it from a checkpoint (the load succeeds), the next opt.step() fails. I am trying to debug this - if you have any clues, I'm all ears.

OS: Linux 6.1, Python: 3.10, Ray: both 2.3.1 and the latest nightly (1b5b2f8c61)

Code that reproduces the issue every time I run it on my setup:

import ray
from ray.rllib.algorithms.ppo import PPOConfig
from ray import tune
from ray.tune.schedulers.pb2 import PB2
from ray import air

ray.init(address="auto")

config = (
    PPOConfig()
    .framework("torch")
    .environment("BipedalWalker-v3")
    .training(
        lr=1e-5,
        model={"fcnet_hiddens": [128, 128]},
        train_batch_size=1024,
    )
    .rollouts(num_rollout_workers=5, num_envs_per_worker=4)
    .resources(num_gpus=1.0 / 4)
)

perturbation_interval = 20
pb2 = PB2(
    time_attr="training_iteration",
    perturbation_interval=perturbation_interval,
    hyperparam_bounds={"lr": [1e-3, 1e-7], "train_batch_size": [128, 1024 * 8]},
)

param_space = {**config.to_dict(), **{"checkpoint_interval": perturbation_interval}}

tuner = tune.Tuner(
    "PPO",
    param_space=param_space,
    run_config=air.RunConfig(
        stop={"training_iteration": 1e9},
        verbose=1,
    ),
    tune_config=tune.TuneConfig(
        scheduler=pb2, metric="episode_reward_mean", mode="max", num_samples=4
    ),
)

results = tuner.fit()

A snippet of the logs:

(RolloutWorker pid=189049) 2023-04-08 20:06:19,348  WARNING env.py:155 -- Your env doesn't have a .spec.max_episode_steps attribute. Your horizon will default to infinity, and your environment will not be reset.
(PPO pid=188910) 2023-04-08 20:06:19,348    WARNING env.py:155 -- Your env doesn't have a .spec.max_episode_steps attribute. Your horizon will default to infinity, and your environment will not be reset.
(PPO pid=188910) 2023-04-08 20:06:19,348    WARNING env.py:155 -- Your env doesn't have a .spec.max_episode_steps attribute. Your horizon will default to infinity, and your environment will not be reset.
(RolloutWorker pid=189049) pybullet build time: Apr  4 2023 02:40:04 [repeated 2x across cluster]
== Status ==
Current time: 2023-04-08 20:06:20 (running for 00:03:11.58)
PopulationBasedTraining: 2 checkpoints, 1 perturbs
Logical resource usage: 24.0/24 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:G)
Current best trial: 9e53a_00003 with episode_reward_mean=-809.2682569878904 and parameters={'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0.25, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, '_fake_gpus': False, 'num_learner_workers': 0, 'num_gpus_per_learner_worker': 0, 'num_cpus_per_learner_worker': 1, 'local_gpu_idx': 0, 'custom_resources_per_worker': {}, 'placement_strategy': 'PACK', 'eager_tracing': False, 'eager_max_retraces': 20, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'env': 'quadcopter-env-v0', 'env_config': {}, 'observation_space': None, 'action_space': None, 'env_task_fn': None, 'render_env': False, 'clip_rewards': None, 'normalize_actions': True, 'clip_actions': False, 'disable_env_checking': False, 'is_atari': False, 'auto_wrap_old_gym_envs': True, 'num_envs_per_worker': 4, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'sample_async': False, 'enable_connectors': True, 'rollout_fragment_length': 'auto', 'batch_mode': 'truncate_episodes', 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'validate_workers_after_construction': True, 'preprocessor_pref': 'deepmind', 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'compress_observations': False, 'enable_tf1_exec_eagerly': False, 'sampler_perf_stats_ema_coef': None, 'gamma': 0.99, 'lr': 1e-05, 'train_batch_size': 1024, 'model': {'_disable_preprocessor_api': False, '_disable_action_flattening': False, 'fcnet_hiddens': [128, 128], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'framestack': True, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'encoder_latent_dim': None, 'lstm_use_prev_action_reward': -1, '_use_default_native_models': -1}, 'optimizer': {}, 'max_requests_in_flight_per_sampler_worker': 2, 'learner_class': None, '_enable_learner_api': False, '_learner_hps': PPOLearnerHPs(kl_coeff=0.2, kl_target=0.01, use_critic=True, clip_param=0.3, vf_clip_param=10.0, entropy_coeff=0.0, vf_loss_coeff=1.0, lr_schedule=None, entropy_coeff_schedule=None), 'explore': True, 'exploration_config': {'type': 'StochasticSampling'}, 'policy_states_are_swappable': False, 'input_config': {}, 'actions_in_input_normalized': False, 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_config': {}, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 'offline_sampling': False, 'evaluation_interval': None, 'evaluation_duration': 10, 
'evaluation_duration_unit': 'episodes', 'evaluation_sample_timeout_s': 180.0, 'evaluation_parallel_to_training': False, 'evaluation_config': None, 'off_policy_estimation_methods': {}, 'ope_split_batch_by_episode': True, 'evaluation_num_workers': 0, 'always_attach_evaluation_results': False, 'enable_async_evaluation': False, 'in_evaluation': False, 'sync_filters_on_rollout_workers_timeout_s': 60.0, 'keep_per_episode_custom_metrics': False, 'metrics_episode_collection_timeout_s': 60.0, 'metrics_num_episodes_for_smoothing': 100, 'min_time_s_per_iteration': None, 'min_train_timesteps_per_iteration': 0, 'min_sample_timesteps_per_iteration': 0, 'export_native_model_files': False, 'checkpoint_trainable_policies_only': False, 'logger_creator': None, 'logger_config': None, 'log_level': 'WARN', 'log_sys_usage': True, 'fake_sampler': False, 'seed': None, 'worker_cls': None, 'ignore_worker_failures': False, 'recreate_failed_workers': False, 'max_num_worker_restarts': 1000, 'delay_between_worker_restarts_s': 60.0, 'restart_failed_sub_environments': False, 'num_consecutive_worker_failures_tolerance': 100, 'worker_health_probe_timeout_s': 60, 'worker_restore_timeout_s': 1800, 'rl_module_spec': None, '_enable_rl_module_api': False, '_validate_exploration_conf_and_rl_modules': True, '_tf_policy_handles_more_than_one_loss': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, '_disable_execution_plan_api': True, 'simple_optimizer': False, 'replay_sequence_length': None, 'horizon': -1, 'soft_horizon': -1, 'no_done_at_end': -1, 'lr_schedule': None, 'use_critic': True, 'use_gae': True, 'kl_coeff': 0.2, 'sgd_minibatch_size': 128, 'num_sgd_iter': 30, 'shuffle_sequences': True, 'vf_loss_coeff': 1.0, 'entropy_coeff': 0.0, 'entropy_coeff_schedule': None, 'clip_param': 0.3, 'vf_clip_param': 10.0, 'grad_clip': None, 'kl_target': 0.01, 'vf_share_layers': -1, 'checkpoint_interval': 20, '__stdout_file__': None, '__stderr_file__': None, 'lambda': 1.0, 'input': 'sampler', 'multiagent': {'policies': {'default_policy': (None, None, None, None)}, 'policy_mapping_fn': <function AlgorithmConfig.DEFAULT_POLICY_MAPPING_FN at 0x7f4db05ea830>, 'policies_to_train': None, 'policy_map_capacity': 100, 'policy_map_cache': -1, 'count_steps_by': 'env_steps', 'observation_fn': None}, 'callbacks': <class 'ray.rllib.algorithms.callbacks.DefaultCallbacks'>, 'create_env_on_driver': False, 'custom_eval_function': None, 'framework': 'torch', 'num_cpus_for_driver': 1, 'num_workers': 5}
Result logdir: /home/pp/ray_results/PPO
Number of trials: 4/4 (4 RUNNING)

(PPO pid=188910) 2023-04-08 20:06:20,350    WARNING checkpoints.py:109 -- No `rllib_checkpoint.json` file found in checkpoint directory /tmp/checkpoint_tmp_a68e78c04f114e399acd38a05eff6631/.! Trying to extract checkpoint info from other files found in that dir.
(PPO pid=188910) 2023-04-08 20:06:20,391    INFO trainable.py:915 -- Restored on 192.168.178.20 from checkpoint: /tmp/checkpoint_tmp_a68e78c04f114e399acd38a05eff6631
(PPO pid=188910) 2023-04-08 20:06:20,392    INFO trainable.py:924 -- Current state after restoring: {'_iteration': 80, '_timesteps_total': None, '_time_total': 146.15634059906006, '_episodes_total': 155}
2023-04-08 20:06:20,958 ERROR trial_runner.py:1485 -- Trial PPO_quadcopter-env-v0_9e53a_00003: Error happened when processing _ExecutorEventType.TRAINING_RESULT.
ray.exceptions.RayTaskError(RuntimeError): ray::PPO.train() (pid=188912, ip=192.168.178.20, repr=PPO)
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 386, in train
    raise skipped from exception_cause(skipped)
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 383, in train
    result = self.step()
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 792, in step
    results, train_iter_ctx = self._run_one_training_iteration()
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2811, in _run_one_training_iteration
    results = self.training_step()
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 432, in training_step
    train_results = multi_gpu_train_one_step(self, train_batch)
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/execution/train_ops.py", line 163, in multi_gpu_train_one_step
    results = policy.learn_on_loaded_batch(
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 825, in learn_on_loaded_batch
    self.apply_gradients(_directStepOptimizerSingleton)
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 885, in apply_gradients
    opt.step()
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 33, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/adam.py", line 141, in step
    adam(
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/adam.py", line 281, in adam
    func(params,
  File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/adam.py", line 449, in _multi_tensor_adam
    torch._foreach_addcmul_(device_exp_avg_sqs, device_grads, device_grads, 1 - beta2)
RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.
DenysAshikhin commented 1 year ago

@xwjiang2010 @perduta

I think I have a lead on this. I'll do some more testing, but I think it has to do with num_gpus and the rollout workers and how they are set. When I initially ran tune.run I hadn't set the GPU resources correctly, so it trained on the CPU instead. Later I fixed that, went back to load an older checkpoint (which had the improper GPU settings), and it tried loading onto the GPU instead of the CPU, causing that issue. I'll see if I can recreate it more consistently.
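
One way to check this hypothesis is to look at which device the optimizer state actually ends up on after a restore. The snippet below is a rough, untested diagnostic sketch (not from this thread): the checkpoint path is a placeholder, and it relies on the policy's _optimizers attribute, which is an RLlib-internal detail that may differ between versions.

# Hypothetical diagnostic: inspect the devices of the optimizer state after a restore.
# The checkpoint path is a placeholder; `_optimizers` is RLlib-internal.
import torch
from ray.rllib.algorithms.algorithm import Algorithm

algo = Algorithm.from_checkpoint("/path/to/your/checkpoint")  # placeholder path
policy = algo.get_policy()

for opt in getattr(policy, "_optimizers", []):
    # Hyperparameters such as lr/betas/eps should be plain Python numbers, not CUDA tensors.
    for group in opt.param_groups:
        for key, value in group.items():
            if key != "params" and torch.is_tensor(value):
                print(f"param_group[{key!r}] is a tensor on {value.device}")
    # Per-parameter state (exp_avg, exp_avg_sq, step) and the device each lives on.
    for param_state in opt.state.values():
        for key, value in param_state.items():
            if torch.is_tensor(value):
                print(f"state[{key!r}] is on {value.device}")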

xwjiang2010 commented 1 year ago

Actually, is everyone running into this issue with RLlib Algorithms? It may have something to do with how Algorithm saves and loads checkpoints (basically the two should be consistent - either both on GPU or both on CPU).

cc @kouroshHakha

related: https://discuss.ray.io/t/runtimeerror-expected-scalars-to-be-on-cpu-got-cuda-0-instead/9998

perduta commented 1 year ago

Actually, is everyone running into this issue with RLlib Algorithms? It may have something to do with how Algorithm saves and loads checkpoints (basically the two should be consistent - either both on GPU or both on CPU).

cc @kouroshHakha

I've reproduced this with both PPO and SAC; I didn't check the rest.

WeihaoTan commented 1 year ago

I also ran into the same issue when restoring and training a PPO agent with Ray 2.3.1. Is there any temporary workaround? I used run_experiments() to train.

kouroshHakha commented 1 year ago

Hey @DenysAshikhin, I don't have much visibility into this issue, but if you can create a minimal repro script that we can use to debug, that would be a great starting point.

For example, a script that trains a PPO agent on CUDA for one iteration and then tries to restore it (for inference or to continue training) but fails with the error message you showed.

Thanks.
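
For reference, a minimal script along those lines might look like the sketch below (untested; it assumes a Ray version from around the time of this thread (2.3/2.4), a single CUDA GPU, and uses CartPole-v1 purely as a stand-in environment):

# Hypothetical minimal repro sketch (untested): train PPO on the GPU for one
# iteration, checkpoint it, then restore and try to keep training.
import ray
from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.algorithms.ppo import PPOConfig

ray.init()

config = (
    PPOConfig()
    .environment("CartPole-v1")
    .framework("torch")
    .rollouts(num_rollout_workers=1)
    .resources(num_gpus=1)
)

algo = config.build()
algo.train()                   # one training iteration on the GPU
checkpoint_path = algo.save()  # returns the checkpoint directory
algo.stop()

restored = Algorithm.from_checkpoint(checkpoint_path)
restored.train()               # the reported RuntimeError would surface here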

xwjiang2010 commented 1 year ago

Also tagging @perduta @WeihaoTan to provide a repro script. Thanks!

WeihaoTan commented 1 year ago

Hi @kouroshHakha @xwjiang2010, here is the repro script. If you run it on a machine with 1 GPU, it works perfectly. If you run it on a machine with multiple GPUs, the bug appears.

train.py:

import argparse
import yaml

import ray
from ray.tune.experiment.config_parser import _make_parser
from ray.tune.progress_reporter import CLIReporter
from ray.tune.tune import run_experiments
from ray.tune.registry import register_trainable, register_env
from ray.tune.schedulers import create_scheduler
from ray.rllib.models import ModelCatalog
from ray.rllib.utils.framework import try_import_torch

from algorithms.registry import ALGORITHMS, get_algorithm_class
from envs.registry import ENVIRONMENTS, get_env_class, POLICY_MAPPINGS, CALLBACKS
from models.registry import MODELS, get_model_class, ACTION_DISTS, get_action_dist_class

EXAMPLE_USAGE = """
python train.py -f config.yaml
"""

# Try to import the torch backend for flag checking/warnings.
torch, _ = try_import_torch()

def create_parser(parser_creator=None):
    parser = _make_parser(
        parser_creator=parser_creator,
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description="Train a reinforcement learning agent.",
        epilog=EXAMPLE_USAGE,
    )

# See also the base parser definition in ray/tune/experiment/config_parser.py
    parser.add_argument(
        "--ray-address",
        default=None,
        type=str,
        help="Connect to an existing Ray cluster at this address instead "
        "of starting a new one.",
    )
    parser.add_argument(
        "--ray-ui", action="store_true", help="Whether to enable the Ray web UI."
    )
    parser.add_argument(
        "--local-mode",
        action="store_true",
        help="Run ray in local mode for easier debugging.",
    )
    parser.add_argument(
        "--ray-num-cpus",
        default=None,
        type=int,
        help="--num-cpus to use if starting a new cluster.",
    )
    parser.add_argument(
        "--ray-num-gpus",
        default=None,
        type=int,
        help="--num-gpus to use if starting a new cluster.",
    )
    parser.add_argument(
        "--ray-num-nodes",
        default=None,
        type=int,
        help="Emulate multiple cluster nodes for debugging.",
    )
    parser.add_argument(
        "--ray-object-store-memory",
        default=None,
        type=int,
        help="--object-store-memory to use if starting a new cluster.",
    )
    parser.add_argument(
        "--resume",
        action="store_true",
        help="Whether to attempt to resume previous Tune experiments.",
    )
    parser.add_argument(
        "-f",
        "--config-file",
        default="config.yaml",
        type=str,
        help="If specified, use config options from this file. Note that this "
             "overrides any trial-specific options set via flags above.",
    )

    return parser

def run(args, parser):
    assert args.config_file is not None, "Must specify a config file"
    with open(args.config_file) as f:
        experiments = yaml.safe_load(f)
    verbose = 1
    for exp in experiments.values():
        metric_columns = exp.pop("metric_columns", None)
        if not exp.get("run"):
            parser.error("the following arguments are required: --run")
        if not exp.get("env") and not exp.get("config", {}).get("env"):
            parser.error("the following arguments are required: --env")
        if exp["config"].get("multiagent"):
            policy_mapping_name = exp["config"]["multiagent"].get("policy_mapping_fn")
            if isinstance(policy_mapping_name, str):
                exp["config"]["multiagent"]["policy_mapping_fn"] = POLICY_MAPPINGS[policy_mapping_name]
        if exp["config"].get("callbacks"):
            calback_name = exp["config"].get("callbacks")
            if isinstance(calback_name, str):
                exp["config"]["callbacks"] = CALLBACKS[calback_name]

    if args.ray_num_nodes:
        from ray.cluster_utils import Cluster

        cluster = Cluster()
        for _ in range(args.ray_num_nodes):
            cluster.add_node(
                num_cpus=args.ray_num_cpus or 1,
                num_gpus=args.ray_num_gpus or 0,
                object_store_memory=args.ray_object_store_memory,
            )
        ray.init(address=cluster.address)
    else:
        ray.init(
            include_dashboard=args.ray_ui,
            address=args.ray_address,
            object_store_memory=args.ray_object_store_memory,
            num_cpus=args.ray_num_cpus,
            num_gpus=args.ray_num_gpus,
            local_mode=args.local_mode,
        )

    progress_reporter = CLIReporter(
        print_intermediate_tables=verbose >= 1,
        metric_columns=metric_columns,
    )

    trials = run_experiments(
        experiments,
        scheduler=create_scheduler(args.scheduler, **args.scheduler_config),
        resume=args.resume,
        verbose=verbose,
        progress_reporter=progress_reporter,
        concurrent=True,
    )
    ray.shutdown()

    checkpoints = []
    for trial in trials:
        if trial.checkpoint.dir_or_data:
            checkpoints.append(trial.checkpoint.dir_or_data)

    if checkpoints:
        from rich import print
        from rich.panel import Panel

        print("\nYour training finished.")

        print("Best available checkpoint for each trial:")
        for cp in checkpoints:
            print(f"  {cp}")

        print(
            "\nYou can now evaluate your trained algorithm from any "
            "checkpoint, e.g. by running:"
        )
        print(Panel(f"[green]  rllib evaluate {checkpoints[0]} "))

def main():
    parser = create_parser()
    args = parser.parse_args()
    run(args, parser)

if __name__ == "__main__":
    main()

config.yaml:

test:
  run: PPO
  checkpoint_config:
    checkpoint_frequency: 50
    checkpoint_at_end: true
    num_to_keep: 5
  local_dir: ray_results
  stop:
    timesteps_total: 1000
  #restore: checkpoint

  config:
    framework: torch 
    env: CartPole-v1

    num_workers: 4
    num_cpus_for_driver: 1
    num_envs_per_worker: 1
    num_cpus_per_worker: 2
    num_gpus: 1

    disable_env_checking: true

After training finishes, uncomment "restore" and put the path of the trained checkpoint in it. Increase timesteps_total and re-run train.py; the bug will appear, with something like:

    torch._foreach_addcmul_(device_exp_avg_sqs, device_grads, device_grads, 1 - beta2)
RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.

@xwjiang2010 Is this enough? If not, what else should I attach? Should we add the bug label back? I don't know of any way to work around this, and I am unable to train any RL model with Ray at this time.

DenysAshikhin commented 1 year ago

On my side (it doesn't even require multiple GPUs):

import ray
from ray.rllib.env import PolicyServerInput
from ray.rllib.algorithms.ppo import PPOConfig

import numpy as np
import argparse
from gymnasium.spaces import MultiDiscrete, Box

ray.init(num_cpus=9, num_gpus=1, log_to_driver=False, configure_logging=False)

ppo_config = PPOConfig()

parser = argparse.ArgumentParser(description='Optional app description')
parser.add_argument('-ip', type=str, help='IP of this device')

parser.add_argument('-checkpoint', type=str, help='location of checkpoint to restore from')

args = parser.parse_args()

def _input(ioctx):
    # We are remote worker, or we are local worker with num_workers=0:
    # Create a PolicyServerInput.
    if ioctx.worker_index > 0 or ioctx.worker.num_workers == 0:
        return PolicyServerInput(
            ioctx,
            args.ip,
            55556 + ioctx.worker_index - (1 if ioctx.worker_index > 0 else 0),
        )
    # No InputReader (PolicyServerInput) needed.
    else:
        return None

x = 320
y = 240

# kl_coeff, ->default 0.2
# ppo_config.gamma = 0.01  # vf_loss_coeff used to be 0.01??
# "entropy_coeff": 0.00005,
# "clip_param": 0.1,
ppo_config.gamma = 0.998  # default 0.99
ppo_config.lambda_ = 0.99  # default 1.0???
ppo_config.kl_target = 0.01  # default 0.01
ppo_config.rollout_fragment_length = 128
# ppo_config.train_batch_size = 8500
# ppo_config.train_batch_size = 10000
ppo_config.train_batch_size = 12000
ppo_config.sgd_minibatch_size = 512
# ppo_config.num_sgd_iter = 2  # default 30???
ppo_config.num_sgd_iter = 7  # default 30???
# ppo_config.lr = 3.5e-5  # 5e-5
ppo_config.lr = 9e-5  # 5e-5

ppo_config.model = {
    # Share layers for value function. If you set this to True, it's
    # important to tune vf_loss_coeff.
    "vf_share_layers": True,

    "use_lstm": True,
    "max_seq_len": 32,
    "lstm_cell_size": 128,
    "lstm_use_prev_action": True,

    "conv_filters": [

        # 240 X 320
        [16, [5, 5], 3],
        [32, [5, 5], 3],
        [64, [5, 5], 3],
        [128, [3, 3], 2],
        [256, [3, 3], 2],
        [512, [3, 3], 2],
    ],
    "conv_activation": "relu",
    "post_fcnet_hiddens": [512],
    "post_fcnet_activation": "relu"
}
ppo_config.batch_mode = "complete_episodes"
ppo_config.simple_optimizer = True

# ppo_config["remote_worker_envs"] = True

ppo_config.env = None
ppo_config.observation_space = Box(low=0, high=1, shape=(y, x, 1), dtype=np.float32)
ppo_config.action_space = MultiDiscrete(
    [
        2,  # W
        2,  # A
        2,  # S
        2,  # D
        2,  # Space
        2,  # H
        2,  # J
        2,  # K
        2  # L
    ]
)
ppo_config.env_config = {
    "sleep": True,
    'replayOn': False
}

ppo_config.rollouts(num_rollout_workers=2, enable_connectors=False)
ppo_config.offline_data(input_=_input)

ppo_config.framework_str = 'torch'
ppo_config.log_sys_usage = False
ppo_config.compress_observations = True
ppo_config.shuffle_sequences = False
ppo_config.num_gpus = 0.35
ppo_config.num_gpus_per_worker = 0.1
ppo_config.num_cpus_per_worker = 2
ppo_config.num_cpus_per_learner_worker = 2
ppo_config.num_gpus_per_learner_worker = 0.35

tempyy = ppo_config.to_dict()

print(tempyy)

from ray import tune

name = "" + args.checkpoint
print(f"Starting: {name}")

tune.run("PPO",
         resume='AUTO',
         # param_space=config,
         config=tempyy,
         name=name, keep_checkpoints_num=None, checkpoint_score_attr="episode_reward_mean",
         max_failures=1,
         checkpoint_freq=5, checkpoint_at_end=True)

You can of course substitute the env with a random env. In my case, after letting it make some checkpoints, I hit Ctrl+C in the cmd window. After it gracefully saves, running it again gives the same error as outlined above.

perduta commented 1 year ago

My issue is occurring on a single GPU machine as well.


cheadrian commented 1 year ago

Got the same problem when trying to resume an experiment on Colab with exactly the same configuration as before.

tuner = Tuner.restore("saved_models", trainer, resume_unfinished = True, resume_errored=True)
tuner.fit()

The error appears at the end of an iteration. Single-GPU setup. Python 3.9.16, torch 2.0.0+cu118, Ray 2.3.1.

xwjiang2010 commented 1 year ago

@cheadrian what is the trainer in your case?

cheadrian commented 1 year ago

trainer = RLTrainer(
    run_config=run_config,
    scaling_config=ScalingConfig(
        num_workers=2, use_gpu=True,
        trainer_resources={"CPU": 0.0}, 
        resources_per_worker={"CPU": 1.0}),
    algorithm="PPO",
    config=config_cf,
    resume_from_checkpoint=rl_checkpoint,
)

I get the same error with or without specifying it.

solnox99 commented 1 year ago

Hello.

I'm using SAC, and when I try to load a model trained with Tune via Algorithm.from_checkpoint(), I get exactly the same error, unfortunately.

DenysAshikhin commented 1 year ago

@kouroshHakha

Please ignore that point - no matter what settings I use, the issue happens, the same as for the others. I had confused some parameters 😅

gunewar commented 1 year ago

Same error and no solution yet?

  File "python3.9/site-packages/torch/optim/adam.py", line 449, in _multi_tensor_adam
    torch._foreach_addcmul_(device_exp_avg_sqs, device_grads, device_grads, 1 - beta2)
RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.

xwjiang2010 commented 1 year ago

@kouroshHakha Could you triage this for the RL team?

DenysAshikhin commented 1 year ago

@gunewar I'm sure the team is aware of the issue at this point - let's just give them some time to find a way to reproduce this on their end, at which point it should be a simple fix (hopefully).

I get the feeling that it's some new configuration that got missed in the automated builds.

woosangbum commented 1 year ago

Hello.

I'm using SAC, and when I try to load a model trained with Tune via Algorithm.from_checkpoint(), I get the same error.

Removing num_gpus from the config makes it train, at least for now! :)

DenysAshikhin commented 1 year ago

Hello. I'm using SAC, and when I try to load a model trained with Tune via Algorithm.from_checkpoint(), I get the same error.

Removing num_gpus from the config makes it train, at least for now! :)

So you initially trained with num_gpus set (thus training on the GPU), and for subsequent runs you turned num_gpus off? Is it training on your CPU or your GPU then?

woosangbum commented 1 year ago

Hello. I'm using SAC, and when I try to load a model trained with Tune via Algorithm.from_checkpoint(), I get the same error.

Removing num_gpus from the config makes it train, at least for now! :)

So you initially trained with num_gpus set (thus training on the GPU), and for subsequent runs you turned num_gpus off? Is it training on your CPU or your GPU then?

I was able to successfully restore a model that had been trained with num_gpus=1 after removing num_gpus from the config (so it falls back to 0). After doing this, there were no errors and it was possible to continue training the model.

DenysAshikhin commented 1 year ago

Interesting - that didn't work for me. Can you confirm whether it's still training on your GPU, or whether it switched to your CPU after restoring?

woosangbum commented 1 year ago

Interesting - that didn't work for me. Can you confirm whether it's still training on your GPU, or whether it switched to your CPU after restoring?

I just checked the training; I'm now working on another project.

I used the base settings for all the resource-related configs and didn't use Ray Tuner.

cheadrian commented 1 year ago

Got the same problem when trying to resume an experiment on Colab with exactly the same configuration as before.

tuner = Tuner.restore("saved_models", trainer, resume_unfinished = True, resume_errored=True)
tuner.fit()

at the end of iteration. Single GPU setup. Python 3.9.16, torch 2.0.0+cu118, ray 2.3.1

trainer = RLTrainer(
    run_config=run_config,
    scaling_config=ScalingConfig(
        num_workers=2, use_gpu=True,
        trainer_resources={"CPU": 0.0}, 
        resources_per_worker={"CPU": 1.0}),
    algorithm="PPO",
    config=config_cf,
    resume_from_checkpoint=rl_checkpoint,
)

Tune Status Resources requested: 2.0/2 CPUs, 1.0/1 GPUs, 0.0/10.85 GiB heap, 0.0/0.19 GiB objects

Changing use_gpu=True to use_gpu=False makes the training continue, but only on the CPU.

Tune Status Resources requested: 2.0/2 CPUs, 0/1 GPUs, 0.0/10.85 GiB heap, 0.0/0.19 GiB objects

DenysAshikhin commented 1 year ago

Got the same problem when trying to resume an experiment on Colab with exactly the same configuration as before.

tuner = Tuner.restore("saved_models", trainer, resume_unfinished = True, resume_errored=True)
tuner.fit()

at the end of iteration. Single GPU setup. Python 3.9.16, torch 2.0.0+cu118, ray 2.3.1

trainer = RLTrainer(
    run_config=run_config,
    scaling_config=ScalingConfig(
        num_workers=2, use_gpu=True,
        trainer_resources={"CPU": 0.0}, 
        resources_per_worker={"CPU": 1.0}),
    algorithm="PPO",
    config=config_cf,
    resume_from_checkpoint=rl_checkpoint,
)

Tune Status Resources requested: 2.0/2 CPUs, 1.0/1 GPUs, 0.0/10.85 GiB heap, 0.0/0.19 GiB objects

Modifying the use_gpu=True to use_gpu=False makes the training to continue, but only on the CPU.

Tune Status Resources requested: 2.0/2 CPUs, 0/1 GPUs, 0.0/10.85 GiB heap, 0.0/0.19 GiB objects

I'm happy to know there's kind of a workaround for some. However, it still didn't work for me, and it doesn't change the fact that I need to train on my GPU (as I'm sure others do as well).

kouroshHakha commented 1 year ago

Apparently the torch optimizer param_group values should not be moved to CUDA devices when restoring optimizer states. A PR addressing this issue will land in one of the next releases.
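
That explanation lines up with the traceback above: if a param_group entry such as beta2 comes back from the checkpoint as a CUDA tensor, then 1 - beta2 inside _multi_tensor_adam becomes a CUDA scalar, which torch._foreach_addcmul_ rejects. Until the fix is released, one possible stop-gap (a rough sketch only, not the official fix, and again relying on the RLlib-internal _optimizers attribute) is to coerce restored param_group values back to plain Python numbers before the next train() call:

# Hedged workaround sketch (not the official fix): after restoring, turn any
# param_group values that came back as tensors into plain Python numbers so
# torch.optim sees CPU scalars again. `_optimizers` is RLlib-internal.
import torch

def fix_restored_param_groups(policy):
    for opt in getattr(policy, "_optimizers", []):
        for group in opt.param_groups:
            for key, value in group.items():
                if key == "params":
                    continue  # leave the parameter references alone
                if torch.is_tensor(value):
                    group[key] = (
                        value.item() if value.numel() == 1 else tuple(value.tolist())
                    )
                elif isinstance(value, (list, tuple)):
                    group[key] = tuple(
                        v.item() if torch.is_tensor(v) else v for v in value
                    )

The idea would be to call this once on algo.get_policy() right after restoring and before the next train(); depending on the torch version, the per-parameter step counters may need similar treatment.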

DenysAshikhin commented 1 year ago

@kouroshHakha Thanks for the update - is there a link to view the PR so I can incorporate it on my end until it's merged officially?

kouroshHakha commented 1 year ago

Hey @DenysAshikhin, here is the core change that needs to happen in RLlib's torch policy. If it's urgent, you can make these changes in your local installation of Ray. If it can wait a few days, you can either install the nightly or use master once this PR is merged. If you need reliability, you will have to wait for the released version.

DenysAshikhin commented 1 year ago

@kouroshHakha Again, thank you for the prompt response. Unfortunately, I have manually applied a different PR fixing a memory leak in PolicyServerInput, which I'd wager will not be merged anytime soon (source: https://github.com/ray-project/ray/pull/31400).

As such, I will need to apply this change manually as well, since I don't want to redo the other one for now. I'll report back soon to confirm whether the linked PR fixes it for me.

kouroshHakha commented 1 year ago

@DenysAshikhin I have pinged the owner of the other PR about the possibility of merging it. Thanks for your valuable input on our library.