Closed DenysAshikhin closed 1 year ago
Can you share your script?
@xwjiang2010 I think I'm facing the same issue: after PBT/PB2 perturbs one of my trials and restores it from a checkpoint (the restore itself succeeds), the next opt.step() fails. I'm trying to debug this; if you have any clues, I'm all ears.
OS: Linux 6.1, Python: 3.10, Ray: both 2.3.1 and the latest nightly (1b5b2f8c61)
Code that reproduces the issue every time I run it on my setup:
import ray
from ray.rllib.algorithms.ppo import PPOConfig
from ray import tune
from ray.tune.schedulers.pb2 import PB2
from ray import air
ray.init(address="auto")
config = (
    PPOConfig()
    .framework("torch")
    .environment("BipedalWalker-v3")
    .training(
        lr=1e-5,
        model={"fcnet_hiddens": [128, 128]},
        train_batch_size=1024,
    )
    .rollouts(num_rollout_workers=5, num_envs_per_worker=4)
    .resources(num_gpus=1.0 / 4)
)
perturbation_interval = 20
pb2 = PB2(
    time_attr="training_iteration",
    perturbation_interval=perturbation_interval,
    hyperparam_bounds={"lr": [1e-3, 1e-7], "train_batch_size": [128, 1024 * 8]},
)
param_space = {**config.to_dict(), **{"checkpoint_interval": perturbation_interval}}
tuner = tune.Tuner(
    "PPO",
    param_space=param_space,
    run_config=air.RunConfig(
        stop={"training_iteration": 1e9},
        verbose=1,
    ),
    tune_config=tune.TuneConfig(
        scheduler=pb2, metric="episode_reward_mean", mode="max", num_samples=4
    ),
)
results = tuner.fit()
Relevant part of the logs:
(RolloutWorker pid=189049) 2023-04-08 20:06:19,348 WARNING env.py:155 -- Your env doesn't have a .spec.max_episode_steps attribute. Your horizon will default to infinity, and your environment will not be reset.
(PPO pid=188910) 2023-04-08 20:06:19,348 WARNING env.py:155 -- Your env doesn't have a .spec.max_episode_steps attribute. Your horizon will default to infinity, and your environment will not be reset.
(PPO pid=188910) 2023-04-08 20:06:19,348 WARNING env.py:155 -- Your env doesn't have a .spec.max_episode_steps attribute. Your horizon will default to infinity, and your environment will not be reset.
(RolloutWorker pid=189049) pybullet build time: Apr 4 2023 02:40:04 [repeated 2x across cluster]
== Status ==
Current time: 2023-04-08 20:06:20 (running for 00:03:11.58)
PopulationBasedTraining: 2 checkpoints, 1 perturbs
Logical resource usage: 24.0/24 CPUs, 1.0/1 GPUs (0.0/1.0 accelerator_type:G)
Current best trial: 9e53a_00003 with episode_reward_mean=-809.2682569878904 and parameters={'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0.25, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, '_fake_gpus': False, 'num_learner_workers': 0, 'num_gpus_per_learner_worker': 0, 'num_cpus_per_learner_worker': 1, 'local_gpu_idx': 0, 'custom_resources_per_worker': {}, 'placement_strategy': 'PACK', 'eager_tracing': False, 'eager_max_retraces': 20, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'env': 'quadcopter-env-v0', 'env_config': {}, 'observation_space': None, 'action_space': None, 'env_task_fn': None, 'render_env': False, 'clip_rewards': None, 'normalize_actions': True, 'clip_actions': False, 'disable_env_checking': False, 'is_atari': False, 'auto_wrap_old_gym_envs': True, 'num_envs_per_worker': 4, 'sample_collector': <class 'ray.rllib.evaluation.collectors.simple_list_collector.SimpleListCollector'>, 'sample_async': False, 'enable_connectors': True, 'rollout_fragment_length': 'auto', 'batch_mode': 'truncate_episodes', 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'validate_workers_after_construction': True, 'preprocessor_pref': 'deepmind', 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'compress_observations': False, 'enable_tf1_exec_eagerly': False, 'sampler_perf_stats_ema_coef': None, 'gamma': 0.99, 'lr': 1e-05, 'train_batch_size': 1024, 'model': {'_disable_preprocessor_api': False, '_disable_action_flattening': False, 'fcnet_hiddens': [128, 128], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': 
False, 'vf_share_layers': False, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'framestack': True, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'encoder_latent_dim': None, 'lstm_use_prev_action_reward': -1, '_use_default_native_models': -1}, 'optimizer': {}, 'max_requests_in_flight_per_sampler_worker': 2, 'learner_class': None, '_enable_learner_api': False, '_learner_hps': PPOLearnerHPs(kl_coeff=0.2, kl_target=0.01, use_critic=True, clip_param=0.3, vf_clip_param=10.0, entropy_coeff=0.0, vf_loss_coeff=1.0, lr_schedule=None, entropy_coeff_schedule=None), 'explore': True, 'exploration_config': {'type': 'StochasticSampling'}, 'policy_states_are_swappable': False, 'input_config': {}, 'actions_in_input_normalized': False, 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_config': {}, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 'offline_sampling': False, 'evaluation_interval': None, 'evaluation_duration': 10, 'evaluation_duration_unit': 'episodes', 'evaluation_sample_timeout_s': 180.0, 'evaluation_parallel_to_training': False, 'evaluation_config': None, 'off_policy_estimation_methods': {}, 'ope_split_batch_by_episode': True, 'evaluation_num_workers': 0, 'always_attach_evaluation_results': False, 'enable_async_evaluation': False, 'in_evaluation': False, 'sync_filters_on_rollout_workers_timeout_s': 60.0, 'keep_per_episode_custom_metrics': False, 
'metrics_episode_collection_timeout_s': 60.0, 'metrics_num_episodes_for_smoothing': 100, 'min_time_s_per_iteration': None, 'min_train_timesteps_per_iteration': 0, 'min_sample_timesteps_per_iteration': 0, 'export_native_model_files': False, 'checkpoint_trainable_policies_only': False, 'logger_creator': None, 'logger_config': None, 'log_level': 'WARN', 'log_sys_usage': True, 'fake_sampler': False, 'seed': None, 'worker_cls': None, 'ignore_worker_failures': False, 'recreate_failed_workers': False, 'max_num_worker_restarts': 1000, 'delay_between_worker_restarts_s': 60.0, 'restart_failed_sub_environments': False, 'num_consecutive_worker_failures_tolerance': 100, 'worker_health_probe_timeout_s': 60, 'worker_restore_timeout_s': 1800, 'rl_module_spec': None, '_enable_rl_module_api': False, '_validate_exploration_conf_and_rl_modules': True, '_tf_policy_handles_more_than_one_loss': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, '_disable_execution_plan_api': True, 'simple_optimizer': False, 'replay_sequence_length': None, 'horizon': -1, 'soft_horizon': -1, 'no_done_at_end': -1, 'lr_schedule': None, 'use_critic': True, 'use_gae': True, 'kl_coeff': 0.2, 'sgd_minibatch_size': 128, 'num_sgd_iter': 30, 'shuffle_sequences': True, 'vf_loss_coeff': 1.0, 'entropy_coeff': 0.0, 'entropy_coeff_schedule': None, 'clip_param': 0.3, 'vf_clip_param': 10.0, 'grad_clip': None, 'kl_target': 0.01, 'vf_share_layers': -1, 'checkpoint_interval': 20, '__stdout_file__': None, '__stderr_file__': None, 'lambda': 1.0, 'input': 'sampler', 'multiagent': {'policies': {'default_policy': (None, None, None, None)}, 'policy_mapping_fn': <function AlgorithmConfig.DEFAULT_POLICY_MAPPING_FN at 0x7f4db05ea830>, 'policies_to_train': None, 'policy_map_capacity': 100, 'policy_map_cache': -1, 'count_steps_by': 'env_steps', 'observation_fn': None}, 'callbacks': <class 'ray.rllib.algorithms.callbacks.DefaultCallbacks'>, 'create_env_on_driver': False, 'custom_eval_function': None, 
'framework': 'torch', 'num_cpus_for_driver': 1, 'num_workers': 5}
Result logdir: /home/pp/ray_results/PPO
Number of trials: 4/4 (4 RUNNING)
(PPO pid=188910) 2023-04-08 20:06:20,350 WARNING checkpoints.py:109 -- No `rllib_checkpoint.json` file found in checkpoint directory /tmp/checkpoint_tmp_a68e78c04f114e399acd38a05eff6631/.! Trying to extract checkpoint info from other files found in that dir.
(PPO pid=188910) 2023-04-08 20:06:20,391 INFO trainable.py:915 -- Restored on 192.168.178.20 from checkpoint: /tmp/checkpoint_tmp_a68e78c04f114e399acd38a05eff6631
(PPO pid=188910) 2023-04-08 20:06:20,392 INFO trainable.py:924 -- Current state after restoring: {'_iteration': 80, '_timesteps_total': None, '_time_total': 146.15634059906006, '_episodes_total': 155}
2023-04-08 20:06:20,958 ERROR trial_runner.py:1485 -- Trial PPO_quadcopter-env-v0_9e53a_00003: Error happened when processing _ExecutorEventType.TRAINING_RESULT.
ray.exceptions.RayTaskError(RuntimeError): ray::PPO.train() (pid=188912, ip=192.168.178.20, repr=PPO)
File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 386, in train
raise skipped from exception_cause(skipped)
File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 383, in train
result = self.step()
File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 792, in step
results, train_iter_ctx = self._run_one_training_iteration()
File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 2811, in _run_one_training_iteration
results = self.training_step()
File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/algorithms/ppo/ppo.py", line 432, in training_step
train_results = multi_gpu_train_one_step(self, train_batch)
File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/execution/train_ops.py", line 163, in multi_gpu_train_one_step
results = policy.learn_on_loaded_batch(
File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 825, in learn_on_loaded_batch
self.apply_gradients(_directStepOptimizerSingleton)
File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/ray/rllib/policy/torch_policy_v2.py", line 885, in apply_gradients
opt.step()
File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 280, in wrapper
out = func(*args, **kwargs)
File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 33, in _use_grad
ret = func(self, *args, **kwargs)
File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/adam.py", line 141, in step
adam(
File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/adam.py", line 281, in adam
func(params,
File "/run/media/pp/5dd643a3-0870-41f2-96e4-7713f19f00de/code/python/quadcopter/venv/lib/python3.10/site-packages/torch/optim/adam.py", line 449, in _multi_tensor_adam
torch._foreach_addcmul_(device_exp_avg_sqs, device_grads, device_grads, 1 - beta2)
RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.
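The last frame of the traceback passes 1 - beta2 as the scalar argument, and the error says a scalar was found on cuda:0. One possible reading (an assumption, not something confirmed in this thread) is that some optimizer hyperparameter or state entry became a CUDA tensor during restore. A minimal diagnostic sketch for checking that is below; find_device_values is a hypothetical helper name, and it is duck-typed on a .device attribute so it does not import torch itself.

```python
from typing import Any, Iterator, Tuple


def find_device_values(obj: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, device) for every leaf that exposes a `.device` attribute.

    Duck-typed on purpose: torch tensors have `.device`, plain Python
    floats and ints do not, so only tensor-like leaves are reported.
    """
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_device_values(value, f"{path}[{key!r}]")
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            yield from find_device_values(value, f"{path}[{index}]")
    elif hasattr(obj, "device"):
        yield path, str(obj.device)
```

Calling list(find_device_values(opt.state_dict()["param_groups"])) on the restored optimizer would show whether values such as lr or betas, which are normally plain Python numbers, have turned into tensors, and on which device.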
@xwjiang2010 @perduta
I think I have a lead on this. I'll do some more testing, but I think it has to do with how num_gpus and the rollout workers are set. When I initially ran tuner.run, I hadn't set the GPU resources correctly, so it trained on the CPU instead. Later I fixed that and went back to load an older checkpoint (one saved with the incorrect GPU setting), and it tried to load it onto the GPU instead of the CPU, which caused this issue. I'll see if I can reproduce it more consistently.
Is everyone here running into this issue with RLlib Algorithms? It may have something to do with how Algorithm saves and loads checkpoints (they should be consistent: either both on GPU or both on CPU).
cc @kouroshHakha
related: https://discuss.ray.io/t/runtimeerror-expected-scalars-to-be-on-cpu-got-cuda-0-instead/9998
I've reproduced this with both PPO and SAC, didn't check the rest.
I also hit the same issue when restoring a PPO agent and continuing training with Ray 2.3.1. Is there a temporary workaround? I used run_experiments() to train.
Hey @DenysAshikhin, I don't have much visibility into this issue, but a minimal repro script that we can use for debugging would be a great starting point.
For example, a script that trains a PPO agent on CUDA for one iteration and then tries to restore it (for inference or to continue training), but fails with the error message you showed.
Thanks.
also tagging @perduta @WeihaoTan to provide a repro script. Thanks!
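The kind of script being asked for could look roughly like the sketch below: train one iteration on the GPU, save, restore into a fresh PPO instance, and train again. The function name restore_and_train_once is hypothetical, CartPole-v1 is borrowed from the config posted later in this thread, and the sketch assumes the Ray 2.3 behaviour where Algorithm.save() returns the checkpoint path as a string (newer releases may differ). It has not been confirmed here that this exact script triggers the error.

```python
def restore_and_train_once(env: str = "CartPole-v1", num_gpus: float = 1):
    """Train one PPO iteration, checkpoint, restore into a new PPO, train again."""
    # Imported lazily so this file can be loaded on machines without Ray.
    import ray
    from ray.rllib.algorithms.ppo import PPOConfig

    ray.init(ignore_reinit_error=True)

    config = (
        PPOConfig()
        .framework("torch")
        .environment(env)
        .rollouts(num_rollout_workers=0)  # keep everything in the local worker
        .resources(num_gpus=num_gpus)     # the learner runs on the GPU
    )

    algo = config.build()
    algo.train()
    checkpoint = algo.save()  # assumption: a path string, as in Ray 2.3

    restored = config.build()
    restored.restore(checkpoint)
    # The step reported to fail in this thread: the first update after restore.
    return restored.train()
```

Running restore_and_train_once() on a machine with at least one GPU either returns a result dict or raises at the final train() call; either outcome narrows down whether Tune and PB2 are involved at all.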
Hi @kouroshHakha @xwjiang2010, here is the repro script. On a machine with 1 GPU it works perfectly; on a machine with multiple GPUs the bug appears.
train.py
import argparse
import yaml
import ray
from ray.tune.experiment.config_parser import _make_parser
from ray.tune.progress_reporter import CLIReporter
from ray.tune.tune import run_experiments
from ray.tune.registry import register_trainable, register_env
from ray.tune.schedulers import create_scheduler
from ray.rllib.models import ModelCatalog
from ray.rllib.utils.framework import try_import_torch
from algorithms.registry import ALGORITHMS, get_algorithm_class
from envs.registry import ENVIRONMENTS, get_env_class, POLICY_MAPPINGS, CALLBACKS
from models.registry import MODELS, get_model_class, ACTION_DISTS, get_action_dist_class
EXAMPLE_USAGE = """
python train.py -f config.yaml
"""
# Try to import both backends for flag checking/warnings.
torch, _ = try_import_torch()

def create_parser(parser_creator=None):
    parser = _make_parser(
        parser_creator=parser_creator,
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description="Train a reinforcement learning agent.",
        epilog=EXAMPLE_USAGE,
    )
    # See also the base parser definition in ray/tune/experiment/__config_parser.py
    parser.add_argument(
        "--ray-address",
        default=None,
        type=str,
        help="Connect to an existing Ray cluster at this address instead "
        "of starting a new one.",
    )
    parser.add_argument(
        "--ray-ui", action="store_true", help="Whether to enable the Ray web UI."
    )
    parser.add_argument(
        "--local-mode",
        action="store_true",
        help="Run ray in local mode for easier debugging.",
    )
    parser.add_argument(
        "--ray-num-cpus",
        default=None,
        type=int,
        help="--num-cpus to use if starting a new cluster.",
    )
    parser.add_argument(
        "--ray-num-gpus",
        default=None,
        type=int,
        help="--num-gpus to use if starting a new cluster.",
    )
    parser.add_argument(
        "--ray-num-nodes",
        default=None,
        type=int,
        help="Emulate multiple cluster nodes for debugging.",
    )
    parser.add_argument(
        "--ray-object-store-memory",
        default=None,
        type=int,
        help="--object-store-memory to use if starting a new cluster.",
    )
    parser.add_argument(
        "--resume",
        action="store_true",
        help="Whether to attempt to resume previous Tune experiments.",
    )
    parser.add_argument(
        "-f",
        "--config-file",
        default="config.yaml",
        type=str,
        help="If specified, use config options from this file. Note that this "
        "overrides any trial-specific options set via flags above.",
    )
    return parser

def run(args, parser):
    assert args.config_file is not None, "Must specify a config file"
    with open(args.config_file) as f:
        experiments = yaml.safe_load(f)
    verbose = 1
    for exp in experiments.values():
        metric_columns = exp.pop("metric_columns", None)
        if not exp.get("run"):
            parser.error("the following arguments are required: --run")
        if not exp.get("env") and not exp.get("config", {}).get("env"):
            parser.error("the following arguments are required: --env")
        if exp["config"].get("multiagent"):
            policy_mapping_name = exp["config"]["multiagent"].get("policy_mapping_fn")
            if isinstance(policy_mapping_name, str):
                exp["config"]["multiagent"]["policy_mapping_fn"] = POLICY_MAPPINGS[policy_mapping_name]
        if exp["config"].get("callbacks"):
            calback_name = exp["config"].get("callbacks")
            if isinstance(calback_name, str):
                exp["config"]["callbacks"] = CALLBACKS[calback_name]
    if args.ray_num_nodes:
        from ray.cluster_utils import Cluster
        cluster = Cluster()
        for _ in range(args.ray_num_nodes):
            cluster.add_node(
                num_cpus=args.ray_num_cpus or 1,
                num_gpus=args.ray_num_gpus or 0,
                object_store_memory=args.ray_object_store_memory,
            )
        ray.init(address=cluster.address)
    else:
        ray.init(
            include_dashboard=args.ray_ui,
            address=args.ray_address,
            object_store_memory=args.ray_object_store_memory,
            num_cpus=args.ray_num_cpus,
            num_gpus=args.ray_num_gpus,
            local_mode=args.local_mode,
        )
    progress_reporter = CLIReporter(
        print_intermediate_tables=verbose >= 1,
        metric_columns=metric_columns,
    )
    trials = run_experiments(
        experiments,
        scheduler=create_scheduler(args.scheduler, **args.scheduler_config),
        resume=args.resume,
        verbose=verbose,
        progress_reporter=progress_reporter,
        concurrent=True,
    )
    ray.shutdown()
    checkpoints = []
    for trial in trials:
        if trial.checkpoint.dir_or_data:
            checkpoints.append(trial.checkpoint.dir_or_data)
    if checkpoints:
        from rich import print
        from rich.panel import Panel
        print("\nYour training finished.")
        print("Best available checkpoint for each trial:")
        for cp in checkpoints:
            print(f" {cp}")
        print(
            "\nYou can now evaluate your trained algorithm from any "
            "checkpoint, e.g. by running:"
        )
        print(Panel(f"[green] rllib evaluate {checkpoints[0]} "))


def main():
    parser = create_parser()
    args = parser.parse_args()
    run(args, parser)


if __name__ == "__main__":
    main()
config.yaml
test:
  run: PPO
  checkpoint_config:
    checkpoint_frequency: 50
    checkpoint_at_end: true
    num_to_keep: 5
  local_dir: ray_results
  stop:
    timesteps_total: 1000
  #restore: checkpoint
  config:
    framework: torch
    env: CartPole-v1
    num_workers: 4
    num_cpus_for_driver: 1
    num_envs_per_worker: 1
    num_cpus_per_worker: 2
    num_gpus: 1
    disable_env_checking: true
After training finishes, uncomment "restore" and set it to the path of the trained checkpoint. Then increase timesteps_total and re-run train.py; the bug will appear, with an error like:
torch._foreach_addcmul_(device_exp_avg_sqs, device_grads, device_grads, 1 - beta2)
RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.
@xwjiang2010 Is this enough? If not, what else should I attach?
Should we add the "bug" label back? I don't know of any workaround for this, and I'm currently unable to train any RL model using Ray.
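As a possible stopgap, and assuming (this is not confirmed in the thread) that the failure comes from optimizer hyperparameters such as betas having become CUDA tensors during restore, one could convert them back to plain Python numbers before the next opt.step(). The sketch below is not the fix that went into Ray; to_python_scalar and sanitize_param_groups are hypothetical helper names, and how to reach RLlib's optimizer objects after a restore is deliberately left out.

```python
from typing import Any


def to_python_scalar(value: Any) -> Any:
    """Turn tensor-like scalars (objects with `.item()` and `.device`) into numbers."""
    if isinstance(value, (list, tuple)):
        # Preserve the container type, e.g. the (beta1, beta2) tuple of Adam.
        return type(value)(to_python_scalar(v) for v in value)
    if hasattr(value, "item") and hasattr(value, "device"):
        return value.item()
    return value


def sanitize_param_groups(optimizer: Any) -> None:
    """In place: make every hyperparameter in `param_groups` a plain number.

    The "params" entry is skipped because it holds the parameters themselves.
    """
    for group in optimizer.param_groups:
        for key in list(group):
            if key != "params":
                group[key] = to_python_scalar(group[key])
```

For a torch optimizer this would be called as sanitize_param_groups(opt) right after the checkpoint is loaded. It only touches param_groups, so the per-parameter state (exp_avg, exp_avg_sq and so on) stays wherever the restore put it.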
On my side (this doesn't even require multiple GPUs):
import ray
from ray.rllib.env import PolicyServerInput
from ray.rllib.algorithms.ppo import PPOConfig
import numpy as np
import argparse
from gymnasium.spaces import MultiDiscrete, Box
ray.init(num_cpus=9, num_gpus=1, log_to_driver=False, configure_logging=False)
ppo_config = PPOConfig()
parser = argparse.ArgumentParser(description='Optional app description')
parser.add_argument('-ip', type=str, help='IP of this device')
parser.add_argument('-checkpoint', type=str, help='location of checkpoint to restore from')
args = parser.parse_args()
def _input(ioctx):
    # We are remote worker, or we are local worker with num_workers=0:
    # Create a PolicyServerInput.
    if ioctx.worker_index > 0 or ioctx.worker.num_workers == 0:
        return PolicyServerInput(
            ioctx,
            args.ip,
            55556 + ioctx.worker_index - (1 if ioctx.worker_index > 0 else 0),
        )
    # No InputReader (PolicyServerInput) needed.
    else:
        return None
x = 320
y = 240
# kl_coeff, ->default 0.2
# ppo_config.gamma = 0.01 # vf_loss_coeff used to be 0.01??
# "entropy_coeff": 0.00005,
# "clip_param": 0.1,
ppo_config.gamma = 0.998 # default 0.99
ppo_config.lambda_ = 0.99 # default 1.0???
ppo_config.kl_target = 0.01 # default 0.01
ppo_config.rollout_fragment_length = 128
# ppo_config.train_batch_size = 8500
# ppo_config.train_batch_size = 10000
ppo_config.train_batch_size = 12000
ppo_config.sgd_minibatch_size = 512
# ppo_config.num_sgd_iter = 2 # default 30???
ppo_config.num_sgd_iter = 7 # default 30???
# ppo_config.lr = 3.5e-5 # 5e-5
ppo_config.lr = 9e-5 # 5e-5
ppo_config.model = {
# Share layers for value function. If you set this to True, it's
# important to tune vf_loss_coeff.
"vf_share_layers": True,
"use_lstm": True,
"max_seq_len": 32,
"lstm_cell_size": 128,
"lstm_use_prev_action": True,
"conv_filters": [
# 240 X 320
[16, [5, 5], 3],
[32, [5, 5], 3],
[64, [5, 5], 3],
[128, [3, 3], 2],
[256, [3, 3], 2],
[512, [3, 3], 2],
],
"conv_activation": "relu",
"post_fcnet_hiddens": [512],
"post_fcnet_activation": "relu"
}
ppo_config.batch_mode = "complete_episodes"
ppo_config.simple_optimizer = True
# ppo_config["remote_worker_envs"] = True
ppo_config.env = None
ppo_config.observation_space = Box(low=0, high=1, shape=(y, x, 1), dtype=np.float32)
ppo_config.action_space = MultiDiscrete(
[
2, # W
2, # A
2, # S
2, # D
2, # Space
2, # H
2, # J
2, # K
2 # L
]
)
ppo_config.env_config = {
"sleep": True,
'replayOn': False
}
ppo_config.rollouts(num_rollout_workers=2, enable_connectors=False)
ppo_config.offline_data(input_=_input)
ppo_config.framework_str = 'torch'
ppo_config.log_sys_usage = False
ppo_config.compress_observations = True
ppo_config.shuffle_sequences = False
ppo_config.num_gpus = 0.35
ppo_config.num_gpus_per_worker = 0.1
ppo_config.num_cpus_per_worker = 2
ppo_config.num_cpus_per_learner_worker = 2
ppo_config.num_gpus_per_learner_worker = 0.35
tempyy = ppo_config.to_dict()
print(tempyy)
from ray import tune
name = "" + args.checkpoint
print(f"Starting: {name}")
tune.run("PPO",
resume='AUTO',
# param_space=config,
config=tempyy,
name=name, keep_checkpoints_num=None, checkpoint_score_attr="episode_reward_mean",
max_failures=1,
checkpoint_freq=5, checkpoint_at_end=True)
You can of course substitute the env with a random env. In my case, I let it make some checkpoints and then press Ctrl+C in the cmd window. After it gracefully saves, running it again gives the same error as outlined above.
My issue is occurring on a single GPU machine as well.
Got the same problem when trying to resume an experiment on Colab with exactly the same configuration as before.
tuner = Tuner.restore("saved_models", trainer, resume_unfinished = True, resume_errored=True)
tuner.fit()
The error appears at the end of the iteration.
Single GPU setup.
Python 3.9.16, torch 2.0.0+cu118, ray 2.3.1
@cheadrian what is the trainer in your case?
trainer = RLTrainer(
run_config=run_config,
scaling_config=ScalingConfig(
num_workers=2, use_gpu=True,
trainer_resources={"CPU": 0.0},
resources_per_worker={"CPU": 1.0}),
algorithm="PPO",
config=config_cf,
resume_from_checkpoint=rl_checkpoint,
)
I get the same error with or without specifying it.
Hello.
I'm using SAC, and when I try to load a model trained with Tune via Algorithm.from_checkpoint(), I get the same error too.
@kouroshHakha Please ignore that point; no matter what settings I use, the issue happens the same as for the others. I had confused some parameters 😅
Same error and no solution yet?
python3.9/site-packages/torch/optim/adam.py", line 449, in _multi_tensor_adam
    torch._foreach_addcmul_(device_exp_avg_sqs, device_grads, device_grads, 1 - beta2)
RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.
@kouroshHakha Could you triage for RL team?
@gunewar I'm sure the team is aware of the issue at this point - let's just give them some time to find a way to reproduce this on their end at which point it should be a simple fix (hopefully).
I get the feeling that it's some new configuration that got missed in automated builds
After removing num_gpus from the config, training works for now! :)
So you trained initially with num_gpus set (thus training on the GPU), and for subsequent runs you turned num_gpus off? Is it training on your CPU or GPU then?
I was able to successfully restore the model that was trained with num_gpus=1 after removing num_gpus from the config (so num_gpus=0). After doing this, there were no errors and it was possible to continue training the model.
Interesting, didn't work for me. Can you confirm if it's training on your GPU still or got switched to your CPU after restoring?
I only checked that training ran, and now I'm working on another project.
I kept all resource-related configs at their base values and didn't use the Ray Tuner.
Tune Status Resources requested: 2.0/2 CPUs, 1.0/1 GPUs, 0.0/10.85 GiB heap, 0.0/0.19 GiB objects
Changing use_gpu=True to use_gpu=False lets the training continue, but only on the CPU.
Tune Status Resources requested: 2.0/2 CPUs, 0/1 GPUs, 0.0/10.85 GiB heap, 0.0/0.19 GiB objects
I'm happy to know there's a workaround of sorts for some. However, it still didn't work for me, and it doesn't change the fact that I need to train on my GPU (as I'm sure others do as well).
Apparently the torch optimizer param_group values should not be moved to CUDA devices when restoring optimizer states. There will be a PR addressing this issue in one of the next releases.
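To illustrate the idea, here is a minimal sketch, not the actual RLlib change. It assumes the saved optimizer state has the two keys a torch optimizer state dict has ("state" and "param_groups"); `rebuild_optimizer_state` and its `convert` argument are hypothetical names chosen for this example. Only the per-parameter buffers go through `convert`, while the hyperparameters are copied unchanged.

```python
import copy
from typing import Any, Callable, Dict


def rebuild_optimizer_state(
    saved: Dict[str, Any],
    convert: Callable[[Any], Any],
) -> Dict[str, Any]:
    """Return a state dict whose per-parameter buffers went through `convert`.

    "state" holds per-parameter buffers (for Adam: exp_avg, exp_avg_sq, ...).
    "param_groups" holds hyperparameters (lr, betas, eps, ...).
    """
    return {
        # Only the buffers are converted, e.g. turned into GPU tensors.
        "state": {
            param_id: {name: convert(value) for name, value in buffers.items()}
            for param_id, buffers in saved["state"].items()
        },
        # Hyperparameters stay plain Python scalars, so expressions such as
        # `1 - beta2` inside the optimizer step remain CPU scalars.
        "param_groups": copy.deepcopy(saved["param_groups"]),
    }
```

With PyTorch, `convert` could be something like `lambda v: torch.as_tensor(v, device="cuda:0")`, and the result would then be passed to `optimizer.load_state_dict(...)`. Whether every buffer should be moved this way may depend on the torch version, so treat this as a starting point for a local patch rather than a verified fix.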
@kouroshHakha Thanks for the update. Is there a link to view the PR so I can incorporate it on my end until it's officially released?
Hey @DenysAshikhin, so here is the core change that needs to happen in RLlib's torch policy. If it's urgent, you can make these changes on your local installation of Ray. If it can wait a few days, you can either install the nightly or use master once this PR is merged. If you need reliability, you have to wait for the released version.
@kouroshHakha Again, thank you for the prompt response. Unfortunately, I have manually applied a different PR fixing a memory leak in PolicyServerInput, which I'd wager will not be included anytime soon (source: https://github.com/ray-project/ray/pull/31400).
As such, I will need to add this manually, as I don't want to redo the other one for now. I'll report back soon to confirm whether the linked PR fixes it for me.
@DenysAshikhin I have pinged the owner of the other PR about the possibility of merging it. Thanks for your valuable input on our library.
What happened + What you expected to happen
How severe does this issue affect your experience of using Ray?
High: It blocks me to complete my task.
Hi all,
I am trying to load in a previously trained model to continue training it, except I get the following error:
Relevant code:
Versions / Dependencies
OS: Win11 Python: 3.10 Ray: latest nightly windows wheel
Reproduction script
n/a
Issue Severity
High: It blocks me from completing my task.