Open george-skal opened 1 year ago
Thanks for reporting this!
I can reproduce.
Hi @ArturNiederfahrenhorst,
I tried the method of copying weights from the new self-play examples to see what happens, and it only works on CPU. When I use the GPU I get this error:
File "/home/george/PycharmProjects/ray_240_venv/venv/lib/python3.10/site-packages/torch/optim/adam.py", line 449, in _multi_tensor_adam torch._foreachaddcmul(device_exp_avg_sqs, device_grads, device_grads, 1 - beta2) RuntimeError: Expected scalars to be on CPU, got cuda:0 instead.
This happens on the 11th iteration, i.e. the first time the callback actually sets a policy state from the menagerie (with M = 10 and no NaN iterations, the first 10 iterations only append states). The same error is also mentioned in https://github.com/ray-project/ray/issues/34159 for some other cases.
Please have a look and let me know if there is any workaround. The only option I see is to go back to Ray 1.11 or 1.12, but one of my environments only works with Gymnasium, which is supported on Ray > 2.2, so older versions are not possible. I also tried running without Tune via .build(), but that gives the same error on the GPU.
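For completeness, the no-Tune variant I mentioned looks roughly like this (just a sketch on top of the same config object as in the script below, not my exact code):

algo = config.build()
for _ in range(1500):
    result = algo.train()
    # -> fails with the same "Expected scalars to be on CPU, got cuda:0" error
    #    on the GPU run, once the callback starts calling set_state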
Please find the full code below.
from ray import air, tune
import ray
from ray.tune.registry import register_env
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from supersuit import pad_observations_v0
from pettingzoo.mpe import simple_tag_v2
from ray.rllib.algorithms.callbacks import DefaultCallbacks
import argparse
import numpy as np
import copy

M = 10  # Menagerie size


class MyCallbacks(DefaultCallbacks):
    def __init__(self):
        super(MyCallbacks, self).__init__()
        self.nan_counter = 0
        self.men = []
        self.men2 = []
        self.men_rewards = []

    def on_train_result(self, *, algorithm, result: dict, **kwargs):
        print(
            "Algorithm.train() result: {} -> {} episodes".format(
                algorithm, result["episodes_this_iter"]
            )
        )
        k = result['training_iteration']  # starts from 1
        # the "shared_policy_1" is the only agent being trained
        if np.isnan(result['episode_reward_mean']):
            # global men_start, nan_true
            # men_start = i
            self.nan_counter += 1  # flag for nan in the beginning
            pass
        else:
            if k <= M + self.nan_counter:
                # menagerie initialisation
                self.men.append(algorithm.get_policy("shared_policy_1").get_state())
                self.men2.append(algorithm.get_policy("shared_policy_2").get_state())
            else:
                self.men.pop(0)
                self.men2.pop(0)
                self.men.append(algorithm.get_policy("shared_policy_1").get_state())
                self.men2.append(algorithm.get_policy("shared_policy_2").get_state())
                sel = list(range(0, M))  # list index in python starts at 0
                # print("sel =", sel)
                choice = np.random.choice(sel)
                # print("choice is ", choice)
                algorithm.get_policy("shared_policy_1").set_state(self.men[choice])
                choice = np.random.choice(sel)
                # print("choice is ", choice)
                algorithm.get_policy("shared_policy_2").set_state(self.men2[choice])
                algorithm.workers.sync_weights()
        result["callback_ok"] = True


if __name__ == "__main__":
    for i in range(1, 2):

        def env_creator(args):
            env = simple_tag_v2.env(num_good=3, num_adversaries=6, num_obstacles=3, max_cycles=25)
            env = pad_observations_v0(env)
            return env

        register_env("simple_tag", lambda args: PettingZooEnv(env_creator(args)))
        test_env = PettingZooEnv(env_creator({}))
        obs_space = test_env.observation_space
        act_spc = test_env.action_space

        policies = {
            "shared_policy_1": (None, obs_space, act_spc, {}),
            "shared_policy_2": (None, obs_space, act_spc, {}),
            # "pursuer_5": (None, obs_space, act_spc, {})
        }
        policy_ids = list(policies.keys())

        def policy_mapping_fn(agent_id, episode, worker, **kwargs):
            if agent_id in ["agent_0", "agent_1", "agent_2"]:
                # print("agent_id", agent_id)
                return "shared_policy_1"
            else:
                # print("agent_id", agent_id)
                return "shared_policy_2"

        config = (
            PPOConfig()
            .environment("simple_tag")
            .resources(num_gpus=1)
            .rollouts(num_rollout_workers=4)  # default = 2 (I should try it)
            .callbacks(MyCallbacks)
            .framework("torch")
            .multi_agent(
                policies=policies,
                policy_mapping_fn=policy_mapping_fn,
            )
        )

        tune.Tuner(
            "PPO",
            run_config=air.RunConfig(
                name="simple_tag 363 plain self play test trial {0}g".format(i),
                stop={"training_iteration": 1500},
                checkpoint_config=air.CheckpointConfig(
                    checkpoint_frequency=10,
                ),
            ),
            param_space=config.to_dict(),
        ).fit()
Can also still reproduce this on ray 2.9.3.
File "/home/davidhozic/.local/lib/python3.10/site-packages/ray/rllib/evaluation/postprocessing.py", line 204, in compute_gae_for_sample_batch batch = compute_advantages( File "/home/davidhozic/.local/lib/python3.10/site-packages/ray/rllib/evaluation/postprocessing.py", line 128, in compute_advantages delta_t = rewards + gamma * vpred_t[1:] - vpred_t[:-1] ValueError: operands could not be broadcast together with shapes (101,) (100,)
Running into the same issue too. Ray 2.4.0, A3C and APPO algorithms, no self-play. Interestingly, it only seems to happen if I'm resuming training from a checkpoint, at the end of the first post-restore episode. Does not happen if I run the whole training loop without any restoring from checkpoint, nor do I experience it with the DQN algorithm.
Are there any plans to resolve this issue? For me, this happens when restoring an algorithm from multiple checkpoints: I iterate over a checkpoint directory and call algo.restore() for each checkpoint. It seems I hit this failure after the second call to algo.restore().
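Roughly, the loop looks like the sketch below (the algorithm, environment, and checkpoint paths are placeholders, not my exact setup):

import os
from ray.rllib.algorithms.ppo import PPOConfig

# Placeholder algorithm and environment, just to illustrate the loop shape.
algo = PPOConfig().environment("CartPole-v1").framework("torch").build()

checkpoint_root = "/tmp/my_checkpoints"  # placeholder checkpoint directory
for name in sorted(os.listdir(checkpoint_root)):
    algo.restore(os.path.join(checkpoint_root, name))
    result = algo.train()  # the failure shows up after the second restore
    print(name, result["episode_reward_mean"])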
...
What happened + What you expected to happen
Hi, I am using a self-play scheme on simple_tag_v2 from PettingZoo. It works on a previous installation of ray_300_dev0 and on an old Ray 1.2.0 (with the Tune-related code adapted), but it errors on Ray 2.3.1 and 2.4, and also if I install a fresh ray_300_dev0. It seems to be a problem with newer versions of some packages, since the old ray_300_dev0 installation still works, but I can't find which ones. It does not seem to be related to PettingZoo, since I am using the same versions there. The error is:
For the weight sharing I use the deepcopy-based method proposed here: https://discuss.ray.io/t/policy-weights-overwritten-in-self-play/2520, since there was a [bug](https://github.com/ray-project/ray/issues/16718) that I am not sure has been fixed. Could that be the problem?
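For reference, the deepcopy-style copy I mean looks roughly like this (a sketch using the policy IDs from my script, not the exact code from the forum thread):

import copy

def copy_policy_weights(algorithm, src_id="shared_policy_1", dst_id="shared_policy_2"):
    # Deep-copy the source weights so the target policy does not alias the
    # source policy's tensors, then push the update to the rollout workers.
    weights = copy.deepcopy(algorithm.get_policy(src_id).get_weights())
    algorithm.get_policy(dst_id).set_weights(weights)
    algorithm.workers.sync_weights(policies=[dst_id])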
Please also find attached the full error file: error.txt
Thanks, George
Versions / Dependencies
ray 2.4.0 (but also 2.3.0, 2.3.1)
torch 2.0.0
pettingzoo 1.22.3
supersuit 3.7.1
python 3.10
Reproduction script
Issue Severity
High: It blocks me from completing my task.