Hi,
Thanks for your interest. Although the multi-agent MetaDrive environments are mainly designed for self-interested MARL, they also support traditional MARL training.
For MADDPG, we have already tested it using the RLlib implementation. Here is a code demo for you:
from metadrive.envs.marl_envs.marl_inout_roundabout import MultiAgentRoundaboutEnv
from ray import tune
from drivingforce.mapgdrive.callbacks import MultiAgentDrivingCallbacks
from ray.rllib.contrib.maddpg.maddpg import MADDPGTrainer
from ray.rllib.env import MultiAgentEnv
from drivingforce.train import train, get_train_parser
from ray.rllib.policy.policy import PolicySpec


class MultiAgentMetaDrive(MultiAgentRoundaboutEnv, MultiAgentEnv):
    def step(self, *args, **kwargs):
        s, r, d, i = MultiAgentRoundaboutEnv.step(self, *args, **kwargs)
        d["__all__"] = all([done for done in d.values()])
        return s, r, d, i


def get_obs_action_shape(env_config):
    assert env_config is not None
    single_env = MultiAgentRoundaboutEnv(env_config)
    obs_space = single_env.observation_space
    act_space = single_env.action_space
    single_env.close()
    return obs_space, act_space


if __name__ == "__main__":
    args = get_train_parser().parse_args()
    exp_name = args.exp_name or "TEST"
    stop = int(400_0000)
    num_agents = 40

    env_config = dict(
        allow_respawn=False,
        num_agents=num_agents,
        crash_done=True,
        neighbours_distance=40
    )

    obs_space, act_space = get_obs_action_shape(env_config)

    config = dict(
        # ===== Environmental Setting =====
        # We can grid-search the environmental parameters!
        env=MultiAgentMetaDrive,
        env_config=env_config,

        # ===== Resource =====
        num_gpus=0.25 if args.num_gpus != 0 else 0,

        multiagent=dict(
            policies={
                "agent{}".format(i): PolicySpec(
                    observation_space=obs_space["agent{}".format(i)],
                    action_space=act_space["agent{}".format(i)],
                    config={"agent_id": i}
                )
                for i in range(num_agents)
            },
            policy_mapping_fn=lambda agent_id: agent_id
        ),
    )

    # Launch training
    train(
        MADDPGTrainer,
        exp_name=exp_name,
        keep_checkpoints_num=3,
        stop=stop,
        config=config,
        num_gpus=args.num_gpus,
        num_seeds=1,
        test_mode=args.test,
        custom_callback=MultiAgentDrivingCallbacks,
    )
Most parts of this demo are for RLlib compatibility; the usage of RLlib is documented in Ray 1.8. For MetaDrive, you only need to set allow_respawn=False to fix the number of agents in the environment, and choose how many agents to spawn by setting num_agents to the desired value.
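If helpful, a minimal sanity check (not part of the original reply) is to instantiate the environment directly with these two options and confirm how many agents are spawned:

from metadrive.envs.marl_envs.marl_inout_roundabout import MultiAgentRoundaboutEnv

# With allow_respawn=False the agent count stays fixed at num_agents.
env = MultiAgentRoundaboutEnv(dict(allow_respawn=False, num_agents=4))
obs = env.reset()
print(len(obs))  # expect 4 observations, one per agent
env.close()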
@QuanyiLi can you help me with the drivingforce package? Where can I install it from? from drivingforce.mapgdrive.callbacks import MultiAgentDrivingCallbacks
I am currently getting the following error when running the demo code provided above: No module named 'drivingforce'
What I showed is not a runnable script but an example that provides the config for using ray/rllib. Besides, drivingforce is our private code base built on top of ray/rllib.
You can learn how to use the MADDPG provided by ray from their documentation and examples, and then train MADDPG in MetaDrive by simply replacing the config and the environment in that example.
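For reference, below is a rough, untested sketch (not from the original reply) of how the demo above might be run with plain ray/rllib, assuming Ray ~1.8: tune.run() stands in for the private drivingforce train() helper, the drivingforce callback is dropped, and the agent count and stopping criterion are placeholders.

from metadrive.envs.marl_envs.marl_inout_roundabout import MultiAgentRoundaboutEnv
from ray import tune
from ray.rllib.contrib.maddpg.maddpg import MADDPGTrainer
from ray.rllib.env import MultiAgentEnv
from ray.rllib.policy.policy import PolicySpec


class MultiAgentMetaDrive(MultiAgentRoundaboutEnv, MultiAgentEnv):
    # RLlib expects the "__all__" key in the done dict.
    def step(self, *args, **kwargs):
        o, r, d, i = MultiAgentRoundaboutEnv.step(self, *args, **kwargs)
        d["__all__"] = all(d.values())
        return o, r, d, i


if __name__ == "__main__":
    num_agents = 4  # small agent count for a quick test
    env_config = dict(allow_respawn=False, num_agents=num_agents)

    # Query the per-agent spaces from a throwaway environment instance.
    tmp_env = MultiAgentRoundaboutEnv(env_config)
    obs_space, act_space = tmp_env.observation_space, tmp_env.action_space
    tmp_env.close()

    config = dict(
        env=MultiAgentMetaDrive,
        env_config=env_config,
        multiagent=dict(
            policies={
                "agent{}".format(i): PolicySpec(
                    observation_space=obs_space["agent{}".format(i)],
                    action_space=act_space["agent{}".format(i)],
                    config={"agent_id": i},
                )
                for i in range(num_agents)
            },
            policy_mapping_fn=lambda agent_id: agent_id,
        ),
    )

    # tune.run replaces the drivingforce train() helper; the stop condition is a placeholder.
    tune.run(MADDPGTrainer, stop={"timesteps_total": 1_000_000}, config=config)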
Could you please kindly provide a baseline example of training multi-agent RL with MADDPG, as you did here https://github.com/decisionforce/metadrive/blob/main/metadrive/examples/train_generalization_experiment.py (with drivingforce not involved)?
Actually, one of our projects provides a clean multi-agent interface without the drivingforce dependency. Named CoPO, this repo provides several self-interested MARL baselines, though not MADDPG. Since it is compatible with ray/rllib, you can easily modify the code in this repo to replace the trainer with MADDPG from ray/rllib and then run it.
Hello @QuanyiLi, as you suggested I have installed the CoPO repo and completed training for the intersection environment using the CoPO algorithm, and the checkpoints are generated in the folder as below.
In viz.py I can see that it loads an .npz file from the best_checkpoints folder, but I am unable to load my generated checkpoints. Do I have to convert them to .npz? Can you please help me with the evaluation?
@pengzhenghao Please help Amara solve this problem
Hi @dineshresearch, we prepared a script to compress the model into numpy format:
metadrive/examples/ppo_expert/remove_useless_state.py
Please refer to the following code snippet to do the conversion:
"""
This script is used to remove the optimizer state in the checkpoint. So that we can compress 2/3 of the checkpoint size.
This script is put here for reference only. In formal release, the original checkpoint file will be removed so
this script will become not runnable.
"""
import os.path as osp
import pickle
import numpy as np
# The absolute path to the checkpoint-XXX file.
ckpt_path = osp.join(osp.dirname(__file__), "checkpoint_417/checkpoint-417")
if __name__ == '__main__':
remove_value_network = True
path = "expert_weights.npz"
with open(ckpt_path, "rb") as f:
data = f.read()
unpickled = pickle.loads(data)
worker = pickle.loads(unpickled.pop("worker"))
if "_optimizer_variables" in worker["state"]["default_policy"]:
worker["state"]["default_policy"].pop("_optimizer_variables")
pickled_worker = pickle.dumps(worker)
weights = worker["state"]["default_policy"]
if remove_value_network:
weights = {k: v for k, v in weights.items() if "value" not in k}
np.savez_compressed(path, **weights)
print("Numpy agent weight is saved at: {}!".format(path))
It seems that we can abstract this function into a MetaDrive utility. I will do this later.
I ran the above script with the generated checkpoints, but I am getting the error below @pengzhenghao.
I am using Ray version 1.2.0.
If I remember correctly, the weights of the CoPO agent are stored under the name default. So we need to collect the weights with:
weights = worker["state"]["default"]
You can print the keys to verify this:
print(worker["state"].keys())
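For illustration only, a hedged adaptation of the conversion script above for a CoPO checkpoint might look like the following; the checkpoint path and output file name are placeholders, and the "default" key should be verified first by printing the keys as shown above.

import pickle

import numpy as np

ckpt_path = "checkpoint_100/checkpoint-100"  # placeholder: path to your CoPO checkpoint file

with open(ckpt_path, "rb") as f:
    unpickled = pickle.loads(f.read())
worker = pickle.loads(unpickled.pop("worker"))

print(worker["state"].keys())  # verify which policy key your checkpoint actually uses

policy_state = worker["state"]["default"]
policy_state.pop("_optimizer_variables", None)  # drop the optimizer state to shrink the file
weights = {k: v for k, v in policy_state.items() if "value" not in k}  # optionally drop the value network

np.savez_compressed("copo_agent_weights.npz", **weights)
print("Numpy agent weights saved at copo_agent_weights.npz")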
@pengzhenghao Thanks a lot, now I am able to convert the model to .npz and then run viz.py, but I have the following questions about viz.py:
Question 1: How do I change the number of vehicles (from 67 to 4)?
Question 2: How do I change the velocity of individual vehicles?
Also, when I try to run https://github.com/decisionforce/CoPO/blob/main/copo_code/copo/eval/evaluate_population.py I get the error below:
Finish 1 episodes with 32.060 s!
Traceback (most recent call last):
File "evaluate_population.py", line 164, in
@pengzhenghao @QuanyiLi are there any updates regarding the questions I posted above? Kindly respond.
@QuanyiLi @pengzhenghao I have figured out how to change the number of agents, but I am not able to change the velocity of individual agents during evaluation. Kindly help.
Hi @dineshresearch
I am a little confused about "how to change the velocity of individual agents during evaluation". We use an RL policy to control the agents, and their velocity is determined by their acceleration in previous steps. So what do you mean by "change the velocity"?
Also, I updated the CoPO repo recently. Do you have any questions currently? Thanks!
Sure, got it. Thanks for the response.
@pengzhenghao @QuanyiLi Please answer soon