metadriverse / metadrive

MetaDrive: Open-source driving simulator
https://metadriverse.github.io/metadrive/
Apache License 2.0

Support: Does MetaDrive support multi-agent RL algorithms, particularly MADDPG? Has anyone tested this? #118

Closed dineshresearch closed 2 years ago

dineshresearch commented 3 years ago

@pengzhenghao @QuanyiLi Please answer soon

QuanyiLi commented 3 years ago

Hi,

Thanks for your interest. Though the multi-agent MetaDrive environments are mainly designed for self-interested MARL, they also support traditional MARL training.

For MADDPG, we have already tested it using the RLlib implementation. Here is a code demo for you:

from metadrive.envs.marl_envs.marl_inout_roundabout import MultiAgentRoundaboutEnv
from ray import tune
from drivingforce.mapgdrive.callbacks import MultiAgentDrivingCallbacks
from ray.rllib.contrib.maddpg.maddpg import MADDPGTrainer
from ray.rllib.env import MultiAgentEnv
from drivingforce.train import train, get_train_parser
from ray.rllib.policy.policy import PolicySpec

class MultiAgentMetaDrive(MultiAgentRoundaboutEnv, MultiAgentEnv):
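    # Thin wrapper making the MetaDrive multi-agent env conform to RLlib's MultiAgentEnv interface.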
    def step(self, *args, **kwargs):
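        # RLlib expects a "__all__" key in the dones dict to signal that the whole episode has ended.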
        s, r, d, i = MultiAgentRoundaboutEnv.step(self, *args, **kwargs)
        d["__all__"] = all([done for done in d.values()])
        return s, r, d, i

def get_obs_action_shape(env_config):
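    # Build a throwaway environment just to read the per-agent observation and action spaces, then close it.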
    assert env_config is not None
    single_env = MultiAgentRoundaboutEnv(env_config)
    obs_space = single_env.observation_space
    act_space = single_env.action_space
    single_env.close()
    return obs_space, act_space

if __name__ == "__main__":
    args = get_train_parser().parse_args()
    exp_name = args.exp_name or "TEST"
    stop = int(4_000_000)
    num_agents = 40
    env_config = dict(
        allow_respawn=False,
        num_agents=num_agents,
        crash_done=True,
        neighbours_distance=40)
    obs_space, act_space = get_obs_action_shape(env_config)

    config = dict(
        # ===== Environmental Setting =====
        # We can grid-search the environmental parameters!
        env=MultiAgentMetaDrive,
        env_config=env_config,

        # ===== Resource =====
        num_gpus=0.25 if args.num_gpus != 0 else 0,
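        # MADDPG requires one policy per agent; each PolicySpec carries that agent's own observation/action spaces and index.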
        multiagent=dict(
            policies={
                "agent{}".format(i): PolicySpec(
                    observation_space=obs_space["agent{}".format(i)],
                    action_space=act_space["agent{}".format(i)],
                    config={"agent_id": i},
                )
                for i in range(num_agents)
            },
            policy_mapping_fn=lambda agent_id: agent_id
        ),
    )

    # Launch training
    train(
        MADDPGTrainer,
        exp_name=exp_name,
        keep_checkpoints_num=3,
        stop=stop,
        config=config,
        num_gpus=args.num_gpus,
        num_seeds=1,
        test_mode=args.test,
        custom_callback=MultiAgentDrivingCallbacks,
    )

Most parts of this demo are for RLlib compatibility; the usage of RLlib is documented for Ray 1.8. For MetaDrive, you only need to set allow_respawn=False to fix the number of agents in the environment, and set that number via num_agents=xxxyyyzzz.

dineshresearch commented 2 years ago

@QuanyiLi can you help me with the drivingforce package? Where can I install it from? from drivingforce.mapgdrive.callbacks import MultiAgentDrivingCallbacks

I am currently getting the error below when running the demo code provided above: No module named 'drivingforce' (screenshot attached)

QuanyiLi commented 2 years ago

What I showed is not a runnable script but an example that gives you the config for using ray/rllib. Besides, drivingforce is our private code base, which is built on ray/rllib.

You can learn how to use the MADDPG implementation provided by Ray from their documentation and examples, and then train MADDPG in MetaDrive by simply replacing the config and environment in that example.
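For reference, here is a rough, untested sketch of that substitution, using only ray/rllib's public tune.run API together with the MultiAgentMetaDrive wrapper and the get_obs_action_shape helper from the demo above; the agent count and stopping criterion are placeholder values, not recommended settings:

import ray
from ray import tune
from ray.rllib.contrib.maddpg.maddpg import MADDPGTrainer
from ray.rllib.policy.policy import PolicySpec

# MultiAgentMetaDrive and get_obs_action_shape are assumed to be the ones defined in the demo above.

if __name__ == "__main__":
    ray.init()

    num_agents = 4
    env_config = dict(allow_respawn=False, num_agents=num_agents, crash_done=True)
    obs_space, act_space = get_obs_action_shape(env_config)

    tune.run(
        MADDPGTrainer,
        stop={"timesteps_total": 1_000_000},
        checkpoint_freq=10,
        checkpoint_at_end=True,
        config=dict(
            env=MultiAgentMetaDrive,
            env_config=env_config,
            multiagent=dict(
                # One policy per agent, as MADDPG requires.
                policies={
                    "agent{}".format(i): PolicySpec(
                        observation_space=obs_space["agent{}".format(i)],
                        action_space=act_space["agent{}".format(i)],
                        config={"agent_id": i},
                    )
                    for i in range(num_agents)
                },
                policy_mapping_fn=lambda agent_id: agent_id,
            ),
        ),
    )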

dineshresearch commented 2 years ago

Could you please kindly provide a baseline example of training multi-agent RL with MADDPG, as you did here https://github.com/decisionforce/metadrive/blob/main/metadrive/examples/train_generalization_experiment.py (without involving drivingforce)?

QuanyiLi commented 2 years ago

Actually, one of our projects, CoPO, provides a clean multi-agent interface without the drivingforce dependency. That repo provides several self-interested MARL baselines, though not MADDPG. Since it is compatible with ray/rllib, you can easily modify the code in that repo to replace the trainer with MADDPG from ray/rllib and then run it.

dineshresearch commented 2 years ago

Hello @QuanyiLi, as you suggested, I have installed the CoPO repo and completed training for the intersection environment using the CoPO algorithm; the checkpoints were generated in the folder shown in the attached screenshot.

In viz.py I can see that it loads .npz files from the best_checkpoints folder, but I am unable to load the checkpoints I generated. Do I have to convert them to .npz? Can you please help me with the evaluation?

QuanyiLi commented 2 years ago

@pengzhenghao Please help Amara solve this problem

pengzhenghao commented 2 years ago

Hi @dineshresearch, we have prepared a script that compresses the model into numpy format:

metadrive/examples/ppo_expert/remove_useless_state.py

Please refer to the following code snippet to do the conversion:

"""
This script is used to remove the optimizer state in the checkpoint, so that the checkpoint size can be reduced by about 2/3.
This script is put here for reference only. In the formal release, the original checkpoint file will be removed, so
this script will no longer be runnable.
"""
import os.path as osp
import pickle

import numpy as np

# The absolute path to the checkpoint-XXX file.
ckpt_path = osp.join(osp.dirname(__file__), "checkpoint_417/checkpoint-417")
if __name__ == '__main__':
    remove_value_network = True
    path = "expert_weights.npz"

    with open(ckpt_path, "rb") as f:
        data = f.read()
    unpickled = pickle.loads(data)
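    # The checkpoint file stores the rollout worker state as a second, nested pickle.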
    worker = pickle.loads(unpickled.pop("worker"))
    if "_optimizer_variables" in worker["state"]["default_policy"]:
        worker["state"]["default_policy"].pop("_optimizer_variables")
    pickled_worker = pickle.dumps(worker)
    weights = worker["state"]["default_policy"]
    if remove_value_network:
        weights = {k: v for k, v in weights.items() if "value" not in k}
    np.savez_compressed(path, **weights)
    print("Numpy agent weight is saved at: {}!".format(path))

It seems that we can abstract this function as a utility of metadrive. I will do this later.

dineshresearch commented 2 years ago

@pengzhenghao I ran the above script with the generated checkpoints, but I am getting the error shown in the attached screenshot.

I am using Ray version 1.2.0.

pengzhenghao commented 2 years ago

If I remember correctly, the weights of the CoPO agent are stored under the name default. So we need to collect the weights by:

weights = worker["state"]["default"]

You can print the keys to verify this:

print(worker["state"].keys())
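A slightly more defensive variant (just an illustrative sketch, not part of the shipped remove_useless_state.py) that handles both the "default_policy" key used by the PPO expert checkpoint and the "default" key used by CoPO could be:

state = worker["state"]
# CoPO checkpoints store the policy under "default"; the PPO expert checkpoint uses "default_policy".
policy_key = "default" if "default" in state else "default_policy"
if "_optimizer_variables" in state[policy_key]:
    state[policy_key].pop("_optimizer_variables")
weights = state[policy_key]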
dineshresearch commented 2 years ago

@pengzhenghao Thanks a lot! Now I am able to convert the model to .npz and then run viz.py, but I have the following questions about viz.py:

Question 1: How to change the number of vehicles (from 67 to 4)?
Question 2: How to change the velocity of individual vehicles?

Also, when I try to run https://github.com/decisionforce/CoPO/blob/main/copo_code/copo/eval/evaluate_population.py, I get the error below:

Finish 1 episodes with 32.060 s!
Traceback (most recent call last):
  File "evaluate_population.py", line 164, in <module>
    auto_add_svo_to_obs=not args.no_auto_add_svo_to_obs
  File "evaluate_population.py", line 83, in evaluate_once
    raise e
  File "evaluate_population.py", line 69, in evaluate_once
    res = env.get_episode_result()
  File "/home/rnd-kdinesh1/Desktop/Dinesh_San/METADRIVE/CoPO/copo_code/copo/eval/recoder.py", line 222, in get_episode_result
    ret["coll_step_mean_episode_min"] = np.min(step_means)
  File "<__array_function__ internals>", line 6, in amin
  File "/home/rnd-kdinesh1/anaconda3/envs/copo/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2793, in amin
    keepdims=keepdims, initial=initial, where=where)
  File "/home/rnd-kdinesh1/anaconda3/envs/copo/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 90, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity

dineshresearch commented 2 years ago

@pengzhenghao @QuanyiLi Are there any updates regarding the questions I posted above? Kindly advise.

dineshresearch commented 2 years ago

@QuanyiLi @pengzhenghao I have figured out how to change the number of agents, but I am not able to change the velocity of individual agents during evaluation. Kindly help.

pengzhenghao commented 2 years ago

Hi @dineshresearch

I am a little confused about "how to change the velocity of individual agents during evaluation". We use an RL policy to control the agents, and their velocities are determined by their accelerations in previous steps. So what do you mean by "change the velocity"?

Also, I updated the CoPO repo recently. Do you have any remaining questions? Thanks!

dineshresearch commented 2 years ago

Sure, got it. Thanks for the response.