sjtu-marl / malib

A parallel framework for population-based multi-agent reinforcement learning.
https://malib.io
MIT License

Cannot import 'List' from 'malib.utils.typing' #46

Closed: josyulakrishna closed this issue 2 years ago

josyulakrishna commented 2 years ago

Hi, I am trying to run a basic MARL setup using MAPPO.

Here's my YAML config file:

```yaml
name: "mappo_payload_carry"

training:
  interface:
    type: "centralized"
    population_size: -1
  config:
    # control the frequency of remote parameter update
    update_interval: 1
    saving_interval: 100
    batch_size: 32
    optimizer: "Adam"
    actor_lr: 5.e-4
    critic_lr: 5.e-4
    opti_eps: 1.e-5
    weight_decay: 0.0

rollout:
  type: "async"
  stopper: "simple_rollout"
  stopper_config:
    max_step: 10000
  metric_type: "simple"
  fragment_length: 100
  num_episodes: 4
  episode_seg: 1
  terminate: "any"
  num_env_per_worker: 1
  postprocessor_types:
    - copy_next_frame

env_description:
  #  scenario_name: "simple_spread"
  creator: "Gym"
  config:
    env_id: "urdf-env-v0"

algorithms:
  MAPPO:
    name: "MAPPO"
    model_config:
      initialization:
        use_orthogonal: True
        gain: 1.
      actor:
        network: mlp
        layers:
          - units: 256
            activation: ReLU
          - units: 128
            activation: ReLU
          - units: 64
            activation: ReLU
        output:
          activation: False
      critic:
        network: mlp
        layers:
          - units: 256
            activation: ReLU
          - units: 128
            activation: ReLU
          - units: 64
            activation: ReLU
        output:
          activation: False

    # set hyper parameter
    custom_config:
      gamma: 0.99
      use_cuda: False  # enable cuda or not
      use_q_head: False
      ppo_epoch: 4
      num_mini_batch: 1  # the number of mini-batches

      return_mode: gae
      gae:
        gae_lambda: 0.95
      vtrace:
        clip_rho_threshold: 1.0
        clip_pg_rho_threshold: 1.0

      use_rnn: False
      # this is not used, instead it is fixed to last hidden in actor/critic
      rnn_layer_num: 1
      rnn_data_chunk_length: 16

      use_feature_normalization: True
      use_popart: True
      popart_beta: 0.99999

      entropy_coef: 1.e-2

global_evaluator:
  name: "generic"

dataset_config:
  episode_capacity: 100
  fragment_length: 3001
```
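For completeness, this is roughly how I load the config before calling run() below (standard PyYAML; the filename is just what I saved the config as):

```python
import yaml

# Load the YAML config above; the filename is whatever I saved it under.
with open("mappo_payload_carry.yaml", "r") as f:
    config = yaml.safe_load(f)

training_config = config["training"]
rollout_config = config["rollout"]
# Note: the run() call below also reads config["group"], which is not in
# the YAML above; I set it separately.
```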

I have a custom environment, which I create as follows:

```python
env = gym.make("urdf-env-v0", dt=0.01, robots=robots, render=render)
possible_agents = env.possible_agents
action_spaces = env.possible_actions
observation_spaces = env.observation_spaces
env_desc = {
    "creator": env,
    "possible_agents": possible_agents,
    "action_spaces": action_spaces,
    "observation_spaces": observation_spaces,
}
run(
    group=config["group"],
    name=config["name"],
    env_description=env_desc,
    agent_mapping_func=lambda agent: agent[:6],  # e.g. "team_0_player_0" -> "team_0"
    training=training_config,
    algorithms=config["algorithms"],
    rollout=rollout_config,
    evaluation=config.get("evaluation", {}),
    global_evaluator=config["global_evaluator"],
    dataset_config=config.get("dataset_config", {}),
    parameter_server=config.get("parameter_server", {}),
    # worker_config=config["worker_config"],
    use_init_policy_pool=False,
    task_mode="marl",
)
```
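As a quick sanity check, the agent_mapping_func above just truncates agent IDs to their team prefix (a standalone snippet, not MALib code):

```python
# Truncate "team_X_player_Y" to "team_X" (the first six characters).
agent_mapping_func = lambda agent: agent[:6]

print(agent_mapping_func("team_0_player_0"))  # -> "team_0"
print(agent_mapping_func("team_1_player_2"))  # -> "team_1"
```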

I tried to see whether malib.utils.typing provides the List and Dict types, but they seem to be missing there. How do I fix this?
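For reference, the failing import reduces to this one line:

```python
# Raises ImportError: List (and Dict) are not defined in malib.utils.typing
# in the version I am using.
from malib.utils.typing import List, Dict
```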

josyulakrishna commented 2 years ago

Adding the line

```python
from typing import List
```

resolves this issue.
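i.e., a minimal sketch of what the top of malib/utils/typing.py looks like after the patch (assuming the module is meant to re-export the standard typing names it uses):

```python
# malib/utils/typing.py (sketch): re-export standard typing names so that
# `from malib.utils.typing import List, Dict` works again.
from typing import Any, Dict, List, Tuple
```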

KornbergFresnel commented 2 years ago

@josyulakrishna MAPPO is still being debugged; you can follow the changes in the `test-cases` branch.

josyulakrishna commented 2 years ago

Thank you for the response, @KornbergFresnel. Is there any example of MADDPG or QMIX on a custom Gym-based environment that I can readily use to understand the library?

I have checked the `test-cases` branch, but none of the examples worked for me, and I'm not sure what I'm doing wrong. The project has several branches and the code varies across them, which is very confusing to me.

Any help is appreciated. Thank you.

KornbergFresnel commented 2 years ago

@josyulakrishna I've just run the cases under `examples`, and no errors were raised. Do not use any learner other than the "Independent Learner", as the library is still being restructured. Both MADDPG and QMIX have also been temporarily removed from the library. I think the standard version will be ready this month.
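For anyone following along, a hypothetical way to point the config above at the independent learner; the exact type string is an assumption, so please check the configs under `examples` for the canonical value:

```python
# Hypothetical override; "independent" is a guess at the interface type
# string for the Independent Learner (verify against examples/).
training_config["interface"]["type"] = "independent"
```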

josyulakrishna commented 2 years ago

I see, thank you for the reply, @KornbergFresnel.