opendilab / DI-engine

OpenDILab Decision AI Engine. The Most Comprehensive Reinforcement Learning Framework B.P.
https://di-engine-docs.readthedocs.io
Apache License 2.0

Observation shape in the custom marl environment #823

Closed: WangJuan6 closed this issue 2 months ago

WangJuan6 commented 2 months ago

Dear Author,

I want to use the MAPPO algorithm in a custom multi-agent environment. My observation consists of two parts, agent_state and global_state: agent_state has shape H1 × W1 × C1, and global_state has shape H2 × W2 × C2.

I noticed that the agent_obs_shape and global_obs_shape parameters of MAVAC can be either integers or sequences:

https://github.com/opendilab/DI-engine/blob/7f951592a3d7b39c8fda44081fc40002f0ee27fc/ding/model/template/mavac.py#L43-L45

So I set the observation shapes in the config file:

model=dict(
    action_space='continuous',
    agent_num=n_agent,
    agent_obs_shape=[100, 100, 6],
    global_obs_shape=[100, 100, 6],
    action_shape=2,
),

However, this raises the following error:

Traceback (most recent call last):
  File "/data/projects/20240708/DI-engine/custom_procthor_env_hierarchy_top_view_multi_agent/procthor_env_ppo_subprocess_hierarchy.py", line 215, in <module>
    main()
  File "/data/projects/20240708/DI-engine/custom_procthor_env_hierarchy_top_view_multi_agent/procthor_env_ppo_subprocess_hierarchy.py", line 202, in main
    model = MAVAC(**cfg.policy.model)
  File "/data/projects/20240708/DI-engine/ding/model/template/mavac.py", line 84, in __init__
    nn.Linear(global_obs_shape, critic_head_hidden_size), activation,
  File "/home/wj/anaconda3/envs/di/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 96, in __init__
    self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
TypeError: empty(): argument 'size' must be tuple of SymInts, but found element of type tuple at pos 2

It is raised at the following lines when I use the observation shapes above:

https://github.com/opendilab/DI-engine/blob/7f951592a3d7b39c8fda44081fc40002f0ee27fc/ding/model/template/mavac.py#L83-L88
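The traceback points at the root cause: `nn.Linear` takes an integer `in_features`, so a shape list such as `[100, 100, 6]` cannot be passed to it unchanged. The minimal sketch below, assuming PyTorch is installed and using a hypothetical hidden size of 64, reproduces that failure in isolation and then shows the flattening workaround the question mentions: flatten each H × W × C observation and use the product of the shape as `in_features`.

```python
import math

import torch
import torch.nn as nn

obs_shape = [100, 100, 6]  # H x W x C, as in the config above
hidden = 64  # hypothetical stand-in for critic_head_hidden_size

# nn.Linear needs an integer in_features; a list shape raises TypeError,
# matching the error in the traceback.
try:
    nn.Linear(obs_shape, hidden)
    failed = False
except TypeError:
    failed = True

# Flattening workaround: collapse (H, W, C) into H * W * C features per sample.
critic_input = nn.Sequential(
    nn.Flatten(start_dim=1),  # (B, 100, 100, 6) -> (B, 60000)
    nn.Linear(math.prod(obs_shape), hidden),
)
out = critic_input(torch.zeros(4, *obs_shape))
print(failed, tuple(out.shape))  # True (4, 64)
```

With this workaround the config would use integer sizes (for example `agent_obs_shape=60000`, since 100 × 100 × 6 = 60000), and the observations fed to the model must be flattened consistently.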

How can I use a custom network? Alternatively, is there a way to handle sequence-type (multi-dimensional) observation shapes other than flattening the array to 1D?

Thank you for your assistance!

Best regards, Juan

PaParaZz1 commented 2 months ago

You can customize your own network and pass it to DI-engine's policy through the policy's corresponding input argument. Here is the concrete implementation of this argument.
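As one illustration of a custom network that avoids flattening the raw input, here is a minimal PyTorch sketch of a convolutional encoder for channel-last H × W × C observations. The class name `ConvObsEncoder`, the layer sizes, and the output size of 256 are hypothetical choices, not part of DI-engine. The sketch also assumes observations may carry extra leading dimensions such as (batch, agent_num), which it folds into a single batch dimension before the convolutions.

```python
import torch
import torch.nn as nn


class ConvObsEncoder(nn.Module):
    """Hypothetical encoder: maps (..., H, W, C) observations to (..., out_size) features."""

    def __init__(self, obs_shape, out_size=256):
        super().__init__()
        h, w, c = obs_shape  # channel-last, as in agent_obs_shape=[100, 100, 6]
        self.conv = nn.Sequential(
            nn.Conv2d(c, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),  # (N, 32, H', W') -> (N, 32 * H' * W')
        )
        # Infer the flattened conv output size with one dummy forward pass.
        with torch.no_grad():
            n_flat = self.conv(torch.zeros(1, c, h, w)).shape[1]
        self.fc = nn.Sequential(nn.Linear(n_flat, out_size), nn.ReLU())

    def forward(self, x):
        lead = x.shape[:-3]  # e.g. (B,) or (B, agent_num)
        # Fold leading dims into one batch dim, then move C before H and W for Conv2d.
        x = x.reshape(-1, *x.shape[-3:]).permute(0, 3, 1, 2)
        out = self.fc(self.conv(x))
        return out.reshape(*lead, -1)  # restore the leading dims


encoder = ConvObsEncoder((100, 100, 6), out_size=256)
features = encoder(torch.zeros(4, 3, 100, 100, 6))  # batch of 4, 3 agents
print(tuple(features.shape))  # (4, 3, 256)
```

This block only covers the observation-encoding part; combining it with actor and critic heads and passing the resulting model to the policy follows the implementation linked above.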

WangJuan6 commented 2 months ago

Hi @PaParaZz1, thank you for your reply. Are there plans to add an encoder parameter to MAVAC, similar to the one in VAC? https://github.com/opendilab/DI-engine/blob/7f951592a3d7b39c8fda44081fc40002f0ee27fc/ding/model/template/vac.py#L43 I think this would make network customization more convenient. Thanks, Juan
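The request is for MAVAC to accept a prebuilt encoder module, as VAC does. The sketch below illustrates that general pattern with a hypothetical `TinyActorCritic` class; it is not DI-engine's MAVAC and does not reflect how the feature was actually implemented. It assumes any custom encoder outputs `hidden_size` features and, for the convolutional example, that observations are channel-first (C × H × W).

```python
import torch
import torch.nn as nn


class TinyActorCritic(nn.Module):
    """Hypothetical model illustrating an optional `encoder` argument (not DI-engine's MAVAC)."""

    def __init__(self, obs_shape, action_shape, hidden_size=64, encoder=None):
        super().__init__()
        if encoder is None:
            # Default path: only integer (1D) observation sizes are supported.
            assert isinstance(obs_shape, int), "pass an encoder for non-1D observations"
            encoder = nn.Sequential(nn.Linear(obs_shape, hidden_size), nn.ReLU())
        self.encoder = encoder  # assumed to output hidden_size features
        self.actor_head = nn.Linear(hidden_size, action_shape)
        self.critic_head = nn.Linear(hidden_size, 1)

    def forward(self, obs):
        x = self.encoder(obs)
        return self.actor_head(x), self.critic_head(x).squeeze(-1)


# Custom path: a small conv encoder for channel-first (6, 100, 100) observations.
custom = nn.Sequential(
    nn.Conv2d(6, 8, kernel_size=4, stride=4), nn.ReLU(),  # -> (N, 8, 25, 25)
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),                 # -> (N, 8)
    nn.Linear(8, 64), nn.ReLU(),                           # -> (N, 64)
)
model = TinyActorCritic((6, 100, 100), action_shape=2, hidden_size=64, encoder=custom)
action_out, value = model(torch.zeros(5, 6, 100, 100))

# Default path: integer observation size, built-in linear encoder.
default_model = TinyActorCritic(10, action_shape=2)
default_action, default_value = default_model(torch.zeros(3, 10))
print(tuple(action_out.shape), tuple(value.shape))  # (5, 2) (5,)
```

Falling back to a linear encoder only for integer shapes keeps the existing behavior for 1D observations while routing multi-dimensional shapes through a user-supplied module.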

PaParaZz1 commented 2 months ago

We will add an argument like the one in VAC within a week.

WangJuan6 commented 2 months ago

Thanks, that’s good news! I’m looking forward to the update.

PaParaZz1 commented 2 months ago

We have added this feature in the above commit.