proroklab / VectorizedMultiAgentSimulator

VMAS is a vectorized differentiable simulator designed for efficient Multi-Agent Reinforcement Learning benchmarking. It is comprised of a vectorized 2D physics engine written in PyTorch and a set of challenging multi-robot scenarios. Additional scenarios can be implemented through a simple and modular interface.
https://vmas.readthedocs.io
GNU General Public License v3.0

RL algorithms and their inputs #36

Closed menichel closed 1 year ago

menichel commented 1 year ago

Hi all,

I was wondering if the PPO-based MARL algorithms you use in the paper are taken from RLlib or whether they are already available in the library without the need of an RLlib interface.

I also have a question regarding the inputs of the NN. Do you use a CNN, as in the Atari games? In the paper you mention known information from the neighborhood; how are the shape and size of the neighborhood customized, and is the known info expressed with respect to the agent or to a global frame? I imagine that it is customizable, but I would like to know how it is implemented now to better understand what I am seeing.

Thanks and kind regards,

menichel

matteobettini commented 1 year ago

Hello,

Thanks for the interest!

CPPO is the default RLlib PPO, which treats all agents as one. IPPO and MAPPO are done using our custom trainer, available at https://github.com/proroklab/rllib_differentiable_comms. See https://github.com/proroklab/HetGPPO for how to use the trainer in VMAS.

In each scenario, there is an observation function that returns the observations for each agent.

It has the form:

def observation(agent):
  # Concatenate the agent's state tensors along the last (feature) dimension
  return torch.cat([agent.state.pos, agent.state.vel, ...], dim=-1)

If you want pixels, you can return pixels there, or anything else you want. If you want a global observation, you can return the same global obs for every agent.
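To make the local versus global distinction concrete, here is a minimal sketch in plain PyTorch. It does not use the VMAS API: `make_agent`, `local_observation` and `global_observation` are hypothetical names, and the agent is a stand-in object whose `state.pos` and `state.vel` are assumed to be tensors of shape `(batch, 2)`.

```python
from types import SimpleNamespace

import torch


def make_agent(batch_size):
    # Hypothetical stand-in for a simulator agent (not the VMAS class):
    # pos and vel are assumed to be (batch, 2) tensors.
    state = SimpleNamespace(
        pos=torch.randn(batch_size, 2),
        vel=torch.randn(batch_size, 2),
    )
    return SimpleNamespace(state=state)


def local_observation(agent):
    # Each agent sees only its own position and velocity -> (batch, 4)
    return torch.cat([agent.state.pos, agent.state.vel], dim=-1)


def global_observation(agent, all_agents):
    # Every agent receives the same concatenation of all agents' states.
    # `agent` is deliberately unused: that is what makes the obs global.
    return torch.cat([local_observation(a) for a in all_agents], dim=-1)


batch_size, n_agents = 32, 3
agents = [make_agent(batch_size) for _ in range(n_agents)]

local_obs = local_observation(agents[0])
global_obs = global_observation(agents[0], agents)
print(local_obs.shape, global_obs.shape)
```

With three agents the global observation is three times as wide as the local one, and it is identical no matter which agent asks for it.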

Communication of observations is not part of the paper, but if you are curious, we do comms via a GNN: the observation function returns local observations such as position, and then in the neural network we build the agent graph, which determines which neighbour information you can aggregate over.
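As a rough illustration of what "building the agent graph" can look like, the sketch below connects agents that lie within a communication radius and averages their neighbours' observations. It is an assumption-laden toy, not the authors' GNN: `build_adjacency`, `aggregate_neighbours` and `comm_radius` are hypothetical names, the distance-based edge rule is an assumption, and a real GNN layer would apply learned transformations around the aggregation.

```python
import torch


def build_adjacency(pos, comm_radius):
    # pos: (batch, n_agents, 2). Returns a (batch, n_agents, n_agents) 0/1 mask
    # with an edge wherever two agents are within comm_radius of each other.
    dist = torch.cdist(pos, pos)  # pairwise Euclidean distances
    adj = (dist <= comm_radius).float()
    # Remove self-loops so an agent does not count as its own neighbour
    eye = torch.eye(pos.shape[1], device=pos.device)
    return adj * (1.0 - eye)


def aggregate_neighbours(obs, adj):
    # obs: (batch, n_agents, feat). Mean of the neighbours' observations
    # for each agent; agents with no neighbours receive zeros.
    summed = adj @ obs
    degree = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)  # avoid divide by zero
    return summed / degree


# One environment, three agents: the first two are close, the third is far away
pos = torch.tensor([[[0.0, 0.0], [0.5, 0.0], [3.0, 0.0]]])
obs = pos.clone()  # local observation is just the position here

adj = build_adjacency(pos, comm_radius=1.0)
messages = aggregate_neighbours(obs, adj)
print(adj)
print(messages)
```

Changing `comm_radius` (or swapping the distance rule for a fixed topology) is what changes the shape and size of the neighbourhood in this sketch; the observations themselves stay local.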