Open fourpenny opened 5 hours ago
Hello,
Have you read this? https://github.com/proroklab/VectorizedMultiAgentSimulator/issues/62#issuecomment-1781214094
It should have the answer.
VMAS follows the logic of the original MPE repository, while PettingZoo changed a few things.
Also, I think you misunderstood the VMAS code: we are not using the position of the first agent only, we are just computing the reward on the first call of the function and then reusing it for all agents.
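For illustration, here is a rough sketch of that "compute on first call" pattern (not the actual VMAS simple_reference code; names like `world`, `a.goal`, and `shared_rew` are assumptions): the shared reward is computed once per step, when the reward for the first agent is requested, and the same cached value is returned for every agent.

```python
import torch


class SimpleReferenceLikeScenario:
    # Hypothetical sketch of a VMAS-style scenario reward; not the real implementation.

    def reward(self, agent):
        is_first = agent is self.world.agents[0]
        if is_first:
            # Recompute the shared team reward only once per step,
            # for the whole batch of vectorized environments.
            self.shared_rew = torch.zeros(
                self.world.batch_dim, device=self.world.device
            )
            for a in self.world.agents:
                # Sum every agent's distance to its own goal landmark, so the
                # reward depends on all agents, not only on the first one.
                self.shared_rew -= torch.linalg.vector_norm(
                    a.state.pos - a.goal.state.pos, dim=-1
                )
        # Every agent receives the same cached value for this step.
        return self.shared_rew
```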
The reward function for agents in the VMAS version of the Simple Reference scenario from MPE differs from the current implementation in PettingZoo.
PettingZoo either penalizes individual agents for their distance to their corresponding landmark or returns an average of these penalties across all agents.
In VMAS, the reward seems to be calculated based only on the distance of the first agent in the environment from the landmarks.
Is this difference intentional? It seems like this implementation would make it difficult for both agents to learn to approach their corresponding landmark.
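For comparison, a rough sketch of the PettingZoo-style behaviour described above (simplified, not the actual PettingZoo code; the `local_ratio`-style blending parameter is an assumption): each agent is penalized for its own distance to its corresponding landmark, optionally mixed with the average penalty over all agents.

```python
import numpy as np


def per_agent_rewards(agent_pos, goal_pos, local_ratio=0.5):
    """Sketch of per-agent distance penalties blended with their average.

    agent_pos, goal_pos: arrays of shape (n_agents, 2); goal_pos[i] is the
    landmark assigned to agent i. Not the actual PettingZoo implementation.
    """
    # Individual penalty: each agent's negative distance to its own landmark.
    local = -np.linalg.norm(agent_pos - goal_pos, axis=-1)
    # Global penalty: the average of the individual penalties.
    global_rew = local.mean()
    # Blend the two; local_ratio = 1 is fully individual, 0 is fully shared.
    return local_ratio * local + (1.0 - local_ratio) * global_rew


# Example: two agents, the second one far from its landmark.
agents = np.array([[0.0, 0.0], [1.0, 1.0]])
goals = np.array([[0.0, 0.1], [3.0, 3.0]])
print(per_agent_rewards(agents, goals))
```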