Open fourpenny opened 5 hours ago
Hello,
Have you read this? https://github.com/proroklab/VectorizedMultiAgentSimulator/issues/62#issuecomment-1781214094
It should have the answer.
VMAS follows the logic of the original MPE repository, while PettingZoo changed a few things.
Also, I think you misunderstood the VMAS code: we are not using the position of the first agent only, we are just computing the reward on the first call of the function and then reusing it for all agents.
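For illustration, here is a rough sketch of that "compute on first call" pattern (not the actual VMAS simple_reference code; names like `world`, `a.goal`, and `shared_rew` are assumptions): the shared reward is computed once per step, when the reward for the first agent is requested, and the same cached value is returned for every agent.

```python
import torch


class SimpleReferenceLikeScenario:
    # Hypothetical sketch of a VMAS-style scenario reward; not the real implementation.

    def reward(self, agent):
        is_first = agent is self.world.agents[0]
        if is_first:
            # Recompute the shared team reward only once per step,
            # for the whole batch of vectorized environments.
            self.shared_rew = torch.zeros(
                self.world.batch_dim, device=self.world.device
            )
            for a in self.world.agents:
                # Sum every agent's distance to its own goal landmark, so the
                # reward depends on all agents, not only on the first one.
                self.shared_rew -= torch.linalg.vector_norm(
                    a.state.pos - a.goal.state.pos, dim=-1
                )
        # Every agent receives the same cached value for this step.
        return self.shared_rew
```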
The reward function for agents in the VMAS version of the Simple Reference scenario from MPE differs from the current implementation in PettingZoo.
PettingZoo either penalizes individual agents for their distance to their corresponding landmark or returns an average of these penalties across all agents.
In VMAS, the reward seems to be calculated based only on the distance of the first agent in the environment from the landmarks.
Is this difference intentional? It seems like this implementation would make it difficult for both agents to learn to approach their corresponding landmark.
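For comparison, a rough sketch of the PettingZoo-style behaviour described above (simplified, not the actual PettingZoo code; the `local_ratio`-style blending parameter is an assumption): each agent is penalized for its own distance to its corresponding landmark, optionally mixed with the average penalty over all agents.

```python
import numpy as np


def per_agent_rewards(agent_pos, goal_pos, local_ratio=0.5):
    """Sketch of per-agent distance penalties blended with their average.

    agent_pos, goal_pos: arrays of shape (n_agents, 2); goal_pos[i] is the
    landmark assigned to agent i. Not the actual PettingZoo implementation.
    """
    # Individual penalty: each agent's negative distance to its own landmark.
    local = -np.linalg.norm(agent_pos - goal_pos, axis=-1)
    # Global penalty: the average of the individual penalties.
    global_rew = local.mean()
    # Blend the two; local_ratio = 1 is fully individual, 0 is fully shared.
    return local_ratio * local + (1.0 - local_ratio) * global_rew


# Example: two agents, the second one far from its landmark.
agents = np.array([[0.0, 0.0], [1.0, 1.0]])
goals = np.array([[0.0, 0.1], [3.0, 3.0]])
print(per_agent_rewards(agents, goals))
```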