Scalable multi-agent reinforcement learning. Details can be found in the Report.
An alternative to the Gym environment is provided (env.py). It renders with matplotlib, which makes it much easier to use. However, you need to implement the prey policy yourself.
We performed two independent runs. In each run, three agents were in the game from episode 1 to episode 3x10^4. At episode 3x10^4, we added three more agents to the game. Here we show the mean Q value of all the agents in our experiments.
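As a rough illustration of how a per-episode mean over a changing number of agents could be computed, here is a minimal sketch. The function name `mean_q_per_episode` and the input layout (one list of Q values per episode, one entry per active agent) are assumptions for illustration, not the logging format used in the experiments.

```python
def mean_q_per_episode(q_values_by_episode):
    """Average the Q values of all active agents, episode by episode.

    q_values_by_episode -- list of lists (hypothetical layout); each inner
    list holds one Q value per agent active in that episode, so its length
    may change when agents are added.
    """
    means = []
    for q_values in q_values_by_episode:
        if q_values:
            means.append(sum(q_values) / len(q_values))
        else:
            # No active agents in this episode: the mean is undefined.
            means.append(float("nan"))
    return means


if __name__ == "__main__":
    # Three agents in the first episode, six in the second.
    example = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
    print(mean_q_per_episode(example))
```

Because the mean is taken over whichever agents are active, the curve stays comparable before and after the extra agents join.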
In this demo, the prey walks randomly. Agents learn to catch the prey.
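A random-walk prey policy of this kind could be sketched as follows. The function name `random_prey_policy`, the five-move action set, and the clamping at the grid border are assumptions for illustration; env.py may represent positions and actions differently.

```python
import random

# Hypothetical action set: stay, up, down, left, right as (dx, dy) offsets.
ACTIONS = [(0, 0), (0, 1), (0, -1), (-1, 0), (1, 0)]


def random_prey_policy(position, grid_size, rng=random):
    """Return the prey's next position after one uniformly random move.

    position  -- (x, y) tuple of the prey's current cell
    grid_size -- side length of the (assumed square) grid
    rng       -- object with a choice() method, e.g. random.Random(seed)
    """
    dx, dy = rng.choice(ACTIONS)
    # Clamp to the grid so the prey never leaves the board.
    x = min(max(position[0] + dx, 0), grid_size - 1)
    y = min(max(position[1] + dy, 0), grid_size - 1)
    return (x, y)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducible demo output
    pos = (2, 2)
    for _ in range(5):
        pos = random_prey_policy(pos, grid_size=5, rng=rng)
        print(pos)
```

Passing a seeded `random.Random` instance keeps runs reproducible, which helps when comparing independent training runs.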