michaelkoelle / marl-aquarium

Aquarium: A Comprehensive Framework for Exploring Predator-Prey Dynamics through Multi-Agent Reinforcement Learning Algorithms
MIT License

assert problem like: "assert len(observations) == self.number_of_predator_observations" #2

Open kudouxiao opened 5 months ago

kudouxiao commented 5 months ago

The internal logic of this environment seems to have issues. I want to increase the number of predators, but doing so requires modifying a large number of built-in parameters, and after the modifications the program only runs for a while before failing, so it cannot complete a large number of episodes. Even when I try to reproduce the setup from your paper, using the default parameters or setting the number of prey to 8, I still hit errors like `assert len(observations) == self.number_of_predator_observations` after running for a while. I have tried MADDPG, PPO, and MATD3, and they all exhibit the same problem. In other words, whenever the steps per episode grow longer, these issues inevitably arise, preventing me from adequately training the agents. Do you have any suggestions for resolving this? Is it possible to reproduce the scenario presented in your paper?

kudouxiao commented 4 months ago

The issue I was facing has already been resolved; I am keeping the question open in the hope that it will be helpful to others. In the environment's source code, there is a mistake in the parenthesis placement in this line: `observations += [0] * self.obs_size * (n_nearest_shark - len(observations))`. It should be corrected to `observations += [0] * (self.obs_size * n_nearest_shark - len(observations))`. Moreover, according to the paper, prey and predators should return observations in the same way, meaning their logic for observing the environment is identical. However, in the source code (which seems to have been written by different authors), the logic for predators observing the environment appears to differ from that of the prey, which could lead to errors during prolonged training. I therefore recommend standardizing their methods of observing the environment.
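A minimal sketch of the difference between the two padding expressions (variable names and values are assumed for illustration, mirroring the quoted line; this is not the environment's actual code):

```python
# Assumed values: 3 features per observed shark, 4 sharks to observe,
# so a complete observation vector should have 3 * 4 = 12 entries.
obs_size = 3
n_nearest_shark = 4

# Partial observation: only some values were collected so far.
observations = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]

# Buggy padding: the pad length is obs_size * (n_nearest_shark - len(observations)).
# Once len(observations) exceeds n_nearest_shark, the factor is negative and the
# pad is empty, leaving the vector shorter than 12 and tripping the assert later.
buggy = observations + [0] * obs_size * (n_nearest_shark - len(observations))

# Corrected padding: pad up to exactly obs_size * n_nearest_shark total values.
fixed = observations + [0] * (obs_size * n_nearest_shark - len(observations))
assert len(fixed) == obs_size * n_nearest_shark
```

Here `len(buggy)` stays at 6 while `len(fixed)` is 12, which is why the original code only fails once enough entities accumulate in the observation list.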

michaelkoelle commented 4 months ago

Thank you for bringing this to our attention. We are actively working on the v1.0 release, where this will be fixed. After the experiments in the paper we did a major refactor of the whole environment, which unfortunately broke many things.