Open mrzhuzhe opened 2 years ago
The goal of the smaller observation space is to introduce partial observability, so that agents are forced to learn collaboration and long-range dependencies. In this case, agents should learn to spread out and protect the entire end line collaboratively.
If you'd like to modify the environment observations, you can do so, but I suspect this will make it much easier to learn, to the point where individual agents just start hunting zombies without any collaborative behavior.
In the Knights Archers Zombies env, I trained the default agents to completion and observed the following behavior:
This may be because each agent only sees a 512 x 512 window, and the agents spawn too close together: the initial positions are (400, 410), (400, 460), (400, 610), (460, 660), while the whole env is 1280 x 720.
So together they can only see about half of the environment. If a zombie reaches the end line in the other half, they cannot see it at all.
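A rough way to check the "about half" estimate is to merge the horizontal spans each agent can see. This is only a sketch: it assumes the positions above are (x, y) pixel coordinates and that each 512 x 512 window is centered on its agent, neither of which I have verified against the env source. `visible_x_fraction` and `SPAWNS` are names made up for this example.

```python
# Sketch only: assumes (x, y) spawn coordinates and agent-centered windows.
SPAWNS = [(400, 410), (400, 460), (400, 610), (460, 660)]


def visible_x_fraction(positions, window=512, width=1280):
    """Fraction of the screen width covered by the union of all agents' windows."""
    half = window / 2
    # Horizontal interval seen by each agent, clipped to the screen.
    intervals = sorted(
        (max(0, x - half), min(width, x + half)) for x, _ in positions
    )
    covered = 0.0
    cur_lo, cur_hi = intervals[0]
    for lo, hi in intervals[1:]:
        if lo <= cur_hi:
            # Overlapping interval: extend the current merged span.
            cur_hi = max(cur_hi, hi)
        else:
            # Gap: close the current span and start a new one.
            covered += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
    covered += cur_hi - cur_lo
    return covered / width


fraction = visible_x_fraction(SPAWNS)
print(f"{fraction:.1%} of the width is visible at spawn")
```

Under those assumptions the four windows overlap almost completely and cover x from 144 to 716, so a bit under half of the 1280-pixel width.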
So early in the game, agents often hit a "sudden death" without having any information about what caused it, which results in very noisy, high-variance gradients.
To fix this, we could give agents more explicit information about the game's rules and the environment.