Could this be happening to ghost agents too?
@Skalwalker yes, it could, although my guess is that the state space is too large even for the scenario where Pac-Man is alone in the field (which is what I used to run this test). When testing with the cart-pole scenario, the agent only started learning something after I drastically reduced the state space to a couple hundred possible states.
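To give an idea of the scale involved, here is a minimal sketch of one way to discretize the cart-pole observation into a couple hundred states. The bounds and bin counts below are assumptions for illustration, not the exact values used:

```python
# Illustrative discretization of the 4-dimensional cart-pole observation
# (cart position, cart velocity, pole angle, pole angular velocity) into a
# small set of discrete states. Bounds and bin counts are assumptions for
# this sketch, not the exact values used in the experiment.
BOUNDS = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.0, 3.0)]
BINS = [3, 3, 6, 5]  # 3 * 3 * 6 * 5 = 270 possible states

def discretize(observation):
    """Map a continuous observation to a single integer state id."""
    state = 0
    for value, (low, high), n_bins in zip(observation, BOUNDS, BINS):
        clipped = min(max(value, low), high)
        bucket = int((clipped - low) / (high - low) * n_bins)
        bucket = min(bucket, n_bins - 1)  # keep the upper bound inside the last bin
        state = state * n_bins + bucket
    return state
```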
Small update on this task: Q-learning is working quite well in the cart-pole experiment. After about 500 simulations, the agent learns to control the inverted pendulum for ~10 seconds and, 500 simulations later, it takes minutes until the pole falls.
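For reference, the tabular Q-learning backup being applied is the standard one; the learning rate and discount below are placeholders, not the exact parameters of the experiment:

```python
from collections import defaultdict

# Standard tabular Q-learning backup. ALPHA and GAMMA are placeholders,
# not the exact parameters used in the cart-pole experiment.
ALPHA = 0.1   # learning rate
GAMMA = 0.99  # discount factor

q_table = defaultdict(float)  # maps (state, action) -> estimated value

def update(state, action, reward, next_state, actions):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_error = reward + GAMMA * best_next - q_table[(state, action)]
    q_table[(state, action)] += ALPHA * td_error
```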
The Pac-Man scenario with simple Q-learning doesn't show the same progress, though. I've tried to reduce the state space by generating states that incorporate only three aspects:
I've just run 500 simulations and the agent did seem to make some progress around game 300, but then it suddenly went back to its usual behavior.
Based on this info, I'll review the reward function and run the Pac-Man simulation with different parameters. If that doesn't give better results, I'll try putting some ghost information into the state (aware that it might actually slow down learning, since the state space will grow).
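To make that trade-off concrete, here is a sketch of how a single extra ghost feature multiplies the number of states. The features themselves are hypothetical, not the actual ones used:

```python
# Hypothetical feature-based Pac-Man states, only to illustrate how adding
# ghost information enlarges the state space; these are not the actual
# features used in the experiments.
FOOD_DIRECTIONS = ['North', 'South', 'East', 'West']       # 4 values
WALL_PATTERNS = list(range(16))                            # 2^4 surrounding walls
GHOST_DISTANCES = ['near', 'medium', 'far']                # 3 values

def state_without_ghosts(food_direction, wall_pattern):
    return (food_direction, wall_pattern)                   # 4 * 16 = 64 states

def state_with_ghosts(food_direction, wall_pattern, ghost_distance):
    return (food_direction, wall_pattern, ghost_distance)   # 64 * 3 = 192 states
```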
One more thing to test: simply selecting behaviors instead of actions.
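A rough sketch of what that could look like: Q-learning picks among a handful of high-level behaviors, and each behavior is responsible for turning the game state into a primitive action. The behavior names here are hypothetical:

```python
import random

# Hypothetical high-level behaviors; each one maps the current game state
# to a primitive Pac-Man action. Learning then happens over a few behaviors
# instead of the full set of primitive actions.
def eat_closest_food(game_state): ...
def flee_from_closest_ghost(game_state): ...
def wander(game_state): ...

BEHAVIORS = [eat_closest_food, flee_from_closest_ghost, wander]

def choose_behavior(q_table, state, epsilon=0.1):
    """Epsilon-greedy selection over behaviors rather than raw actions."""
    if random.random() < epsilon:
        return random.choice(BEHAVIORS)
    return max(BEHAVIORS, key=lambda b: q_table[(state, b.__name__)])
```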
Apparently, even in the cart-pole scenario, the results aren't consistent, since testing games score much lower than learning ones.
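For reference, the only difference between learning and testing runs in the sketch below is whether exploration is still applied; it assumes an epsilon-greedy policy during learning, which may not match the actual agent:

```python
import random

def select_action(q_table, state, actions, epsilon, testing=False):
    """Greedy during testing games, epsilon-greedy while learning."""
    if not testing and random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])
```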
Some possible causes: