Could this be happening to ghost agents too?
@Skalwalker yes, it could, although my guess is that the state space is too large even for the scenario where Pac-Man is alone in the field (which is what I used to run this test). When testing with the cart-pole scenario, the agent only started learning something after I drastically reduced the state space to a couple hundred possible states.
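To give an idea of the scale involved, here is a minimal sketch of one way to discretize the cart-pole observation into a couple hundred states. The bounds and bin counts below are assumptions for illustration, not the exact values used:

```python
# Illustrative discretization of the 4-dimensional cart-pole observation
# (cart position, cart velocity, pole angle, pole angular velocity) into a
# small set of discrete states. Bounds and bin counts are assumptions for
# this sketch, not the exact values used in the experiment.
BOUNDS = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.0, 3.0)]
BINS = [3, 3, 6, 5]  # 3 * 3 * 6 * 5 = 270 possible states

def discretize(observation):
    """Map a continuous observation to a single integer state id."""
    state = 0
    for value, (low, high), n_bins in zip(observation, BOUNDS, BINS):
        clipped = min(max(value, low), high)
        bucket = int((clipped - low) / (high - low) * n_bins)
        bucket = min(bucket, n_bins - 1)  # keep the upper bound inside the last bin
        state = state * n_bins + bucket
    return state
```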
Small update on this task: Q-learning is working quite well in the cart-pole experiment. After about 500 simulations, the agent learns to control the inverted pendulum for ~10 seconds and, 500 simulations later, it takes minutes until the pole falls.
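For reference, the tabular Q-learning backup being applied is the standard one; the learning rate and discount below are placeholders, not the exact parameters of the experiment:

```python
from collections import defaultdict

# Standard tabular Q-learning backup. ALPHA and GAMMA are placeholders,
# not the exact parameters used in the cart-pole experiment.
ALPHA = 0.1   # learning rate
GAMMA = 0.99  # discount factor

q_table = defaultdict(float)  # maps (state, action) -> estimated value

def update(state, action, reward, next_state, actions):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_error = reward + GAMMA * best_next - q_table[(state, action)]
    q_table[(state, action)] += ALPHA * td_error
```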
The Pac-Man scenario with simple Q-learning doesn't show the same progress, though. I've tried to reduce the state space by generating states that incorporate only three aspects:
I've just run 500 simulations and the agent did seem to make some progress around game 300, but then it suddenly went back to its usual behavior.
Based on this info, I'll review the reward function and run the Pac-Man simulation with different parameters. If that doesn't give better results, I'll try putting some ghost information into the state (aware that it might actually slow down learning, since the state space will grow).
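To make that trade-off concrete, here is a sketch of how a single extra ghost feature multiplies the number of states. The features themselves are hypothetical, not the actual ones used:

```python
# Hypothetical feature-based Pac-Man states, only to illustrate how adding
# ghost information enlarges the state space; these are not the actual
# features used in the experiments.
FOOD_DIRECTIONS = ['North', 'South', 'East', 'West']       # 4 values
WALL_PATTERNS = list(range(16))                            # 2^4 surrounding walls
GHOST_DISTANCES = ['near', 'medium', 'far']                # 3 values

def state_without_ghosts(food_direction, wall_pattern):
    return (food_direction, wall_pattern)                   # 4 * 16 = 64 states

def state_with_ghosts(food_direction, wall_pattern, ghost_distance):
    return (food_direction, wall_pattern, ghost_distance)   # 64 * 3 = 192 states
```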
One more thing to test: simply selecting behaviors instead of actions.
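A rough sketch of what that could look like: Q-learning picks among a handful of high-level behaviors, and each behavior is responsible for turning the game state into a primitive action. The behavior names here are hypothetical:

```python
import random

# Hypothetical high-level behaviors; each one maps the current game state
# to a primitive Pac-Man action. Learning then happens over a few behaviors
# instead of the full set of primitive actions.
def eat_closest_food(game_state): ...
def flee_from_closest_ghost(game_state): ...
def wander(game_state): ...

BEHAVIORS = [eat_closest_food, flee_from_closest_ghost, wander]

def choose_behavior(q_table, state, epsilon=0.1):
    """Epsilon-greedy selection over behaviors rather than raw actions."""
    if random.random() < epsilon:
        return random.choice(BEHAVIORS)
    return max(BEHAVIORS, key=lambda b: q_table[(state, b.__name__)])
```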
Apparently, even in the cart-pole scenario, the results aren't consistent, since testing games score much lower than learning ones.
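For reference, the only difference between learning and testing runs in the sketch below is whether exploration is still applied; it assumes an epsilon-greedy policy during learning, which may not match the actual agent:

```python
import random

def select_action(q_table, state, actions, epsilon, testing=False):
    """Greedy during testing games, epsilon-greedy while learning."""
    if not testing and random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])
```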
Some possible causes: