openai / gym

A toolkit for developing and comparing reinforcement learning algorithms.
https://www.gymlibrary.dev

Loss of Life - and Save/Reload Games #470

Closed Hvass-Labs closed 7 years ago

Hvass-Labs commented 7 years ago

I have two questions / suggestions that I cannot find answers to anywhere in the Gym docs or via Google search.

1) Is it possible to access a life-counter or receive some signal when a life is lost in an Atari game? The terminal boolean from env.step() only signals the end of an episode, that is, when all lives have been lost. It might help with training if the loss of each individual life could be taken into account. env.step() also returns an info dict, but it seems to be empty.

2) Is it possible to save and reload an Atari game during play? Or is it technically feasible for you to implement this? It would be very useful for training an agent to avoid dying, and to learn how to score points more efficiently. You could save the game e.g. every N calls to env.step(), and when the agent dies, reload the last saved game instead of restarting from scratch. This would let the agent learn to avoid dying without wasting a lot of time getting back to the near-death state.

Before somebody comments that this is not how a human learns, let me say that it is hardly relevant, because humans don't use RL algorithms and artificial neural networks. We are trying to create artificial agents that perform well. In some tasks this 'cheating' would be acceptable - in others it is not. Furthermore, in high-precision tasks this is actually how humans learn: by repeating the challenging aspects of a task over and over until they become fluent. An example is a musician who will focus their rehearsal on the tricky parts of a score and practice them repeatedly, instead of restarting the entire score from the beginning on every single mistake.

Thanks.

tlbtlbtlb commented 7 years ago

You can get at the underlying Atari emulator with env.ale. The emulator is (somewhat) documented at https://github.com/bbitmaster/ale_python_interface/wiki. env.ale.lives() should get you the number of lives left for most games that have lives (Pong: no, Asteroids: yes).
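For example, a minimal sketch of reading the life counter while stepping (untested; assumes the classic 4-tuple env.step() return, and goes through env.unwrapped.ale in case the env is wrapped):

```python
import gym

env = gym.make("Breakout-v0")     # any ALE-backed game that has a life counter
ale = env.unwrapped.ale           # env.ale may also work directly, depending on wrappers
obs = env.reset()
prev_lives = ale.lives()

done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
    lives = ale.lives()
    if lives < prev_lives:
        # a life was just lost; this can be used as an extra training signal
        print("lost a life, remaining:", lives)
    prev_lives = lives
```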

To save and restore, use s = env.ale.cloneState(), and env.ale.restoreState(s).

These interfaces aren't part of gym, and it's easy to crash the emulator by using restoreState wrong.

It's reasonable to peek behind the covers when training special-purpose agents. But make sure not to use them when evaluating trained performance.
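For illustration only, a rough sketch of the save/reload loop from the question built on those two calls (untested; the checkpoint interval and the restore-on-death policy are arbitrary choices, and wrappers such as TimeLimit won't know about the restore):

```python
import gym

env = gym.make("Breakout-v0")
ale = env.unwrapped.ale           # non-public ALE interface, not part of the gym API
env.reset()

N = 100                           # checkpoint every N steps (arbitrary choice)
checkpoint = ale.cloneState()
prev_lives = ale.lives()

for t in range(100000):
    obs, reward, done, info = env.step(env.action_space.sample())
    if ale.lives() < prev_lives:
        # died: jump back to the last checkpoint instead of replaying from scratch
        ale.restoreState(checkpoint)
        prev_lives = ale.lives()
        continue
    prev_lives = ale.lives()
    if t % N == 0:
        checkpoint = ale.cloneState()
    if done:
        env.reset()
        checkpoint = ale.cloneState()
        prev_lives = ale.lives()
```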

Hvass-Labs commented 7 years ago

Thanks for the VERY quick response. I'll take a look.

yenchenlin commented 7 years ago

Hello @tlbtlbtlb,

You actually mean env.ale.restoreState(s) instead of env.ale.loadState(s), right?

tlbtlbtlb commented 7 years ago

You're right, env.ale.restoreState(s) is correct. I'll edit the original comment too.

boranzhao commented 6 years ago

@tlbtlbtlb I tested with Breakout-v4 and found that env.ale.lives() does not change immediately after a life is lost (i.e. when the ball disappears from the screen). Instead, the value changes roughly 8 steps later. Could you please verify this? If it is true, is it intentional or a bug? Thanks.
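For reference, a small sketch that should make the delay observable (untested; it just prints the step at which the counter changes, which can be compared with env.render() to see when the ball actually disappears):

```python
import gym

env = gym.make("Breakout-v4")
env.reset()
ale = env.unwrapped.ale
prev_lives = ale.lives()

for t in range(5000):
    _, _, done, _ = env.step(env.action_space.sample())
    # env.render()  # uncomment to compare visually with when the ball disappears
    if ale.lives() != prev_lives:
        print("step", t, "lives:", prev_lives, "->", ale.lives())
        prev_lives = ale.lives()
    if done:
        env.reset()
        prev_lives = ale.lives()
```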

tlbtlbtlb commented 6 years ago

The emulator runs the code of the original games, written in the 1970s. env.ale.lives() reflects a memory location that the game uses internally. The games were written for humans to play, so they often incorporate delays. That's part of what makes them challenging to write agents for.