Closed Hvass-Labs closed 7 years ago
You can get at the underlying Atari emulator with `env.ale`. The emulator is (somewhat) documented at https://github.com/bbitmaster/ale_python_interface/wiki. `env.ale.lives()` should get you the lives left for most games that have lives (Pong: no, Asteroids: yes). To save and restore, use `s = env.ale.cloneState()` and `env.ale.restoreState(s)`.
These interfaces aren't part of gym, and it's easy to crash the emulator by using `loadState` wrong.
It's reasonable to peek behind the covers when training special-purpose agents. But make sure not to use them when evaluating trained performance.
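As a concrete illustration of the clone/restore pattern, here is a minimal sketch. `StubALE` is a hypothetical stand-in for `env.ale` so the snippet runs without gym or the ALE installed; with a real Atari env you would call `env.ale.cloneState()` and `env.ale.restoreState(s)` the same way.

```python
import random

# Sketch of the save/restore pattern described above.
# StubALE mimics the two ALE calls used here; a real ALE returns an
# opaque state object from cloneState() rather than an int.
class StubALE:
    def __init__(self):
        self.frame = 0

    def cloneState(self):
        return self.frame            # stand-in for the opaque ALE state

    def restoreState(self, s):
        self.frame = s

    def act(self, action):
        self.frame += 1

ale = StubALE()
s = ale.cloneState()                 # save a snapshot
for _ in range(10):
    ale.act(random.randrange(4))     # play some steps
ale.restoreState(s)                  # rewind to the snapshot
print(ale.frame)                     # back to 0
```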
Thanks for the VERY quick response. I'll take a look.
hello @tlbtlbtlb, you actually mean `env.ale.restoreState(s)` instead of `env.ale.loadState(s)`, right?
You're right, `env.ale.restoreState(s)` is correct. I'll edit the original comment too.
@tlbtlbtlb I tested with Breakout-v4 and found that `env.ale.lives()` does not change immediately after a life is lost (i.e. when the ball disappears from the screen). Instead, the value changes roughly 8 steps later. Could you please verify this? If it is true, is it intentional or a bug? Thanks.
It uses the code of the original games, written in the 1970s. `env.ale.lives()` reflects a memory location that the game updates internally. The games were written for humans to play, so they often incorporate delays. That's part of what makes them challenging to write agents for.
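Given that delay, one way to turn the counter into a loss-of-life signal is to poll `env.ale.lives()` every step and compare it against the previous value; the event then simply arrives a few steps late. `LifeTracker` below is a hypothetical helper, and the `readings` list simulates a counter that drops a few steps after the ball is actually lost, so the snippet runs without an emulator.

```python
# Sketch: detect loss of life by polling the lives counter each step.
# Because some games update the counter several frames late (as noted
# above for Breakout), treat the signal as a delayed event, not a
# frame-exact one.
class LifeTracker:
    def __init__(self, initial_lives):
        self.prev_lives = initial_lives

    def update(self, current_lives):
        """Return True exactly once per life lost."""
        lost = current_lives < self.prev_lives
        self.prev_lives = current_lives
        return lost

tracker = LifeTracker(initial_lives=5)
# Simulated env.ale.lives() readings; the drop appears with a delay.
readings = [5, 5, 5, 5, 4, 4]
events = [tracker.update(r) for r in readings]
print(events)   # the single True marks when the counter dropped
```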
I have two questions / suggestions that I cannot find answers to anywhere in the Gym docs or via Google search.
1) Is it possible to access a life counter or receive some signal when a life is lost in an Atari game? The terminal boolean from `env.step()` only signals the end of an episode, that is, when all lives have been lost. It might help with training if the loss of each life could be taken into account. There is an `info` dict returned by `env.step()`, but it seems to be empty.

2) Is it possible to save and reload an Atari game during play? Or is it technically feasible for you to implement this? This would be very useful for training an agent how to avoid dying, and also how to score points more efficiently. You could save the game e.g. every N calls to `env.step()`, and when the agent dies, instead of restarting from scratch, you reload the last saved game. This would allow the agent to learn how to avoid dying without wasting a lot of time getting back to the near-death state.

Before somebody comments that this is not how a human learns, let me say that it is hardly relevant, because humans don't use RL algorithms and artificial neural networks. We are trying to create artificial agents that perform well. In some tasks this 'cheating' would be acceptable; in others it is not. Furthermore, in high-precision tasks this is actually how humans learn: by repeating the challenging parts of a task over and over until they become fluent. An example is a musician who focuses rehearsal on the tricky parts of a score and practices them repeatedly, instead of restarting the entire score from the beginning on every single mistake.
Thanks.
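The checkpoint-every-N-steps scheme from question 2 can be sketched as follows. `StubEnv` is a hypothetical stand-in for a real Gym Atari env so the snippet runs standalone; with a real env, the save/load calls would be `env.ale.cloneState()` and `env.ale.restoreState(s)` as described earlier in the thread.

```python
import random

# Sketch: save the emulator state every N steps and, on death, reload
# the last checkpoint instead of restarting the whole episode.
class StubEnv:
    def __init__(self):
        self.t = 0

    def clone_state(self):
        return self.t                   # stand-in for ale.cloneState()

    def restore_state(self, s):
        self.t = s                      # stand-in for ale.restoreState(s)

    def step(self):
        self.t += 1
        return random.random() < 0.1    # pretend 10% chance of dying

random.seed(0)                          # deterministic for the example
N = 50                                  # checkpoint interval
env = StubEnv()
checkpoint = env.clone_state()
deaths = 0
for step in range(1, 501):
    if env.step():                      # agent died
        deaths += 1
        env.restore_state(checkpoint)   # rewind instead of full reset
    elif step % N == 0:
        checkpoint = env.clone_state()  # periodic save
```

Evaluation of a trained agent should of course still use plain `env.reset()`, per the earlier caveat about not peeking behind the covers when measuring performance.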