Closed agarwl closed 4 years ago
If you would like to always get the same state on reset, you should set num-levels to 1. For example:
python -m coinrun.train_agent --run-id myrun --num-levels 1 --set-seed 13
The same flag will also work for the evaluation script. When you set num-levels to 1, the game state on reset will be independent of previous actions taken. You can change the particular level used by changing the value of set-seed.
@kcobbe I should have been clearer: even when num-levels is set to 1, the reset states returned for the single environment differ and depend on the sequence of actions taken.
For example, these are the observations returned by the environment when the done flag is True (here rep is set to 3). These observations are visibly different.
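To make the comparison less dependent on eyeballing images, the reset observations could also be compared numerically. The snippet below is only a rough sketch: it assumes the rollout of a single environment has already been recorded into a plain array of observations and a matching array of done flags, and that the observation stored at a step where done is True is the post-reset one, as described above. The helper names `reset_observations` and `resets_are_identical` are hypothetical and are not part of coinrun.

```python
import numpy as np


def reset_observations(obs, dones):
    """Return the observations recorded at steps where done is True.

    Assumes obs has shape (T, ...) and dones has shape (T,), both taken
    from one environment of a recorded rollout.
    """
    obs = np.asarray(obs)
    dones = np.asarray(dones, dtype=bool)
    return obs[dones]


def resets_are_identical(obs, dones):
    """Check whether every recorded reset observation matches the first one."""
    resets = reset_observations(obs, dones)
    if len(resets) < 2:
        # Fewer than two resets: nothing to compare against.
        return True
    return bool(np.all(resets == resets[0]))
```

If `resets_are_identical` returns False for a rollout collected with num-levels set to 1, that would agree with the visible differences reported here.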
Config used:
frame_stack: 1
game_type: standard
high_difficulty: False
is_high_res: False
num_envs: 32
num_eval: 1
num_levels: 1
paint_vel_info: -1
rep: 3
set_seed: -1
test: False
test_eval: False
train_eval: True
use_batch_norm: 0
use_black_white: 0
use_color_transform: 0
use_data_augmentation: 0
use_inversion: 0
I have been trying to collect demonstrations from a trained PPO agent using a fixed coinrun environment (assume that the level_seed is set to the same value whenever done is True). However, it seems that the reset state of the environment depends on the actions executed before it.
Specifically, consider the case where rep is set to 3: the two resets other than the first require the exact sequence of actions to be repeated from a newly instantiated env in order to reach the exact reset state. Due to this behavior, a long sequence of actions leading to multiple resets and rewards (whenever we solve the env) can't be split up into multiple demonstrations for that env and has to be used as a single stream of experience.
Is there any change that I can make to coinrunenv.py to get around this problem?
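For reference, the kind of splitting I would like to do looks roughly like the sketch below. It assumes the actions and done flags of one environment have been recorded as flat arrays, and it only yields valid standalone demonstrations if the reset state does not depend on earlier actions, which is exactly what does not seem to hold at the moment. The function name `split_at_dones` is hypothetical and not part of coinrun.

```python
import numpy as np


def split_at_dones(actions, dones):
    """Split one recorded action stream into per-episode action sequences.

    Assumes actions and dones both have length T and come from a single
    environment. An episode ends at each step where done is True.
    """
    actions = np.asarray(actions)
    dones = np.asarray(dones, dtype=bool)
    # Cut right after every step that ended an episode.
    cut_points = np.flatnonzero(dones) + 1
    segments = np.split(actions, cut_points)
    # The last segment is either empty or an unfinished episode, so drop it.
    return segments[:-1]
```

Each returned segment could then be replayed from a fresh reset as its own demonstration, provided the reset state is the same every time.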