Closed N-Kingsley closed 5 years ago
Yes, it is redundant. In fact, if config.NSTEP > 1, it may even break this code.
Yes, I think it's also incorrect if config.NSTEP > 1.
And if I change config.SEQUENCE_LENGTH to 10, will the code still work?
Yes, config.SEQUENCE_LENGTH can be increased to 10. Note, however, that this may require tuning other hyperparameters as well to maintain reasonable performance.
Thanks. In 'wrap_deepmind.py', done is set to True when a life is lost, and the accumulated reward is reset to 0. In a multi-life environment, won't this reduce the recorded episode rewards?
Should we call 'wrap_deepmind(env, episode_life=False)' during testing?
@N-Kingsley setting episode_life=True affects what the agent learns during training; however, in Pong it shouldn't prevent the agent from learning the optimal policy. During evaluation, it shouldn't matter whether episode_life is True or False.
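For readers unfamiliar with the wrapper, the episodic-life idea is: during training, a lost life is reported as the end of an "episode" so the agent gets a learning signal for dying, but the underlying game is only truly reset when all lives are gone. This is not the repo's exact wrapper — just a minimal sketch of the logic, where FakeAtariEnv and the class/attribute names are hypothetical stand-ins:

```python
class FakeAtariEnv:
    """Toy stand-in for a 3-life Atari game (hypothetical, for illustration).

    Action 1 loses a life; action 0 is a no-op.
    """
    def __init__(self):
        self.lives = 3

    def reset(self):
        self.lives = 3
        return "obs"

    def step(self, action):
        if action == 1:
            self.lives -= 1
        done = self.lives == 0          # the real game is over only at 0 lives
        return "obs", 1.0, done, {"lives": self.lives}


class EpisodicLife:
    """Report done=True whenever a life is lost (training-time behaviour)."""
    def __init__(self, env):
        self.env = env
        self.lives = 0
        self.was_real_done = True

    def reset(self):
        if self.was_real_done:
            obs = self.env.reset()           # true game over: full reset
        else:
            obs, _, _, _ = self.env.step(0)  # life lost: continue from current state
        self.lives = self.env.lives
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.was_real_done = done
        if 0 < info["lives"] < self.lives:   # a life was lost but the game continues
            done = True                      # signal an "episode" boundary
        self.lives = info["lives"]
        return obs, reward, done, info
```

With episode_life=False at test time, this wrapper is simply omitted, so reported episode rewards span the full multi-life game.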
In DRQN.ipynb, if config.NSTEP is equal to 1, is this step redundant: 'non_final_next_states = torch.cat([batch_state[non_final_mask, 1:, :], non_final_next_states], dim=1)'?
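For context, the line in question builds each next-state sequence by a shift-and-append: drop the oldest frame of every non-terminal sequence and append the next frame, keeping the sequence length constant. A minimal NumPy sketch of that operation (the shapes and variable names are illustrative assumptions, not the notebook's actual values):

```python
import numpy as np

# Hypothetical shapes: batch of 4 sequences, length 8, 16 features each.
B, L, F = 4, 8, 16
batch_state = np.random.rand(B, L, F)

# Mask selecting transitions whose next state is not terminal.
non_final_mask = np.array([True, True, False, True])

# One "next" frame per non-terminal transition, shape (num_non_final, 1, F).
next_frames = np.random.rand(int(non_final_mask.sum()), 1, F)

# The step under discussion: shift each surviving sequence left by one frame
# and append the next frame, producing next-state sequences of the same length.
non_final_next_states = np.concatenate(
    [batch_state[non_final_mask, 1:, :], next_frames], axis=1)
```

The result has shape (num_non_final, L, F), so each next-state sequence overlaps the state sequence in all but its newest frame.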