qfettes / DeepRL-Tutorials

Contains high-quality implementations of Deep Reinforcement Learning algorithms, written in PyTorch.

About code #6

Closed. N-Kingsley closed this issue 5 years ago.

N-Kingsley commented 5 years ago

In DRQN.ipynb, if config.NSTEP is equal to 1, is this step redundant: `non_final_next_states = torch.cat([batch_state[non_final_mask, 1:, :], non_final_next_states], dim=1)`?

qfettes commented 5 years ago

Yes, it is redundant. In fact, if config.NSTEP > 1, it may break this code.
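
For readers following along, here is a minimal shape-level sketch of what the line in question does, assuming `batch_state` holds state sequences of shape `(batch, seq_len, num_feats)` and `non_final_next_states` holds the single newest observation for each non-terminal transition (the shapes are illustrative, not the notebook's exact ones):

```python
import torch

# Illustrative shapes only; the notebook's actual tensors may differ.
batch_size, seq_len, num_feats = 32, 8, 84 * 84
batch_state = torch.randn(batch_size, seq_len, num_feats)        # stored state sequences
non_final_mask = torch.rand(batch_size) > 0.1                     # which transitions are non-terminal
num_non_final = int(non_final_mask.sum())
non_final_next_states = torch.randn(num_non_final, 1, num_feats)  # newest next observation per non-terminal sample

# The line under discussion: drop the oldest frame of each non-terminal state
# sequence and append the new observation, producing the next-state sequence.
next_state_seq = torch.cat(
    [batch_state[non_final_mask, 1:, :], non_final_next_states], dim=1
)
print(next_state_seq.shape)  # (num_non_final, seq_len, num_feats)
```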

N-Kingsley commented 5 years ago

Yes, I think it's incorrect if config.NSTEP > 1, too.

And if I change config.SEQUENCE_LENGTH to 10, is the code still feasible?

qfettes commented 5 years ago

Yes, config.SEQUENCE_LENGTH can be increased to 10. However, this may require re-tuning other hyperparameters to maintain reasonable performance.
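
As a rough illustration of why a longer SEQUENCE_LENGTH interacts with other hyperparameters: the recurrent core is unrolled over the full stored sequence, so each sample carries more context and more compute per update. The sizes and layers below are illustrative, not the notebook's actual network:

```python
import torch
import torch.nn as nn

# Illustrative sizes; the DRQN notebook's feature extractor and hidden size may differ.
seq_len, batch_size, feat_dim, hidden_dim, num_actions = 10, 32, 512, 512, 6

lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
features = torch.randn(batch_size, seq_len, feat_dim)  # per-frame features for a length-10 sequence
out, (h_n, c_n) = lstm(features)                        # out: (batch, seq_len, hidden_dim)

# Q-values are typically computed from the recurrent outputs, e.g. the final step:
q_head = nn.Linear(hidden_dim, num_actions)
q_values = q_head(out[:, -1, :])                        # (batch, num_actions)
```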

N-Kingsley commented 5 years ago

Thanks. In wrap_deepmind.py, done is set to True when a life is lost, and we reset the reward to 0. In a multi-life environment, will this lead to a reduction in rewards?
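
For context, the life-loss behavior being asked about usually comes from a baselines-style EpisodicLifeEnv wrapper. Below is a condensed sketch of that standard logic (the repo's wrap_deepmind.py copy may differ in details, and this uses the old gym step API); note that the step reward itself is not modified here, so the reset to 0 likely refers to the per-episode reward counter in the training loop, which restarts whenever done is True:

```python
import gym

class EpisodicLifeEnv(gym.Wrapper):
    """Signal episode termination when a life is lost, but only reset the
    emulator when the game is truly over (standard baselines-style logic)."""

    def __init__(self, env):
        super().__init__(env)
        self.lives = 0
        self.was_real_done = True

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.was_real_done = done
        lives = self.env.unwrapped.ale.lives()
        if 0 < lives < self.lives:
            # A life was lost: mark the transition terminal for the agent,
            # even though the underlying game keeps going.
            done = True
        self.lives = lives
        return obs, reward, done, info

    def reset(self, **kwargs):
        if self.was_real_done:
            obs = self.env.reset(**kwargs)   # real game over: full reset
        else:
            obs, _, _, _ = self.env.step(0)  # only lost a life: take a no-op step
        self.lives = self.env.unwrapped.ale.lives()
        return obs
```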

N-Kingsley commented 5 years ago

Should we use `wrap_deepmind(env, episode_life=False)` during testing?

qfettes commented 5 years ago

@N-Kingsley Setting episode_life=True could have implications for what the agent learns in training; however, in Pong it shouldn't prevent the agent from learning the optimal policy. During evaluation, it shouldn't matter whether episode_life is True or False.
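
A minimal sketch of how one might build separate training and evaluation environments along these lines; the import path is hypothetical, and the signatures are assumed to follow the standard baselines-style wrappers:

```python
# Hypothetical import path; in this repo the wrappers live in wrap_deepmind.py.
from wrap_deepmind import make_atari, wrap_deepmind

# Training: episodic life and reward clipping, the usual DeepMind preprocessing.
train_env = wrap_deepmind(make_atari('PongNoFrameskip-v4'),
                          episode_life=True, clip_rewards=True)

# Evaluation: full episodes and unclipped rewards, so reported scores are the
# true game scores and comparable across runs.
eval_env = wrap_deepmind(make_atari('PongNoFrameskip-v4'),
                         episode_life=False, clip_rewards=False)
```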