The current handling of multistep doesn't make sense: the model only has access to the agent's action in the first step of the window, yet it is asked to predict the return multiple steps into the future. Try incorporating all actions in the window and unrolling the model over them in some way to improve this.
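One possible shape for the unrolling idea, as a minimal sketch rather than the project's actual code: apply a one-step model once per action in the window, accumulate the discounted predicted rewards, and bootstrap from a value estimate at the end. The names `step_fn`, `value_fn`, `toy_step`, and `toy_value` are hypothetical placeholders, and the toy dynamics exist only to make the example runnable.

```python
from typing import Callable, Sequence, Tuple

State = float
Action = int


def n_step_return(rewards: Sequence[float], gamma: float, bootstrap: float = 0.0) -> float:
    """Target: discounted sum of the rewards in the window plus a bootstrap value."""
    g = bootstrap
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def unrolled_return(
    state: State,
    actions: Sequence[Action],
    step_fn: Callable[[State, Action], Tuple[State, float]],
    value_fn: Callable[[State], float],
    gamma: float,
) -> float:
    """Prediction: unroll the one-step model over every action in the window."""
    total, discount = 0.0, 1.0
    for action in actions:
        # Hypothetical one-step model: returns (next_state, predicted_reward).
        state, reward = step_fn(state, action)
        total += discount * reward
        discount *= gamma
    # Bootstrap from the value estimate of the final unrolled state.
    return total + discount * value_fn(state)


def toy_step(state: State, action: Action) -> Tuple[State, float]:
    """Toy dynamics (assumption): the reward equals the action taken."""
    return state + action, float(action)


def toy_value(state: State) -> float:
    """Toy value estimate (assumption): always zero."""
    return 0.0


# Two windows that share the same first action but differ afterwards.
pred_a = unrolled_return(0.0, [1, 0, 1], toy_step, toy_value, gamma=0.5)
pred_b = unrolled_return(0.0, [1, 1, 1], toy_step, toy_value, gamma=0.5)
```

In this toy setup the two windows start with the same action but have different returns, so a predictor conditioned only on the first action could not tell them apart, while the unrolled version can. Whether the real model should be unrolled in state space or in a latent space is left open here.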