hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

Is it wrong to reward an action on the next step? #1158

Closed DaniilKardava closed 2 years ago

DaniilKardava commented 2 years ago

Sorry if this is a silly question: when I call step and pass a certain action, the reward calculated during that step is associated with the current action and observation, right? So if I need to reward the agent for a successful decision, would I need to look forward and reward it immediately, rather than reward it later and have that reward associated with whatever action is taking place then? Thank you.

Edit: would it be better for me to post future questions under SB3? I am currently using an older version because I wasn't able to use LSTM.

Miffyli commented 2 years ago

Yes, you can absolutely reward the agent "later"! One of the core challenges tackled by RL is credit assignment, where the algorithm tries to figure out which earlier actions were responsible for a later reward :)
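To make the mechanics concrete, here is a minimal sketch (a toy environment class, not the real gym package; all names are illustrative) showing both points: `step(action)` returns the reward credited to that action, and a reward paid several steps later still reaches the earlier action through discounted returns:

```python
class DelayedRewardEnv:
    """Toy episodic env: reward is 0 every step, and +1 on the final
    step only if the very first action of the episode was 1."""

    def __init__(self, horizon=3):
        self.horizon = horizon

    def reset(self):
        self.t = 0
        self.first_action = None
        return 0  # dummy observation

    def step(self, action):
        if self.t == 0:
            self.first_action = action
        self.t += 1
        done = self.t >= self.horizon
        # Delayed reward: paid on the last step even though it was
        # "earned" by the first action.
        reward = 1.0 if (done and self.first_action == 1) else 0.0
        return 0, reward, done, {}  # old gym-style (obs, reward, done, info)


def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * r_{t+1} + ...; this is how RL algorithms
    credit earlier actions for rewards that arrive later."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))


env = DelayedRewardEnv()
env.reset()
rewards = []
for a in [1, 0, 0]:  # the first action is the "successful decision"
    _, r, done, _ = env.step(a)
    rewards.append(r)
print(rewards)                      # reward only arrives on the last step
print(discounted_returns(rewards))  # yet the first step's return is nonzero
```

Even though the reward lands on the last step, the first step's discounted return is `gamma**2 * 1`, so the learning signal still reaches the action that earned it; you do not have to shift rewards forward yourself.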

And yes, in the future, please ask on the SB3 issue tracker. Also, for these types of questions (usage questions rather than bug reports or enhancements), try asking on other forums such as the RL Discord.