hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

Is it wrong to reward an action on the next step? #1158

Closed DaniilKardava closed 2 years ago

DaniilKardava commented 2 years ago

Sorry if this is a silly question: when I call step and pass a certain action, the reward calculated during that step is associated with the current action and observation, right? So if I need to reward the agent for a successful decision, would I need to look forward and reward it immediately, rather than reward it later and have that reward associated with whatever action is taking place then? Thank you.

Edit: would it be better for me to post future questions under SB3? I am currently using an older version because I wasn't able to use LSTM.

Miffyli commented 2 years ago

Yes, you can absolutely reward the agent "later"! One of the core challenges tackled by RL is credit assignment, where the algorithm tries to figure out which earlier actions were responsible for a later reward :)
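To make the mechanics concrete, here is a minimal sketch (a toy environment class, not the real gym package; all names are illustrative) showing both points: `step(action)` returns the reward credited to that action, and a reward paid several steps later still reaches the earlier action through discounted returns:

```python
class DelayedRewardEnv:
    """Toy episodic env: reward is 0 every step, and +1 on the final
    step only if the very first action of the episode was 1."""

    def __init__(self, horizon=3):
        self.horizon = horizon

    def reset(self):
        self.t = 0
        self.first_action = None
        return 0  # dummy observation

    def step(self, action):
        if self.t == 0:
            self.first_action = action
        self.t += 1
        done = self.t >= self.horizon
        # Delayed reward: paid on the last step even though it was
        # "earned" by the first action.
        reward = 1.0 if (done and self.first_action == 1) else 0.0
        return 0, reward, done, {}  # old gym-style (obs, reward, done, info)


def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * r_{t+1} + ...; this is how RL algorithms
    credit earlier actions for rewards that arrive later."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))


env = DelayedRewardEnv()
env.reset()
rewards = []
for a in [1, 0, 0]:  # the first action is the "successful decision"
    _, r, done, _ = env.step(a)
    rewards.append(r)
print(rewards)                      # reward only arrives on the last step
print(discounted_returns(rewards))  # yet the first step's return is nonzero
```

Even though the reward lands on the last step, the first step's discounted return is `gamma**2 * 1`, so the learning signal still reaches the action that earned it; you do not have to shift rewards forward yourself.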

And yes, in the future, please ask on the SB3 issue tracker. Also, for these types of questions (usage questions rather than bug reports or enhancements), try asking on other forums such as the RL Discord.