When gaps occur between RL decisions, Soar's current behavior (when
temporal-extension is on) is to discount both the rewards collected during
the gap and the Q-value propagated from the next state by the length of
the gap.
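To make that concrete, here is a minimal SARSA-style sketch of gap-based discounting. It is illustrative only, not Soar's actual code; the function `rl_update`, its signature, and the `temporal_discount=False` branch are assumptions about what disabling the discount would mean.

```python
from collections import defaultdict

# Illustrative sketch only, not Soar's implementation. `rl_update`, its
# signature, and the temporal_discount=False behavior are assumptions.
def rl_update(q, s, a, rewards, s_next, a_next, alpha, gamma,
              temporal_discount=True):
    """SARSA-style update where `rewards` holds the reward received on
    each decision cycle of the gap since the last RL action."""
    gap = len(rewards)
    if temporal_discount:
        # Discount each reward by its depth into the gap, and the
        # propagated Q-value by the full gap length.
        ret = sum(gamma ** t * r for t, r in enumerate(rewards))
        target = ret + gamma ** gap * q[(s_next, a_next)]
    else:
        # One plausible reading of "discounting disabled": sum the gap's
        # rewards undiscounted and apply a single discount step, so the
        # update no longer depends on the gap length.
        target = sum(rewards) + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])

# Example: a 3-cycle gap where the reward arrives on the last cycle.
q = defaultdict(float)
rl_update(q, "s0", "a0", [0.0, 0.0, 1.0], "s1", "a1", alpha=0.1, gamma=0.9)
```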
This all makes sense in the context of Soar, but it can make agents hard
to implement for evaluation purposes: there is no easy way to build a
textbook RL agent whose behavior is independent of how many Soar decisions
occur between RL actions.
I added an rl parameter, temporal-discount, that can be used to disable
discounting based on the number of decisions between RL updates. A patch
is attached.
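Assuming the new parameter follows the conventions of Soar's existing rl command (which uses --set and --get for parameters such as learning and discount-rate), it would presumably be toggled like this:

```
# assumed usage, following the rl command's existing parameter syntax
rl --set temporal-discount off
rl --get temporal-discount
```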
Original issue reported on code.google.com by sam.wint...@gmail.com on 3 Dec 2009 at 4:38