sleyzerzon / soar

Automatically exported from code.google.com/p/soar

add an rl parameter to turn off discounting during gaps #59

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
If gaps occur between RL decisions and temporal-extension is on, Soar
currently discounts both the rewards and the Q-value propagated from the
next state according to the length of the gap.
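
To make the effect concrete, here is a minimal Python sketch of gap discounting. It is not Soar's code: the function name, the argument layout, and the exact exponents (one factor of gamma per decision elapsed) are assumptions chosen for illustration.

```python
def gap_discounted_target(rewards, next_q, gamma):
    """Illustrative update target when decisions elapse between RL actions.

    rewards[i] is the reward seen i decisions after the RL action, so
    len(rewards) is the number of decisions until the next RL action.
    Assumption: each elapsed decision contributes one factor of gamma.
    """
    target = 0.0
    for i, r in enumerate(rewards):
        # A reward that arrives later in the gap is discounted more.
        target += (gamma ** i) * r
    # The next state's Q-value is discounted by the full length of the gap.
    target += (gamma ** len(rewards)) * next_q
    return target


if __name__ == "__main__":
    # Same final reward and next Q-value, with and without a two-decision gap.
    print(gap_discounted_target([1.0], 2.0, 0.5))
    print(gap_discounted_target([0.0, 0.0, 1.0], 2.0, 0.5))
```

Under these assumptions the two calls give different targets even though the agent received the same reward and reached the same next state, which is the dependence on gap length described above.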

This all makes sense in the context of Soar, but it makes agents for
evaluation purposes difficult to implement: there is no easy way to build
a textbook RL agent whose behavior doesn't depend on how many Soar
decisions happen between RL actions.

I added an rl parameter, temporal-discount, that can be used to disable
discounting based on decisions between RL updates. A patch is attached.
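
As a rough illustration of what such a switch changes, here is a small Python sketch. It does not reproduce the patch: the function `td_target`, its `temporal_discount` flag, and the choice to treat a gap as a single step when the flag is off are all assumptions made for this example.

```python
def td_target(rewards, next_q, gamma, temporal_discount=True):
    """Illustrative update target with an on/off switch for gap discounting.

    rewards holds the rewards seen on each decision since the last RL action.
    Assumption: with temporal_discount off, the whole gap counts as one step,
    so rewards are summed undiscounted and next_q is discounted once.
    """
    if temporal_discount:
        # One factor of gamma per elapsed decision.
        total = sum((gamma ** i) * r for i, r in enumerate(rewards))
        return total + (gamma ** len(rewards)) * next_q
    # Gap length is ignored: textbook one-step target.
    return sum(rewards) + gamma * next_q


if __name__ == "__main__":
    no_gap = [1.0]
    with_gap = [0.0, 0.0, 1.0]
    for flag in (True, False):
        print(flag, td_target(no_gap, 2.0, 0.5, flag),
              td_target(with_gap, 2.0, 0.5, flag))
```

With the flag off, both reward sequences produce the same target in this sketch, so the learned values no longer depend on how many decisions fall between RL actions.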

Original issue reported on code.google.com by sam.wint...@gmail.com on 3 Dec 2009 at 4:38

GoogleCodeExporter commented 8 years ago

Original comment by voigtjr@gmail.com on 23 Feb 2010 at 7:39

GoogleCodeExporter commented 8 years ago
Patch applied.

Original comment by voigtjr@gmail.com on 2 Mar 2010 at 8:39