I'm proposing three changes to RL parameter defaults/behavior:
1. Stop indifferent-selection switch
Currently, the first time RL is enabled, Soar automatically changes the
indifferent-selection policy to epsilon-greedy. A message is printed to
the trace, and arguably this is helpful for simple agents, but it has
caused us grief during experimentation due to the timing of setting
indifferent-selection parameters vs. enabling RL. I think these should
be completely independent commands. This will necessarily hurt
backwards compatibility (e.g. we will have to amend all demo RL agents
and the tutorial).
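For illustration, under this proposal the two settings would be made explicitly and in either order (a sketch assuming Soar 9's command-line syntax; exact flags may vary by version):

```
# Enable RL, with no side effect on the selection policy
rl --set learning on

# Independently choose an exploration policy
indifferent-selection --epsilon-greedy
```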
2. HRL-Discount -> off (default)
By default, Soar discounts updates over subgoals as though they were
gaps. We have found that for many experiments this behavior makes
hierarchical learning slow, so we should disable it by default. For the
time being, however, I think we should keep the parameter, allowing for
easy experimentation.
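Agents that rely on the current behavior could then re-enable it explicitly (again a sketch assuming Soar 9's `rl --set` syntax):

```
# Restore discounting over subgoals, treating them like gaps
rl --set hrl-discount on
```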
3. Chunk-Stop -> on (default)
By default, Soar will currently create a chunk that differs from an
existing rule only in the value of its numeric indifferent preference.
The chunk-stop parameter catches this situation and prevents the
creation of the duplicate chunk. I think for RL it makes sense to
enable this behavior by default. However, there might be other
situations where multiple such chunks are used as "counters", and this
change could hurt backwards compatibility.
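Agents that depend on those near-duplicate chunks could opt back out (assuming the same `rl --set` syntax as above):

```
# Allow chunks that differ only in their numeric indifferent preference
rl --set chunk-stop off
```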
Original issue reported on code.google.com by nate.der...@gmail.com on 28 Dec 2011 at 2:19