soartech / jsoar

Pure Java implementation of the Soar cognitive architecture.
http://soartech.github.com/jsoar/
BSD 3-Clause "New" or "Revised" License

Port changes from CSoar revision 13602 #84

Closed marinier closed 11 years ago

marinier commented 11 years ago

Log: [No log message] http://code.google.com/p/soar/source/detail?r=13602

BobM note: this appears to be several things: typo fixes in comments, removal of Soar 7 cruft, and the start of CDPS (whatever that is). I've pinged Mazin for more details. It's possible that this is not as bad as it looks, since much of the Soar 7 cruft has already been removed from jsoar.

marinier commented 11 years ago

Notes from Mazin:

Sorry for the lack of comment on that one. I remember I had a conflict with that merge and may have had to do it manually and probably overlooked the comment while trying to figure it out. Anyway, 13602 was huge because it was the merge into the trunk of the branch with the new context-dependent preference set code. I don't know if there's a way to copy over revision comments from another branch when you merge, but they probably wouldn't be of all that much use. To summarize it, it's almost all the addition of the CDPS, but, during its development, there was a bunch of code cleanup and removal of logic related to deprecated parts of Soar, for example the parallel preference code. I also added more comments to certain parts.

The best way to understand the CDPS is probably just to read the new section in the manual about it. I'll paste it in here. Once we finalized the chapter, we were planning on announcing it to the Soar group, but it's been on the backburner. Anyway, here's the section (fyi, there have been a few minor changes to the logic since this was written):

4.3 The Context-Dependent Preference Set

As described in the beginning of this chapter, chunking summarizes the processing required to produce the results of sub-goals. Traditionally, the philosophy behind how an agent should be designed was that the path of operator selections and applications from an initial state in a sub-state to a result would always have all necessary tests in the operator proposal conditions and any goal test, so only those items would need to be summarized. The idea was that in a properly designed agent, a sub-state’s operator evaluation preferences lead to a more efficient search of the space but do not influence the correctness of the result. As a result, the knowledge used by rules that produce such evaluation preferences should not be included in any chunks produced from that sub-state.

In practice, however, a Soar program can be written so that search control does affect the correctness of search. A few examples can be found in Section 4.7 on page 80. Moreover, reinforcement learning rules are often used to direct the search to those states that provide more reward, consequently making the idea of correctness much fuzzier. As a result, there may be cases when it is useful to encode goal-attainment knowledge into operator evaluation rules. Unfortunately, chunks created in such a problem space will be overgeneral because important parts of the superstate that were tested by operator evaluation rules do not appear as conditions in the chunks that summarize the processing in that problem state. The context-dependent preference set is a way to address this issue.

The context-dependent preference set (CDPS) is the set of relevant operator evaluation preferences that led to the selection of an operator in a sub-goal. Whenever Soar creates either a justification or a chunk, it will backtrace through two things for each working memory element that matches a condition of the instantiation: (1) the rule that produced the preference that created the working memory element and (2) the rules that produced the preferences in the CDPS for the selected operator that was tested by the rule in (1). By backtracing through that additional set of preferences, an agent can produce more specific chunks that incorporate the goal-attainment knowledge encoded in the operator evaluation rules.
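To make that traversal concrete, here is a minimal Java sketch of the idea. The Instantiation and Preference types below are hypothetical stand-ins, not jsoar's actual classes; the sketch only shows the two things that are backtraced for each matched working memory element:

```java
import java.util.*;

// Illustrative sketch only -- not jsoar's backtracing code.
class BacktraceSketch {

    static class Preference {
        final Instantiation producedBy;  // rule instantiation that created this preference
        Preference(Instantiation producedBy) { this.producedBy = producedBy; }
    }

    static class Instantiation {
        final String ruleName;
        final List<Preference> matchedPreferences = new ArrayList<>();      // preferences behind the WMEs this rule matched
        final List<Preference> cdpsOfSelectedOperator = new ArrayList<>();  // CDPS of the selected operator this rule tested
        Instantiation(String ruleName) { this.ruleName = ruleName; }
    }

    /** Collect every rule that contributes conditions to the chunk or justification. */
    static Set<String> backtrace(Instantiation result) {
        Set<Instantiation> visited = new HashSet<>();
        Set<String> rules = new LinkedHashSet<>();
        Deque<Instantiation> work = new ArrayDeque<>();
        work.push(result);
        while (!work.isEmpty()) {
            Instantiation inst = work.pop();
            if (!visited.add(inst)) continue;
            rules.add(inst.ruleName);
            for (Preference p : inst.matchedPreferences) {
                work.push(p.producedBy);                             // (1) the rule behind the matched WME
                for (Preference cdpsPref : p.producedBy.cdpsOfSelectedOperator) {
                    work.push(cdpsPref.producedBy);                  // (2) the rules behind that operator's CDPS
                }
            }
        }
        return rules;
    }
}
```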

All necessity preferences, i.e. prohibit and require preferences, are always included in the CDPS since they inherently encode the correctness of whether an operator is applicable in a problem space. In contrast, desirability preferences (rejects, betters, worses, bests, worsts and indifferents) are included depending on the role they play in the selection of the operator (and whether the add-desirability-prefs learn setting is active).
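As a rough illustration of that split, here is a small hypothetical helper (not part of jsoar's API) showing which preference types can ever enter the CDPS, with the add-desirability-prefs setting modeled as a boolean. Whether a desirability preference actually ends up in the CDPS further depends on its role in the filters described below:

```java
// Hypothetical helper for illustration; not jsoar's preference representation.
enum PrefType {
    REQUIRE, PROHIBIT,                               // necessity preferences: always eligible
    REJECT, BETTER, WORSE, BEST, WORST, INDIFFERENT  // desirability preferences: setting- and role-dependent
}

final class CdpsEligibility {
    static boolean isNecessity(PrefType t) {
        return t == PrefType.REQUIRE || t == PrefType.PROHIBIT;
    }

    /** Necessity preferences always qualify; desirability preferences only when the setting is on. */
    static boolean mayEnterCdps(PrefType t, boolean addDesirabilityPrefs) {
        return isNecessity(t) || addDesirabilityPrefs;
    }
}
```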

How Soar determines which of those preferences to include in the CDPS is based on the preference semantics it uses to choose an operator. During the decision phase, operator preferences are evaluated in a sequence of seven steps or filters, in an effort to select a single operator. Each step handles a specific type of preference. As the preference semantics are applied at each step to filter the candidates to a potential selected operator, the CDPS is built based on the preferences that were instrumental in applying that particular filter.

The following outline describes the logic that happens at each step. For a more detailed description of the various filters (but not the CDPS) see Appendix D on page 223. Note that impasses can occur at some of these stages, in which case no operator is selected and the CDPS is emptied. Moreover, if the candidate set is reduced to zero or one, the decision process will exit with a finalized CDPS. For simplicity's sake, this explanation assumes that there are no impasses and the decision process continues. (A schematic sketch of how these filters build up the CDPS follows the list.)

• Require Filter:
⋆ If an operator is selected based on a require preference, that preference is added to the CDPS. The logic behind this step is straightforward: the require preference directly resulted in the selection of the operator.

• Prohibit/Reject Filters:
⋆ If there exists at least one prohibit or reject preference, all prohibit and reject preferences for the eliminated candidates and all acceptable preferences for the surviving candidates are added to the CDPS. The logic behind this stage is that the conditions that led to the exclusion of the prohibited and rejected candidates are what allowed the final operator to be selected from among that particular set of surviving candidates.

• Better/Worse Filter:
⋆ For every candidate that is worse than at least one other candidate, remove any acceptable preferences for it from the CDPS that may have been added in the prohibit/reject filter. The logic is that their original inclusion was moot, since those candidates could never have been selected from the candidate set given their eventual removal at this stage.
⋆ For every other candidate, add all better/worse preferences involving that candidate.

• Best Filter:
⋆ Add any best preferences for remaining candidates to the CDPS.

• Worst Filter:
⋆ If any remaining candidate has a worst preference – this leads to that candidate being removed from consideration – that worst preference is added to the CDPS. Again, the logic is that the conditions that led to the exclusion of that candidate allowed the final operator to be chosen.

• Indifferent Filter:
⋆ This is the final stage, so the operator is now selected based on the agent's exploration policy. How indifferent preferences are added to the CDPS depends on whether any numeric indifferent preferences exist. If there exists at least one numeric indifferent preference, then every numeric preference for the winning candidate is added to the CDPS (there may be multiple such preferences), and all binary indifferent preferences between that winning candidate and candidates without a numeric preference are also added. If all indifferent preferences are non-numeric, then any unary indifferent preferences for the winning candidate are added to the CDPS, along with all binary indifferent preferences between that winning candidate and the other candidates.
⋆ The logic behind adding binary indifferent preferences between the selected operator and the other final candidates is that those binary indifferent preferences prevented a tie impasse and allowed the final candidate to be chosen by the exploration policy from among those mutually indifferent candidates.

Note that there may be cases where two or more rules create the same type of preference for a particular candidate. In those cases, only the first preference encountered is added to the CDPS; adding all of them could produce over-specific chunks that might never apply to future situations. It may still be possible to learn similar chunks with those other preferences if the agent sub-goals again in a similar context.

As of version 9.3.3, desirability preferences will not be added to the CDPS by default. The setting must be turned on via the learn command's add-desirability-prefs setting; see Section 8.4 on page 164 for more information. Necessity preferences will always be added to the CDPS regardless of the setting. Note that the CDPS also affects the conditions of justifications, so the add-desirability-prefs setting has an effect on the agent even if learning is turned off.
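Putting the filter sequence above together, here is a schematic Java sketch of how a decision pass might narrow the candidates while accumulating the CDPS. The types and the example Worst stage are simplified inventions for illustration, not jsoar's actual decision procedure:

```java
import java.util.*;

// Schematic sketch of the filter pipeline described above; not jsoar's actual code.
final class DecisionSketch {

    record Preference(String type, String candidate) {}

    interface Filter {
        // Narrows the candidate set and records the preferences that drove the narrowing.
        List<String> apply(List<String> candidates, List<Preference> prefs, List<Preference> cdps);
    }

    // Example stage (Worst Filter): candidates with a worst preference are dropped, and the
    // worst preferences that excluded them are recorded in the CDPS.
    static final Filter WORST = (candidates, prefs, cdps) -> {
        List<String> survivors = new ArrayList<>();
        List<Preference> excluding = new ArrayList<>();
        for (String c : candidates) {
            prefs.stream()
                 .filter(p -> p.type().equals("worst") && p.candidate().equals(c))
                 .findFirst()
                 .ifPresentOrElse(excluding::add, () -> survivors.add(c));
        }
        if (survivors.isEmpty()) return candidates;  // every candidate is worst: nothing is filtered
        cdps.addAll(excluding);                      // the exclusions that let the winner through
        return survivors;
    };

    // Filters run in order (require, prohibit/reject, better/worse, best, worst, indifferent).
    static Optional<String> decide(List<String> candidates, List<Preference> prefs,
                                   List<Filter> filters, List<Preference> cdps) {
        for (Filter f : filters) {
            candidates = f.apply(candidates, prefs, cdps);
            if (candidates.isEmpty()) {          // impasse: no operator selected, CDPS discarded
                cdps.clear();
                return Optional.empty();
            }
            if (candidates.size() == 1) {        // selection finalized with the CDPS built so far
                return Optional.of(candidates.get(0));
            }
        }
        return Optional.empty();                 // tie impasse among the remaining candidates
    }
}
```

In the real decision procedure the final Indifferent stage would apply the exploration policy to pick the winner, as described above; the sketch abstracts that away.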

marinier commented 11 years ago

Notes for this change have been written up (as soon as SoarTech's stuff gets pushed here again, they will be in jsoar-core/cdps port notes.txt). It's not nearly as bad as it looks; most of the changes are irrelevant whitespace changes and trivial typo fixes in comments.

marinier commented 11 years ago

This change has been made on SoarTech's clone. Just waiting for final review and push to github to mark as complete.