opencog / rocca

Rational OpenCog Controlled Agent (ROCCA). Use OpenCog to control a rational agent in OpenAI Gym and Malmo environments.
GNU Affero General Public License v3.0

Support behavior trees instead of mere action sequences as plans #30

Open ngeiswei opened 2 years ago

ngeiswei commented 2 years ago

Overview

Plans covering more possibilities are more likely to have high probabilities of success, and therefore be better guides for action selection. Using behavior trees instead of mere action sequences could be a way to go.

Rationale

Consider the following world:

  1. In context C₁
     1.1. action A₁ leads to context C₂ or C₃ with equal probability (0.5 each).
     1.2. action A₄ leads to goal G with probability 0.6.
     1.3. other actions lead to ¬G.
  2. In context C₂, action A₂ leads to G, other actions lead to ¬G.
  3. In context C₃, action A₃ leads to G, other actions lead to ¬G.

Assuming the agent is limited to action sequence plans, from context C₁, only the following three plans can reach G (≺ stands for SequentialAnd, and ↝ stands for PredictiveImplication):

  1. Plan P₁ "if in C₁ take A₁, then take A₂"
    (C₁∧A₁)≺A₂↝G

    has a 0.5 probability of success because A₁ has a 0.5 probability of leading to C₃ where A₂ will be ineffective.

  2. Plan P₂ "if in C₁ take A₁, then take A₃"
    (C₁∧A₁)≺A₃↝G

    has a 0.5 probability of success because A₁ has a 0.5 probability of leading to C₂ where A₃ will be ineffective.

  3. Plan P₃ "if in C₁ take A₄"
    C₁∧A₄↝G

    has a 0.6 probability of success.

Assuming maximum confidence in these probabilities, the action selector will choose plan P₃, as it has the highest probability of success, whereas the optimal behavior is to execute A₁ and then, depending on the resulting context, execute A₂ or A₃.
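The three probabilities above can be checked with a small sketch. This is a toy model of the world described in this issue, not ROCCA code: the names `TRANSITIONS`, `outcomes` and `sequence_success` are hypothetical, and it assumes that A₁ in C₁ leads to C₂ or C₃ with probability 0.5 each, as the plan descriptions imply.

```python
# Toy model of the example world (illustrative only, not ROCCA's API).
# TRANSITIONS[(context, action)] maps each outcome to its probability.
# "G" is the goal; any (context, action) pair not listed leads to "notG".
TRANSITIONS = {
    ("C1", "A1"): {"C2": 0.5, "C3": 0.5},
    ("C1", "A4"): {"G": 0.6, "notG": 0.4},
    ("C2", "A2"): {"G": 1.0},
    ("C3", "A3"): {"G": 1.0},
}


def outcomes(context, action):
    """Return the outcome distribution of taking `action` in `context`."""
    return TRANSITIONS.get((context, action), {"notG": 1.0})


def sequence_success(context, actions):
    """Probability that running `actions` in order from `context` reaches G."""
    if context == "G":
        return 1.0
    if context == "notG" or not actions:
        return 0.0
    first, rest = actions[0], actions[1:]
    return sum(
        prob * sequence_success(nxt, rest)
        for nxt, prob in outcomes(context, first).items()
    )


# The three action sequence plans available from C1.
PLANS = {"P1": ["A1", "A2"], "P2": ["A1", "A3"], "P3": ["A4"]}
scores = {name: sequence_success("C1", acts) for name, acts in PLANS.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Under these assumptions the scores are 0.5, 0.5 and 0.6, so a selector that only maximizes the probability of success picks P₃.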

One way to open the mind of the agent is to allow plans covering these contextual branches. For instance, the plan "if in C₁ take A₁, then if in C₂ take A₂, else if in C₃ take A₃"

(C₁∧A₁)≺((C₂∧A₂)∨(C₃∧A₃))↝G

has a probability of success of 1, and therefore would lead to selecting A₁, instead of A₄ as above, as the best next action.
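A minimal sketch of how such a branching plan could be evaluated, again on a toy model rather than ROCCA's actual plan representation. The names `TRANSITIONS`, `outcomes` and `branching_success` are hypothetical, and the branch is represented simply as a mapping from the context reached after the first action to the action to take there.

```python
# Toy model of the example world (illustrative only, not ROCCA's API).
# Any (context, action) pair not listed leads to "notG" with probability 1.
TRANSITIONS = {
    ("C1", "A1"): {"C2": 0.5, "C3": 0.5},
    ("C1", "A4"): {"G": 0.6, "notG": 0.4},
    ("C2", "A2"): {"G": 1.0},
    ("C3", "A3"): {"G": 1.0},
}


def outcomes(context, action):
    """Return the outcome distribution of taking `action` in `context`."""
    return TRANSITIONS.get((context, action), {"notG": 1.0})


def branching_success(context, first_action, branches):
    """P(G) for: take `first_action`, then take the action that `branches`
    assigns to whichever context results (no action if none is assigned)."""
    total = 0.0
    for nxt, prob in outcomes(context, first_action).items():
        if nxt == "G":
            total += prob
        elif nxt in branches:
            total += prob * outcomes(nxt, branches[nxt]).get("G", 0.0)
    return total


# (C1 ∧ A1) ≺ ((C2 ∧ A2) ∨ (C3 ∧ A3)) ↝ G
tree_score = branching_success("C1", "A1", {"C2": "A2", "C3": "A3"})
print(tree_score)
```

Here the score comes out as 1, which exceeds the 0.6 of the A₄ plan, so A₁ becomes the preferred first action.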

Existing work

Note that behavior trees are apparently already somewhat supported in OpenCog, see https://wiki.opencog.org/w/Behavior_tree_(2015_Archive). It is currently unknown how reusable that work would be for ROCCA, or even whether behavior trees, strictly defined, are the way to go; if not, the solution will likely be something similar.

Alternative

Another way to handle this is to include planning itself in the action space. Then the following plan, "if in C₁ take A₁, then plan and run the selected action from there"

(C₁∧A₁)≺PLAN_SELECT_RUN↝G

should in principle have a probability of success of 1. However, it seems more difficult to reason about planning that involves planning, even in such a linear fashion, than to reason about planning that involves behavior trees built from more primitive actions.
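To illustrate why the probability should in principle be 1, here is a rough sketch in which PLAN_SELECT_RUN is approximated by a one-step lookahead: after A₁, the agent observes the resulting context and runs whichever primitive action is most likely to reach G. This is a simplifying assumption for illustration; the names `TRANSITIONS`, `ACTIONS`, `outcomes`, `plan_select` and `replan_success` are hypothetical and do not reflect ROCCA's planner.

```python
# Toy model of the example world (illustrative only, not ROCCA's API).
# Any (context, action) pair not listed leads to "notG" with probability 1.
TRANSITIONS = {
    ("C1", "A1"): {"C2": 0.5, "C3": 0.5},
    ("C1", "A4"): {"G": 0.6, "notG": 0.4},
    ("C2", "A2"): {"G": 1.0},
    ("C3", "A3"): {"G": 1.0},
}
ACTIONS = ["A1", "A2", "A3", "A4"]


def outcomes(context, action):
    """Return the outcome distribution of taking `action` in `context`."""
    return TRANSITIONS.get((context, action), {"notG": 1.0})


def plan_select(context):
    """Stand-in for planning: pick the action most likely to reach G next."""
    return max(ACTIONS, key=lambda a: outcomes(context, a).get("G", 0.0))


def replan_success(context, first_action):
    """P(G) for: take `first_action`, then plan and run from the new context."""
    total = 0.0
    for nxt, prob in outcomes(context, first_action).items():
        if nxt == "G":
            total += prob
        elif nxt != "notG":
            chosen = plan_select(nxt)
            total += prob * outcomes(nxt, chosen).get("G", 0.0)
    return total


replan_score = replan_success("C1", "A1")
print(replan_score)
```

The sketch hides the difficulty raised above: the agent would need to estimate this probability by reasoning about its own planner, whereas the sketch simply calls it.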