Open leolellisr opened 2 months ago
Your operator proposal rule `marta*propose*init-rl` tests that there is no `name` (meaning it will unmatch when `name` is added to the state), but your apply rule `marta*apply*init-rl` does not create a `name` on the state. Thus your proposal rule never unmatches and you get stuck in an operator no-change impasse.
Additionally, you should not be creating a `reward-link` on the state in your apply rule. The `reward-link` already exists and is created by Soar.
You could try changing your apply rule like this:
sp {marta*apply*init-rl
   (state <s> ^operator.name init-rl
              ^reward-link <rl>)
-->
   (<s> ^action A0
        ^value 1
        ^name myName)
   (<rl> ^reward <rw>)
   (<rw> ^value 1)
}
When I run your program now, it applies the rule once and then state no-changes, which is expected since you have no other operators. It also gives the warning `Ignoring rl*marta*rule*template*2 because it is a duplicate of rl*marta*rule*template*1`; this is normal because the template rule tries to generate the same rule twice.
However, I think these rules are still not what you want. Because the `value` doesn't exist on the state until after the apply rule fires, `marta*rule*template` doesn't fire until after the operator is selected. But it is a preference rule and is supposed to influence whether the operator is selected or not.
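As a rough sketch of the shape I would expect (not tested against your agent), the template should test only things that already exist while the operator is still a proposed candidate, and its right-hand side should be a single numeric-indifferent preference with a constant initial value. The names `marta`, `act`, `^available-action`, and `^sensor-value` below are hypothetical placeholders that are not in your code; substitute whatever your state actually contains.

```soar
# Hypothetical proposal: one candidate operator per ^available-action on the state.
# Assumes some earlier rule has already set ^name marta and the ^available-action values.
sp {marta*propose*act
   (state <s> ^name marta
              ^available-action <a>)
-->
   (<s> ^operator <o> +)
   (<o> ^name act
        ^action <a>)
}

# Template: tests the proposed operator (note the +) and state data that exist
# before selection. Soar should generate one RL rule per distinct binding of
# <a> and <v> that it encounters, each starting at the constant value 0.
sp {marta*rl*act
   :template
   (state <s> ^name marta
              ^sensor-value <v>
              ^operator <o> +)
   (<o> ^name act
        ^action <a>)
-->
   (<s> ^operator <o> = 0)
}
```

With this arrangement a single template can cover many actions, because the action is a variable on the operator rather than something hard-coded per rule. You would still need an apply rule that changes the state, and something that puts a reward on the existing `reward-link`.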
Note you will be able to get a lot more help from people who have used Soar's RL much more recently than me if you email the Soar help mailing list. See how here: https://soar.eecs.umich.edu/SoarSupport/MailingLists
@marinier thank you for the reply.
I think I understand why it was stuck before: since I didn't add a name to the state in apply*init-rl, it went back to propose, right?
But I think I still don't understand very well how rule templates work.
To give a little more context, I have an agent with 20 actions. For each action, I would like to create a rule template that checks the current state data and then proposes the RL operator. Could you shed some light on this?
I have requested to join the mailing lists. When I get approved, I will also send an email to the suggested list. Thank you very much!
My current code:
rl --set learning on # enable RL
indifferent-selection -g # use epsilon-greedy decision making
indifferent-selection --epsilon 0.1 # 10% deviation from greedy
# init
sp {marta*propose*init-rl
   (state <s> ^superstate nil
             -^name)
-->
   (<s> ^operator <o> +)
   (<o> ^name init-rl)
}
sp {marta*apply*init-rl
   (state <s> ^operator.name init-rl
              ^reward-link <rl>)
-->
   (<s> ^action A0
        ^value 1
        ^name action0)
   (<rl> ^reward <rw>)
   (<rw> ^value 1)
}
# rule template
sp {marta*rule*template
   :template
   (state <s> ^operator <o> +
              ^reward-link <rl>)
   (<rl> ^reward <rw>)
   (<rw> ^value <v>)
-->
   (<s> ^operator <o> = <v>)
}
Hello everyone, everything good?
Can you help me?
I'm trying to implement reinforcement learning and rule templates with jsoar.
However, I am unable to get past initialization.
Could you check my code to see if I'm doing something wrong?
Is there any example of using rule templates?
Thank you for your attention!
Code: