GMMDMDIDEMS closed this issue 7 months ago.
Hi, before diving into the specific issue you described above, can you please clarify what you are trying to model in this domain? When I look at the CPF of robot-at'(?x, ?y), there is some logic that I cannot comprehend.
For example,
else if (( exists_{?x2 : xpos, ?y2 : ypos} [ GOAL(?x2,?y2) ^ robot-at(?x2,?y2) ] )
| ( move-north ^ exists_{?y2 : ypos} [ NORTH(?y,?y2) ^ robot-at(?x,?y) ] )
| ( move-south ^ exists_{?y2 : ypos} [ SOUTH(?y,?y2) ^ robot-at(?x,?y) ] )
| ( move-east ^ exists_{?x2 : xpos} [ EAST(?x,?x2) ^ robot-at(?x,?y) ] )
| ( move-west ^ exists_{?x2 : xpos} [ WEST(?x,?x2) ^ robot-at(?x,?y) ] ))
then
KronDelta(false)
this condition says that if the agent was at (?x, ?y), took one of the move actions, and there existed a location in that direction, robot-at(?x, ?y) will be false in the next time step, as the agent would have moved. So, this is good.
Then in the next condition (reaching this branch implies that either the robot is not at (?x, ?y) or it doesn't take a move action):
else if (( move-north ^ exists_{?y2 : ypos} [ NORTH(?y2,?y) ^ robot-at(?x,?y2) ] )
| ( move-south ^ exists_{?y2 : ypos} [ SOUTH(?y2,?y) ^ robot-at(?x,?y2) ] )
| ( move-east ^ exists_{?x2 : xpos} [ EAST(?x2,?x) ^ robot-at(?x2,?y) ] )
| ( move-west ^ exists_{?x2 : xpos} [ WEST(?x2,?x) ^ robot-at(?x2,?y) ] ))
then
Bernoulli( 1.0 - P(?x, ?y) )
the robot takes a move action at a position other than (?x, ?y), and you are checking whether the agent will land on (?x, ?y). For example, the robot may be at (?x, ?y) after taking move-north from (?x, ?y2) with NORTH(?y2, ?y)=true, with some probability defined by P(?x, ?y). If the Bernoulli rv was sampled to be true, though, this does not "turn off" the original (?x, ?y2) where the robot is located in the current time step, so there will be two locations where robot-at is true in the next time step. Is this intended?
This is the IPPC 2011 Navigation domain:
For all cells, the transition logic determining whether robot-at'(?x,?y) holds in the if-then-else should state:
(1) The robot is at the cell and it's a goal, so it stays (true).
(2) The robot moved out of the cell, so it is false.
(3) The robot moved into the cell, in which case we do a coin flip that allows the robot to "disappear" or successfully move.
(4) Else, no change.
this does not "turn off" the original (?x, ?y2) where the robot is located in the current time step ...
I don't think this is possible. By default the robot must always move out of the cell when a move is possible, i.e., rule (2), but whether it makes it to the next cell is stochastic: it either makes it or it "disappears", from which it cannot recover (so we say Navigation has dead-ends).
So I don't think it's possible for robot-at'(?x,?y) to be true in two cells so long as the initial state had the robot in exactly one cell?
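Rules (1)-(4) can be sketched in Python as a per-cell update. This is only a minimal illustration of the logic described above, not pyRDDLGym's simulator; the cell coordinates, the noop encoding, and the shape of the p dict are my own assumptions:

```python
import random

def next_robot_at(robot_at, action, goal, p, rng=random.random):
    """Sample next-step robot-at values for every cell under rules (1)-(4).

    robot_at: dict mapping (x, y) -> bool, the current state
    action:   'north' | 'south' | 'east' | 'west' | None (noop)
    goal:     the goal cell
    p:        dict mapping cell -> disappearance probability P(?x, ?y)
    """
    delta = {'north': (0, 1), 'south': (0, -1), 'east': (1, 0), 'west': (-1, 0)}
    nxt = {}
    for cell in robot_at:
        if robot_at[cell] and cell == goal:
            nxt[cell] = True                            # (1) at the goal: stay put
        elif robot_at[cell] and action in delta:
            dx, dy = delta[action]
            dest = (cell[0] + dx, cell[1] + dy)
            nxt[cell] = dest not in robot_at            # (2) moved out -> false (if a destination exists)
        elif action in delta and robot_at.get(
                (cell[0] - delta[action][0], cell[1] - delta[action][1]), False):
            nxt[cell] = rng() < 1.0 - p.get(cell, 0.0)  # (3) moved in: coin flip, may disappear
        else:
            nxt[cell] = robot_at[cell]                  # (4) else no change
    return nxt
```

Starting from a single occupied cell, at most one cell can be true afterwards: either the source turns off and the destination turns on, or the robot disappears entirely, which matches the dead-end behavior described above.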
Note: Trying to understand this domain model elucidates a key case where RDDL is hard to read. These sorts of graph-based navigation domains are where PDDL action-centric effect-based syntax is much more clear than RDDL's state fluent-centric DBN syntax. However, the RDDL's DBN syntax is far more expressive than PDDL for modeling stochastic domains and I never came up with a hybrid language that would get us the syntactic and expressive best of both worlds. I'm open to suggestions on this -- it's not for this issue, but feel free to email me (Scott Sanner) separately.
Quick note: I ran a simulation with the domain file and the instance using the simulator from pyRDDLGym (not the one using the XADD model). In fact, the robot disappears (instead of robot-at being true at two cells). Here's the printout:
step = 0
state =
{ 'robot-at___x1__y1': False,
'robot-at___x1__y2': False,
'robot-at___x1__y3': False,
'robot-at___x1__y4': False,
'robot-at___x1__y5': False,
'robot-at___x1__y6': False,
'robot-at___x2__y1': False,
'robot-at___x2__y2': False,
'robot-at___x2__y3': False,
'robot-at___x2__y4': False,
'robot-at___x2__y5': False,
'robot-at___x2__y6': False,
'robot-at___x3__y1': False,
'robot-at___x3__y2': False,
'robot-at___x3__y3': False,
'robot-at___x3__y4': False,
'robot-at___x3__y5': False,
'robot-at___x3__y6': False,
'robot-at___x4__y1': True,
'robot-at___x4__y2': False,
'robot-at___x4__y3': False,
'robot-at___x4__y4': False,
'robot-at___x4__y5': False,
'robot-at___x4__y6': False}
action =
{'move-north': 0}
next state =
{ 'robot-at___x1__y1': False,
'robot-at___x1__y2': False,
'robot-at___x1__y3': False,
'robot-at___x1__y4': False,
'robot-at___x1__y5': False,
'robot-at___x1__y6': False,
'robot-at___x2__y1': False,
'robot-at___x2__y2': False,
'robot-at___x2__y3': False,
'robot-at___x2__y4': False,
'robot-at___x2__y5': False,
'robot-at___x2__y6': False,
'robot-at___x3__y1': False,
'robot-at___x3__y2': False,
'robot-at___x3__y3': False,
'robot-at___x3__y4': False,
'robot-at___x3__y5': False,
'robot-at___x3__y6': False,
'robot-at___x4__y1': True,
'robot-at___x4__y2': False,
'robot-at___x4__y3': False,
'robot-at___x4__y4': False,
'robot-at___x4__y5': False,
'robot-at___x4__y6': False}
reward = -1.0
step = 1
state =
{ 'robot-at___x1__y1': False,
'robot-at___x1__y2': False,
'robot-at___x1__y3': False,
'robot-at___x1__y4': False,
'robot-at___x1__y5': False,
'robot-at___x1__y6': False,
'robot-at___x2__y1': False,
'robot-at___x2__y2': False,
'robot-at___x2__y3': False,
'robot-at___x2__y4': False,
'robot-at___x2__y5': False,
'robot-at___x2__y6': False,
'robot-at___x3__y1': False,
'robot-at___x3__y2': False,
'robot-at___x3__y3': False,
'robot-at___x3__y4': False,
'robot-at___x3__y5': False,
'robot-at___x3__y6': False,
'robot-at___x4__y1': True,
'robot-at___x4__y2': False,
'robot-at___x4__y3': False,
'robot-at___x4__y4': False,
'robot-at___x4__y5': False,
'robot-at___x4__y6': False}
action =
{'move-north': 1}
next state =
{ 'robot-at___x1__y1': False,
'robot-at___x1__y2': False,
'robot-at___x1__y3': False,
'robot-at___x1__y4': False,
'robot-at___x1__y5': False,
'robot-at___x1__y6': False,
'robot-at___x2__y1': False,
'robot-at___x2__y2': False,
'robot-at___x2__y3': False,
'robot-at___x2__y4': False,
'robot-at___x2__y5': False,
'robot-at___x2__y6': False,
'robot-at___x3__y1': False,
'robot-at___x3__y2': False,
'robot-at___x3__y3': False,
'robot-at___x3__y4': False,
'robot-at___x3__y5': False,
'robot-at___x3__y6': False,
'robot-at___x4__y1': False,
'robot-at___x4__y2': False,
'robot-at___x4__y3': False,
'robot-at___x4__y4': False,
'robot-at___x4__y5': False,
'robot-at___x4__y6': False}
reward = -1.0
step = 2
state =
{ 'robot-at___x1__y1': False,
'robot-at___x1__y2': False,
'robot-at___x1__y3': False,
'robot-at___x1__y4': False,
'robot-at___x1__y5': False,
'robot-at___x1__y6': False,
'robot-at___x2__y1': False,
'robot-at___x2__y2': False,
'robot-at___x2__y3': False,
'robot-at___x2__y4': False,
'robot-at___x2__y5': False,
'robot-at___x2__y6': False,
'robot-at___x3__y1': False,
'robot-at___x3__y2': False,
'robot-at___x3__y3': False,
'robot-at___x3__y4': False,
'robot-at___x3__y5': False,
'robot-at___x3__y6': False,
'robot-at___x4__y1': False,
'robot-at___x4__y2': False,
'robot-at___x4__y3': False,
'robot-at___x4__y4': False,
'robot-at___x4__y5': False,
'robot-at___x4__y6': False}
action =
{'move-east': 0}
next state =
{ 'robot-at___x1__y1': False,
'robot-at___x1__y2': False,
'robot-at___x1__y3': False,
'robot-at___x1__y4': False,
'robot-at___x1__y5': False,
'robot-at___x1__y6': False,
'robot-at___x2__y1': False,
'robot-at___x2__y2': False,
'robot-at___x2__y3': False,
'robot-at___x2__y4': False,
'robot-at___x2__y5': False,
'robot-at___x2__y6': False,
'robot-at___x3__y1': False,
'robot-at___x3__y2': False,
'robot-at___x3__y3': False,
'robot-at___x3__y4': False,
'robot-at___x3__y5': False,
'robot-at___x3__y6': False,
'robot-at___x4__y1': False,
'robot-at___x4__y2': False,
'robot-at___x4__y3': False,
'robot-at___x4__y4': False,
'robot-at___x4__y5': False,
'robot-at___x4__y6': False}
reward = -1.0
Right after taking the move-north action in step 1, the states become False for all locations from step 2, and it never recovers after that.
So, there must be some issues with the domain description?
Adding the visualization FYI:
All: sorry, some confusion on my end. Some quick notes to resolve this thread:
From the original post:
Indeed, I thought the DBN had an error initially, but now I see it is correct in both cases: ... robot-at(x1,y1) it is made false if a robot moves out of it (say north or east) or true if a robot moves into it (south or west) so it depends on all 4 actions. ... robot-at(x4,y6) is made true by the 2 actions that move into it, but it is a goal state that you cannot leave (as defined in the cpf) so it is unaffected by the other two directions that could move out of it.
A dynamic Bayesian network (DBN) graph only tells you what the state and action fluent dependencies of a next-state fluent are. If you want more information about the functional dependency, you need to look at the XADD compilation of the conditional probability function (cpf) itself, which Jihwan nicely provides the ability to extract and visualize:
https://pyrddlgym.readthedocs.io/en/latest/xadd.html#xadd-compilation-of-cpfs
For the discussion of the transition:
Right after taking the move-north action in step 1, the states become False for all locations from step 2, and it never recovers after that. So, there must be some issues with the domain description?
I believe the domain is working exactly as described in the comments at the beginning of the domain description... the ability to disappear (which introduces dead-ends) is the purpose of the domain and allows evaluation of the planner's ability to reason about the probability of success of different paths (and dead-ends), which is known to cause failures for some determinizing planners.
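The kind of reasoning this forces on a planner can be sketched numerically: the chance of reaching the goal along a path is the product of the per-cell survival probabilities. The helper and the cell names/probabilities below are made up for illustration, not taken from the actual instance:

```python
from math import prod

def path_success_probability(path, p):
    """Probability the robot survives a path to the goal: each time it
    enters a cell it 'disappears' there with probability p[cell],
    independently per step (the start cell is assumed safe)."""
    return prod(1.0 - p[cell] for cell in path)

# A short risky corridor vs. a longer, safer detour (made-up numbers):
risky = path_success_probability(['c1', 'c2'], {'c1': 0.5, 'c2': 0.5})
safe = path_success_probability(['d1', 'd2', 'd3', 'd4'],
                                {'d1': 0.1, 'd2': 0.1, 'd3': 0.1, 'd4': 0.1})
```

Here the longer detour dominates (0.9^4 ≈ 0.656 vs. 0.25), which is exactly the trade-off a determinizing planner that ignores probabilities gets wrong.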
@Scott Oh, you are right. I had not paid attention to the doc and was just looking at the CPF, my bad.
Anyhow, I agree that the DBN simplification is correct, and so I'll close this issue. Feel free to reopen the issue @GMMDMDIDEMS if you have further questions.
@ssanner @jihwan-jeong thank you very much for the detailed explanation.
I additionally had a look at the XADD compilation for the robot-at(x1, y1) variable. For some paths, I have difficulties assigning the path to the respective CPF section, especially when the SF is "overwritten". E.g., on the left red path: robot-at(x1, y2) is true and robot-at(x2, y1) is true, and action move-south leads to the Bernoulli probability distribution. This would imply that we are in step
(3) The robot moved into the cell, in which case we do a coin flip
However, this would only be the case if we choose the action move-south from state robot-at(x1, y2).
The same for the yellow paths: robot-at(x1, y1) is true, and depending on the path also robot-at(x1, y2) and robot-at(x2, y1). We take no action and end up at [1]. Why do we end up at [1] when we are in another state? How is it possible that multiple states can be true?
How is it possible that multiple states can be true?
This is a great question!
First, it's important to separate your intuitive notion of the domain from what is explicitly coded in RDDL.
Observe that for Navigation it is never specified that a robot cannot be at two locations:
There is a "state-action-constraints" section (now deprecated and replaced with a "state-invariants" section in modern RDDL) at the bottom of the Navigation domain that specifies this, but it is commented out.
However, even if these constraints were added to a "state-invariants" section, the current XADD compilation does not simplify w.r.t. "state-invariants".
This means that the XADD compiler is not aware that some state configurations are not intended to be possible (i.e., that a robot should not be in two locations). Hence, the XADD compiler must consider all possible state configurations (including those where more than one cell has a robot). So the paths you highlighted are the compiler's reasoning about the effects in the case of multiple robots.
Are these highlighted paths correct? I'm not sure since I only wrote the cpfs with the single robot case in mind, but whether these paths are correct does not actually matter for the intended single robot configuration. By definition, you will never evaluate a path in the XADD that corresponds to an unreachable state. And for Navigation, you can manually verify that if you start in an initial state with only one robot then at any time in the future, you will only have <= 1 robot.
So how to interpret these XADDs? Quite simply, ignore any path that violates a state-invariant that will hold during simulation... you'll never evaluate that XADD path in any state.
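A literal version of this filtering rule, for the single-robot invariant, might look like the following. The helper and the grounded-fluent naming scheme are hypothetical, not a pyRDDLGym-symbolic API:

```python
def violates_single_robot_invariant(path_decisions):
    """Check one XADD path's decisions against the invariant that at most
    one robot-at fluent is true. path_decisions maps grounded fluent names
    to the truth values the path assumes, e.g. {'robot-at__x1__y1': True}.
    Paths for which this returns True can be ignored when reading the XADD,
    since they are unreachable under the invariant."""
    robots = sum(1 for name, value in path_decisions.items()
                 if name.startswith('robot-at') and value)
    return robots > 1
```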
NOTE: For discrete domains, you may want to set reparam=False when you create RDDLModelXADD(...) to get rid of the reparameterization with UNIFORM random variables and instead show the leaves directly as the probabilities. Reparameterization is useful for simulation (you can pre-sample all canonical random variables such as the UNIFORM and then just read off the next state deterministically from the XADD) but harder to read.
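To illustrate the reparameterization being described (a generic sketch of the idea, not the pyRDDLGym-symbolic internals):

```python
import random

def bernoulli_direct(p, rng=random):
    """Sample Bernoulli(p) directly: the leaf of the CPF is a distribution."""
    return rng.random() < p

def bernoulli_reparam(p, u):
    """Reparameterized form: pre-sample u ~ Uniform(0, 1) once per decision;
    the next state is then a deterministic function of (p, u). This is why
    the reparameterized XADD leaves show comparisons against UNIFORM random
    variables instead of the probabilities themselves."""
    return u < p
```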
Going forward, I think it would be useful for the XADD compiler to simplify w.r.t. state-invariants and I'll add this as an open issue.
Thanks again for the great explanation. That makes sense.
NOTE: For discrete domains, you may want to set reparam=False when you create RDDLModelXADD(...) to get rid of the reparameterization with UNIFORM random variables and instead show the leaves directly as the probabilities. Reparameterization is useful for simulation (you can pre-sample all canonical random variables such as the UNIFORM and then just read off the next state deterministically from the XADD) but harder to read.
Currently only Bernoulli can be used without reparameterisation. Since the navigation domain uses multiple probability distributions, it is unfortunately not possible to set self.reparam=False.
My general goal is to extract the dependency between the current state, the action we perform in the current state, and the next state it leads us to. Likewise, the dependency between two states when we do not perform an action. With https://github.com/pyrddlgym-project/pyRDDLGym-symbolic/issues/3, I could parse the simplified XADD tree structure and extract all dependencies (current state + action -> next state, state -> next state) that will set a next state to True. However, it would require adding state-invariants to every domain.rddl file.
With rddlsim you could extract that information from the parsed instance (for the mentioned Navigation instance.rddl file we got):
#####TASK#####
## name
navigation_inst_mdp__700
## horizon
40
## discount factor
1
## number of action fluents
4
## number of det state fluents
8
## number of prob state fluents
16
## number of preconds
0
## number of actions
5
## number of hashing functions
25
## initial state
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## 1 if task is deterministic
0
## 1 if state hashing possible
1
## 1 if kleene state hashing possible
1
#####ACTION FLUENTS#####
## index
0
## name
move-east
## number of values
2
## values
0 false
1 true
## index
1
## name
move-north
## number of values
2
## values
0 false
1 true
## index
2
## name
move-south
## number of values
2
## values
0 false
1 true
## index
3
## name
move-west
## number of values
2
## values
0 false
1 true
#####DET STATE FLUENTS AND CPFS#####
## index
0
## name
robot-at(x1, y1)
## number of values
2
## values
0 false
1 true
## formula
switch( (or($s(7) and($a(1) $s(0)) and($a(0) $s(0))) : $c(0)) (or(and($a(2) $s(8)) and($a(3) $s(2))) : $c(1)) ($c(1) : $s(0)) )
## hash index
0
## caching type
NONE
## kleene caching type
NONE
## action hash keys
0 0
1 1
2 2
3 3
4 4
## index
1
## name
robot-at(x1, y6)
## number of values
2
## values
0 false
1 true
## formula
switch( (or($s(7) and($a(2) $s(1)) and($a(0) $s(1))) : $c(0)) (or(and($a(1) $s(11)) and($a(3) $s(3))) : $c(1)) ($c(1) : $s(1)) )
## hash index
1
## caching type
NONE
## kleene caching type
NONE
## action hash keys
0 0
1 1
2 2
3 3
4 4
For robot-at(x1, y1) we can extract the information that robot-at(x1, y1) is True for:
(1) the robot is in s(1),
(2) the robot is in s(11) and takes action a(1), or
(3) the robot is in s(3) and takes action a(3).
Is there a way to extract this information with pyRDDLGym?
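For the printed formula format above, a rough extraction could look like the following sketch. This is a hypothetical parser for exactly the textual `switch(... : $c(...))` pattern shown, handling only joint `and($a(i) $s(j))` conditions; it is not part of pyRDDLGym or rddlsim:

```python
import re

def true_conditions(formula):
    """Return (action_index, state_index) pairs under which the next-state
    fluent becomes true, i.e. the and($a(i) $s(j)) disjuncts of the switch
    case whose value is $c(1)."""
    # Split the top-level "(condition : value)" cases of switch(...)
    # by tracking parenthesis depth (regex alone cannot balance parens).
    inner = formula[formula.index('switch(') + len('switch('):formula.rindex(')')]
    cases, depth, start = [], 0, 0
    for i, ch in enumerate(inner):
        if ch == '(':
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth == 0:
                cases.append(inner[start:i])
    # Keep the case that sets the fluent true and pull out its conditions.
    for case in cases:
        cond, val = case.rsplit(':', 1)
        if val.strip() == '$c(1)':
            return [(int(a), int(s)) for a, s in
                    re.findall(r'and\(\$a\((\d+)\)\s+\$s\((\d+)\)\)', cond)]
    return []
```

Running it on the first formula printed above, `switch( (or($s(7) and($a(1) $s(0)) and($a(0) $s(0))) : $c(0)) (or(and($a(2) $s(8)) and($a(3) $s(2))) : $c(1)) ($c(1) : $s(0)) )`, yields the pairs (a(2), s(8)) and (a(3), s(2)).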
unfortunately not possible to set self.reparam=False
It seems this may be due to the KronDelta (which is no longer required and can just be removed), so that should not prevent compilation with self.reparam=False. I've pinged @jihwan-jeong https://github.com/jihwan-jeong about fixing this minor issue.
However, it would require adding state-invariants to every domain.rddl file.
If you want the simplified version of the XADD under the state-invariants, then yes. This is literally just the constraint
[sum_{?x : xpos, ?y : ypos} robot-at(?x,?y)] <= 1;
But indeed, doing this also requires us to resolve #3 so it is not currently an option.
Is there a way to extract this information with pyRDDLGym?
I'm not sure what code in the Java rddlsim provided the output you are referencing... it's not code that I wrote so I am not familiar with it.
Since state-invariants will naturally hold in an instance, it does seem possible in theory to extract the information you are looking for from the instance, but we don't have any such extraction code in PyRDDLGym. I am slightly concerned though that this code is enumerating states and actions, which will only work for relatively small discrete domains (and especially just those with one-hot state and action fluents).
In general, if you want to extract cpf compilations under state-invariants then simplifying the XADD under the state-invariants in #3 is the right approach that will work efficiently for all of RDDL (large discrete and continuous domains). I'll check with @jihwan-jeong to see when he might have time to contribute this. (In principle, since we can already compile expressions into XADDs, this seems like only a few lines of code.)
Sorry for the confusion. The output of the parsed RDDL instance was from the PROST internal RDDL parser.
When I played around with the navigation domain (I appended the domain.rddl and instance.rddl) and displayed the DBN of single states and ground fluents, I could not interpret the results correctly:
There are 24 different grounded SFs and the goal for the given instance is in GOAL(x4,y6). When generating the DBN for the single fluent robot-at and the grounded fluent x1__y1, I get:
When generating the DBN for the grounded fluent x4__y6, I get:
The DBN for the grounded fluent x4__y6 looks correct to me. However, for the grounded fluent x1__y1, I don't understand yet why the next state (robot-at'(x1,y1)) is dependent on the actions move-north and move-east.
Is there a way to find out the connection between actions and SFs? Using the example of the DBN of the grounded fluent x4__y6, the next state depends on whether robot-at(x4,y5) ^ move-north or robot-at(x3,y6) ^ move-east or the robot has already arrived at the goal. As a graphical representation I mean sth. like:
domain.rddl:
instance.rddl: