pyrddlgym-project / pyRDDLGym-symbolic

Symbolic compilation of RDDL domains, Dynamic Bayes net (DBN) visualization, symbolic dynamic programming (SDP).
https://pyrddlgym.readthedocs.io/en/latest/xadd.html
MIT License

Potential DBN inconsistencies #2

Closed: GMMDMDIDEMS closed this issue 7 months ago

GMMDMDIDEMS commented 7 months ago

When I played around with the navigation domain (domain.rddl and instance.rddl are appended below) and displayed the DBNs of single state fluents and ground fluents, I could not interpret the results correctly:

There are 24 different grounded SFs, and the goal for the given instance is GOAL(x4,y6). When generating the DBN for the single fluent robot-at and the grounded fluent x1__y1, I get the attached image Wildfire_robot-at_x1__y1_inst_0.

When generating the DBN for the grounded fluent x4__y6, I get the attached image Wildfire_robot-at_x4__y6_inst_0.

The DBN for the grounded fluent x4__y6 looks correct to me. However, for the grounded fluent x1__y1, I don't yet understand why the next-state (robot-at'(?x1,?y1)) depends on the actions move-north and move-east.

Is there a way to find out the connection between actions and SFs? Using the example of the DBN for the grounded fluent x4__y6, the next state depends on whether robot-at(x4,y5) ^ move-north, or robot-at(x3,y6) ^ move-east, or the robot has already arrived at the goal. As a graphical representation, I mean something like:

robot-at(x4,y5) + move-north --> robot-at(x4,y6)
robot-at(x3,y6) + move-east --> robot-at(x4,y6)
robot-at(x4,y6) --> robot-at(x4,y6)

domain.rddl:

////////////////////////////////////////////////////////////////////
//
// Navigation MDP
//
// Author: Scott Sanner (ssanner [at] gmail.com)
//
// In a grid, a robot (R) must get to a goal (G).  Every cell offers
// the robot a (different) chance of disappearing.  The robot needs
// to choose a path which gets it to the goal most reliably within
// the finite horizon time.
//
// ***********************
// *  0   0   0   0   R  * 
// * .1  .3  .5  .7  .9  *
// * .1  .3  .5  .7  .9  * 
// * .1  .3  .5  .7  .9  *
// * .1  .3  .5  .7  .9  *
// *  0   0   0   0   G  * 
// ***********************
//
// This is a good domain to test determinized planners because
// one can see here that the path using the .3 chance of failure
// leads to a 1-step most likely outcome of survival, but
// a poor 4-step chance of survival (.7^4), whereas the path
// using a .1 chance of failure is much safer.
//
// The domain generators for navigation have a flag to produce slightly 
// obfuscated files to discourage hand-coded policies, but 
// rddl.viz.NavigationDisplay can properly display these grids, e.g.,
//
//  ./run rddl.sim.Simulator files/final_comp/rddl rddl.policy.RandomBoolPolicy 
//        navigation_inst_mdp__1 rddl.viz.NavigationDisplay
//
// (Movement was not made stochastic due to the lack of intermediate
//  variables and synchronic arcs to support both the PPDDL and SPUDD 
//  translations.)
// 
////////////////////////////////////////////////////////////////////

domain navigation_mdp {
    requirements = {
//      constrained-state,
        reward-deterministic
    };

    types {
        xpos : object;
        ypos : object;
    };

    pvariables {

        NORTH(ypos, ypos) : {non-fluent, bool, default = false};
        SOUTH(ypos, ypos) : {non-fluent, bool, default = false};
        EAST(xpos, xpos)  : {non-fluent, bool, default = false};
        WEST(xpos, xpos)  : {non-fluent, bool, default = false};

        MIN-XPOS(xpos) : {non-fluent, bool, default = false};
        MAX-XPOS(xpos) : {non-fluent, bool, default = false};
        MIN-YPOS(ypos) : {non-fluent, bool, default = false};
        MAX-YPOS(ypos) : {non-fluent, bool, default = false};

        P(xpos, ypos) : {non-fluent, real, default = 0.0};

        GOAL(xpos,ypos) : {non-fluent, bool, default = false};

        // Fluents
        robot-at(xpos, ypos) : {state-fluent, bool, default = false};

        // Actions
        move-north : {action-fluent, bool, default = false};
        move-south : {action-fluent, bool, default = false};
        move-east  : {action-fluent, bool, default = false};
        move-west  : {action-fluent, bool, default = false};
    };

    cpfs {

        robot-at'(?x,?y) =

            if ( GOAL(?x,?y) ^ robot-at(?x,?y)  )
            then 
                KronDelta(true)
            else if (( exists_{?x2 : xpos, ?y2 : ypos} [ GOAL(?x2,?y2) ^ robot-at(?x2,?y2)  ] )
                     | ( move-north ^ exists_{?y2 : ypos} [ NORTH(?y,?y2) ^ robot-at(?x,?y) ] )
                     | ( move-south ^ exists_{?y2 : ypos} [ SOUTH(?y,?y2) ^ robot-at(?x,?y) ] )
                     | ( move-east ^ exists_{?x2 : xpos} [ EAST(?x,?x2) ^ robot-at(?x,?y) ] )
                     | ( move-west ^ exists_{?x2 : xpos} [ WEST(?x,?x2) ^ robot-at(?x,?y) ] ))
            then 
                KronDelta(false) 
            else if (( move-north ^ exists_{?y2 : ypos} [ NORTH(?y2,?y) ^ robot-at(?x,?y2) ] )
                     | ( move-south ^ exists_{?y2 : ypos} [ SOUTH(?y2,?y) ^ robot-at(?x,?y2) ] )
                     | ( move-east ^ exists_{?x2 : xpos} [ EAST(?x2,?x) ^ robot-at(?x2,?y) ] )
                     | ( move-west ^ exists_{?x2 : xpos} [ WEST(?x2,?x) ^ robot-at(?x2,?y) ] ))
            then 
                Bernoulli( 1.0 - P(?x, ?y) ) 
            else 
                KronDelta( robot-at(?x,?y) );

    };

    // 0 reward for reaching goal, -1 in all other cases
    reward = [sum_{?x : xpos, ?y : ypos} -(GOAL(?x,?y) ^ ~robot-at(?x,?y))]; 

}

instance.rddl:

non-fluents nf_navigation_inst_mdp__0 {
    domain = navigation_mdp;
    objects {
        xpos : {x1,x2,x3,x4};
        ypos : {y1,y2,y3,y4,y5,y6};
    };
    non-fluents {
        NORTH(y1,y2);
        SOUTH(y2,y1);
        NORTH(y2,y3);
        SOUTH(y3,y2);
        NORTH(y3,y4);
        SOUTH(y4,y3);
        NORTH(y4,y5);
        SOUTH(y5,y4);
        NORTH(y5,y6);
        SOUTH(y6,y5);

        EAST(x1,x2);
        WEST(x2,x1);
        EAST(x2,x3);
        WEST(x3,x2);
        EAST(x3,x4);
        WEST(x4,x3);

        MIN-XPOS(x1);
        MAX-XPOS(x4);
        MIN-YPOS(y1);
        MAX-YPOS(y6);

        GOAL(x4,y6);

        P(x1,y2) = 0.05619587033111525;
        P(x1,y3) = 0.04836146267135677;
        P(x1,y4) = 0.04224611192322229;
        P(x1,y5) = 0.027581761688785046;
        P(x2,y2) = 0.3290676024238835;
        P(x2,y3) = 0.31491139267692797;
        P(x2,y4) = 0.3367653188138302;
        P(x2,y5) = 0.33334257087090663;
        P(x3,y2) = 0.6136325686764752;
        P(x3,y3) = 0.6107738967872153;
        P(x3,y4) = 0.6151765929039653;
        P(x3,y5) = 0.6120255439526258;
        P(x4,y2) = 0.9491586061795277;
        P(x4,y3) = 0.9140435651595797;
        P(x4,y4) = 0.9300722126485798;
        P(x4,y5) = 0.920351061739235;
    };
}

instance navigation_inst_mdp__0 {
    domain = navigation_mdp;
    non-fluents = nf_navigation_inst_mdp__0;
    init-state {
        robot-at(x4,y1);
    };
    max-nondef-actions = 1;
    horizon = 40;
    discount = 1.0;
}
jihwan-jeong commented 7 months ago

Hi, before diving into the specific issue you described above, can you please clarify what you are trying to model in this domain? When I look at the CPF of robot-at'(?x, ?y), there is some logic that I cannot comprehend.

For example,

else if (( exists_{?x2 : xpos, ?y2 : ypos} [ GOAL(?x2,?y2) ^ robot-at(?x2,?y2)  ] )
                     | ( move-north ^ exists_{?y2 : ypos} [ NORTH(?y,?y2) ^ robot-at(?x,?y) ] )
                     | ( move-south ^ exists_{?y2 : ypos} [ SOUTH(?y,?y2) ^ robot-at(?x,?y) ] )
                     | ( move-east ^ exists_{?x2 : xpos} [ EAST(?x,?x2) ^ robot-at(?x,?y) ] )
                     | ( move-west ^ exists_{?x2 : xpos} [ WEST(?x,?x2) ^ robot-at(?x,?y) ] ))
            then 
                KronDelta(false) 

this condition says that if the agent was at (?x, ?y), took one of the move actions, and there existed a location in that direction, robot-at(?x, ?y) will be false in the next time step, as the agent would have moved. So, this is good.

Then in the next condition (coming here implies that either the robot is not at (?x, ?y) or it doesn't take a move action):

else if (( move-north ^ exists_{?y2 : ypos} [ NORTH(?y2,?y) ^ robot-at(?x,?y2) ] )
                     | ( move-south ^ exists_{?y2 : ypos} [ SOUTH(?y2,?y) ^ robot-at(?x,?y2) ] )
                     | ( move-east ^ exists_{?x2 : xpos} [ EAST(?x2,?x) ^ robot-at(?x2,?y) ] )
                     | ( move-west ^ exists_{?x2 : xpos} [ WEST(?x2,?x) ^ robot-at(?x2,?y) ] ))
            then 
                Bernoulli( 1.0 - P(?x, ?y) ) 

the robot takes a move action at a position other than (?x, ?y), and you are checking whether the agent will land on (?x, ?y). For example, the robot may be at (?x, ?y) after taking move-north from (?x, ?y2) with NORTH(?y2, ?y)=true, with some probability defined by P(?x, ?y). If the Bernoulli rv is sampled to be true, though, this does not "turn off" the original (?x, ?y2) where the robot is located in the current time step, so there will be two locations where robot-at is true in the next time step. Is this intended?

ssanner commented 7 months ago

This is the IPPC 2011 Navigation domain:

https://github.com/pyrddlgym-project/rddlrepository/blob/main/rddlrepository/archive/competitions/IPPC2011/Navigation/MDP/domain.rddl

For all cells, the transition logic determining robot-at'(?x,?y) in the if-then-else should state: (1) the robot is at the cell and it is the goal, so it stays (true); (2) the robot moved out of the cell, so it is false; (3) the robot moved into the cell, in which case we do a coin flip which allows the robot to "disappear" or successfully move; (4) else, no change.

this does not "turn off" the original (?x, ?y2) where the robot is located in the current time step

... I don't think this is possible. By default the robot must always move out of the cell when a move is possible, i.e., rule (2), but whether it makes it to the next cell is stochastic... it either makes it or it "disappears", from which it cannot recover (so we say Navigation has dead-ends).

So I don't think it's possible for robot-at'(?x,?y) to be true in two cells so long as the initial state had the robot in exactly one cell?
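
To make the four cases concrete, here is a minimal Python sketch (my own restating, not pyRDDLGym code) of the grounded CPF for robot-at'(x4,y6) in the instance above, assuming a single robot; the function name, the state representation, and the p_disappear parameter (which stands in for P(x4,y6), default 0.0) are just for illustration:

import random

def next_robot_at_x4_y6(robot_at, move_north, move_east, p_disappear=0.0):
    """robot_at maps (x, y) pairs to booleans for the current state."""
    # (1) the robot is already at the goal cell (x4,y6), so it stays
    if robot_at[('x4', 'y6')]:
        return True
    # (2) would set the fluent to false if the robot moved out of (x4,y6),
    #     but case (1) already keeps it there since (x4,y6) is the goal
    # (3) the robot moves into the cell: coin flip with success prob 1 - P(x4,y6)
    if (move_north and robot_at[('x4', 'y5')]) or (move_east and robot_at[('x3', 'y6')]):
        return random.random() < 1.0 - p_disappear
    # (4) otherwise, no change
    return robot_at[('x4', 'y6')]

# Example: robot at (x4,y5) takes move-north; with P(x4,y6)=0.0 it always arrives
state = {(x, y): False for x in ['x1', 'x2', 'x3', 'x4']
         for y in ['y1', 'y2', 'y3', 'y4', 'y5', 'y6']}
state[('x4', 'y5')] = True
print(next_robot_at_x4_y6(state, move_north=True, move_east=False))  # True

Under the single-robot assumption this matches the DBN for x4__y6 described above: its parents are robot-at(x4,y5) with move-north, robot-at(x3,y6) with move-east, and robot-at(x4,y6) itself.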

Note: Trying to understand this domain model elucidates a key case where RDDL is hard to read. These sorts of graph-based navigation domains are where PDDL's action-centric, effect-based syntax is much clearer than RDDL's state-fluent-centric DBN syntax. However, RDDL's DBN syntax is far more expressive than PDDL for modeling stochastic domains, and I never came up with a hybrid language that would give us the syntactic and expressive best of both worlds. I'm open to suggestions on this -- it's not for this issue, but feel free to email me (Scott Sanner) separately.

jihwan-jeong commented 7 months ago

Quick note: I ran a simulation with the domain file and the instance using the simulator from pyRDDLGym (not the one using the XADD model). In fact, the robot disappears (instead of robot-at being true at two cells). Here's the printout:

step       = 0
state      =
{   'robot-at___x1__y1': False,
    'robot-at___x1__y2': False,
    'robot-at___x1__y3': False,
    'robot-at___x1__y4': False,
    'robot-at___x1__y5': False,
    'robot-at___x1__y6': False,
    'robot-at___x2__y1': False,
    'robot-at___x2__y2': False,
    'robot-at___x2__y3': False,
    'robot-at___x2__y4': False,
    'robot-at___x2__y5': False,
    'robot-at___x2__y6': False,
    'robot-at___x3__y1': False,
    'robot-at___x3__y2': False,
    'robot-at___x3__y3': False,
    'robot-at___x3__y4': False,
    'robot-at___x3__y5': False,
    'robot-at___x3__y6': False,
    'robot-at___x4__y1': True,
    'robot-at___x4__y2': False,
    'robot-at___x4__y3': False,
    'robot-at___x4__y4': False,
    'robot-at___x4__y5': False,
    'robot-at___x4__y6': False}
action     =
{'move-north': 0}
next state =
{   'robot-at___x1__y1': False,
    'robot-at___x1__y2': False,
    'robot-at___x1__y3': False,
    'robot-at___x1__y4': False,
    'robot-at___x1__y5': False,
    'robot-at___x1__y6': False,
    'robot-at___x2__y1': False,
    'robot-at___x2__y2': False,
    'robot-at___x2__y3': False,
    'robot-at___x2__y4': False,
    'robot-at___x2__y5': False,
    'robot-at___x2__y6': False,
    'robot-at___x3__y1': False,
    'robot-at___x3__y2': False,
    'robot-at___x3__y3': False,
    'robot-at___x3__y4': False,
    'robot-at___x3__y5': False,
    'robot-at___x3__y6': False,
    'robot-at___x4__y1': True,
    'robot-at___x4__y2': False,
    'robot-at___x4__y3': False,
    'robot-at___x4__y4': False,
    'robot-at___x4__y5': False,
    'robot-at___x4__y6': False}
reward     = -1.0

step       = 1
state      =
{   'robot-at___x1__y1': False,
    'robot-at___x1__y2': False,
    'robot-at___x1__y3': False,
    'robot-at___x1__y4': False,
    'robot-at___x1__y5': False,
    'robot-at___x1__y6': False,
    'robot-at___x2__y1': False,
    'robot-at___x2__y2': False,
    'robot-at___x2__y3': False,
    'robot-at___x2__y4': False,
    'robot-at___x2__y5': False,
    'robot-at___x2__y6': False,
    'robot-at___x3__y1': False,
    'robot-at___x3__y2': False,
    'robot-at___x3__y3': False,
    'robot-at___x3__y4': False,
    'robot-at___x3__y5': False,
    'robot-at___x3__y6': False,
    'robot-at___x4__y1': True,
    'robot-at___x4__y2': False,
    'robot-at___x4__y3': False,
    'robot-at___x4__y4': False,
    'robot-at___x4__y5': False,
    'robot-at___x4__y6': False}
action     =
{'move-north': 1}
next state =
{   'robot-at___x1__y1': False,
    'robot-at___x1__y2': False,
    'robot-at___x1__y3': False,
    'robot-at___x1__y4': False,
    'robot-at___x1__y5': False,
    'robot-at___x1__y6': False,
    'robot-at___x2__y1': False,
    'robot-at___x2__y2': False,
    'robot-at___x2__y3': False,
    'robot-at___x2__y4': False,
    'robot-at___x2__y5': False,
    'robot-at___x2__y6': False,
    'robot-at___x3__y1': False,
    'robot-at___x3__y2': False,
    'robot-at___x3__y3': False,
    'robot-at___x3__y4': False,
    'robot-at___x3__y5': False,
    'robot-at___x3__y6': False,
    'robot-at___x4__y1': False,
    'robot-at___x4__y2': False,
    'robot-at___x4__y3': False,
    'robot-at___x4__y4': False,
    'robot-at___x4__y5': False,
    'robot-at___x4__y6': False}
reward     = -1.0

step       = 2
state      =
{   'robot-at___x1__y1': False,
    'robot-at___x1__y2': False,
    'robot-at___x1__y3': False,
    'robot-at___x1__y4': False,
    'robot-at___x1__y5': False,
    'robot-at___x1__y6': False,
    'robot-at___x2__y1': False,
    'robot-at___x2__y2': False,
    'robot-at___x2__y3': False,
    'robot-at___x2__y4': False,
    'robot-at___x2__y5': False,
    'robot-at___x2__y6': False,
    'robot-at___x3__y1': False,
    'robot-at___x3__y2': False,
    'robot-at___x3__y3': False,
    'robot-at___x3__y4': False,
    'robot-at___x3__y5': False,
    'robot-at___x3__y6': False,
    'robot-at___x4__y1': False,
    'robot-at___x4__y2': False,
    'robot-at___x4__y3': False,
    'robot-at___x4__y4': False,
    'robot-at___x4__y5': False,
    'robot-at___x4__y6': False}
action     =
{'move-east': 0}
next state =
{   'robot-at___x1__y1': False,
    'robot-at___x1__y2': False,
    'robot-at___x1__y3': False,
    'robot-at___x1__y4': False,
    'robot-at___x1__y5': False,
    'robot-at___x1__y6': False,
    'robot-at___x2__y1': False,
    'robot-at___x2__y2': False,
    'robot-at___x2__y3': False,
    'robot-at___x2__y4': False,
    'robot-at___x2__y5': False,
    'robot-at___x2__y6': False,
    'robot-at___x3__y1': False,
    'robot-at___x3__y2': False,
    'robot-at___x3__y3': False,
    'robot-at___x3__y4': False,
    'robot-at___x3__y5': False,
    'robot-at___x3__y6': False,
    'robot-at___x4__y1': False,
    'robot-at___x4__y2': False,
    'robot-at___x4__y3': False,
    'robot-at___x4__y4': False,
    'robot-at___x4__y5': False,
    'robot-at___x4__y6': False}
reward     = -1.0

Right after taking the move-north action in step 1, the states become False for all locations from step 2, and it never recovers after that.

So, there must be some issues with the domain description?

Adding the visualization FYI (attached image).

ssanner commented 7 months ago

All: sorry, some confusion on my end. Some quick notes to resolve this thread:

From the original post:

https://pyrddlgym.readthedocs.io/en/latest/xadd.html#xadd-compilation-of-cpfs

For the discussion of the transition:

Right after taking the move-north action in step 1, the states become False for all locations from step 2, and it never recovers after that. So, there must be some issues with the domain description?

I believe the domain is working exactly as described in the comments at the beginning of the domain description... the ability to disappear is the purpose of the domain (it introduces dead-ends) and allows evaluation of the planner's ability to reason about the probability of success of different paths (and dead-ends), which is known to cause failures for some determinizing planners.

jihwan-jeong commented 7 months ago

@Scott Oh, you are right. I had not paid attention to the doc and was just looking at the CPF, my bad.

Anyhow, I agree that the DBN simplification is correct, and so I'll close this issue. Feel free to reopen the issue @GMMDMDIDEMS if you have further questions.

GMMDMDIDEMS commented 7 months ago

@ssanner @jihwan-jeong thank you very much for the detailed explanation.

I additionally had a look at the XADD compilation for the robot-at(x1, y1) variable. For some paths, I have difficulties assigning the path to the respective CPF section, especially when the SF is "overwritten", e.g. the left red path: robot-at(x1, y2) is true and robot-at(x2, y1) is true, and the action move-south leads to the Bernoulli probability distribution. This would imply that we are in case

(3) The robot moved into the cell, in which case we do a coin flip

However, this would only be the case if we choose the action move-south from state robot-at(x1, y2).

(attached image: navigation_xadd_x1y1)

The same for the yellow paths: robot-at(x1, y1) is true, and depending on the path also robot-at(x1, y2) and robot-at(x2, y1). We take no action and end up at [1]. Why do we end up at [1] when we are in another state? How is it possible that multiple states can be true?

ssanner commented 7 months ago

How is it possible that multiple states can be true?

This is a great question!

First, it's important to separate your intuitive notion of the domain from what is explicitly coded in RDDL.

Observe that for Navigation it is never specified that a robot cannot be at two locations:

https://github.com/pyrddlgym-project/rddlrepository/blob/main/rddlrepository/archive/competitions/IPPC2011/Navigation/MDP/domain.rddl

There is a "state-action-constraints" section (now deprecated and replaced with a "state-invariants" section in modern RDDL) at the bottom of the Navigation domain that specifies this, but it is commented out.

However, even if these constraints were added to a "state-invariants" section, the current XADD compilation does not simplify w.r.t. "state-invariants".

This means that the XADD compiler is not aware that some state configurations are not intended to be possible (i.e., that a robot should not be in two locations). Hence, the XADD compiler must consider all possible state configurations (including those where more than one cell has a robot). So the paths you highlighted correspond to reasoning about the effects in the case of multiple robots.

Are these highlighted paths correct? I'm not sure since I only wrote the cpfs with the single robot case in mind, but whether these paths are correct does not actually matter for the intended single robot configuration. By definition, you will never evaluate a path in the XADD that corresponds to an unreachable state. And for Navigation, you can manually verify that if you start in an initial state with only one robot then at any time in the future, you will only have <= 1 robot.

So how to interpret these XADDs? Quite simply, ignore any path that violates a state-invariant that will hold during simulation... you'll never evaluate that XADD path in any state.

NOTE: For discrete domains, you may want to set reparam=False when you create RDDLModelXADD(...) to get rid of the reparameterization with UNIFORM random variables and instead show the leaves directly as the probabilities. Reparameterization is useful for simulation (you can pre-sample all canonical random variables such as the UNIFORM and then just read off the next state deterministically from the XADD) but harder to read.
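
For reference, a rough sketch of what that might look like in code. The only names taken from this thread are RDDLModelXADD and its reparam argument; the import path, the compile() call, and the construction of the grounded model are assumptions that may not match the current pyRDDLGym-symbolic API:

# Sketch only: module path, compile(), and how the grounded model is obtained are
# assumptions; RDDLModelXADD and reparam are the names mentioned in this thread.
from pyRDDLGym_symbolic.core.model import RDDLModelXADD  # assumed import path

# 'grounded_model' is assumed to come from pyRDDLGym's parser/grounder for the
# Navigation domain.rddl and instance.rddl shown above (details omitted here).
xadd_model = RDDLModelXADD(grounded_model, reparam=False)
xadd_model.compile()  # assumed compilation entry point

# With reparam=False, the leaves of the robot-at'(?x,?y) XADD should show the
# Bernoulli probabilities (e.g., 1 - P(?x,?y)) directly, instead of comparisons
# against a UNIFORM(0,1) reparameterization variable.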

Going forward, I think it would be useful for the XADD compiler to simplify w.r.t. state-invariants and I'll add this as an open issue.

GMMDMDIDEMS commented 7 months ago

Thanks again for the great explanation. That makes sense.

NOTE: For discrete domains, you may want to set reparam=False when you create RDDLModelXADD(...) to get rid of the reparameterization with UNIFORM random variables and instead show the leaves directly as the probabilities. Reparameterization is useful for simulation (you can pre-sample all canonical random variables such as the UNIFORM and then just read off the next state deterministically from the XADD) but harder to read.

Currently only Bernoulli can be used without reparameterisation. Since the navigation domain uses multiple probability distributions, it is unfortunately not possible to set self.reparam=False.

My general goal is to extract the dependency between the current state, the action we perform in that state, and the next state it leads us to, and likewise the dependency between two states when we do not perform an action. With https://github.com/pyrddlgym-project/pyRDDLGym-symbolic/issues/3 resolved, I could parse the simplified XADD tree structure and extract all dependencies (current state + action -> next state, state -> next state) that set a next-state fluent to True. However, it would require adding state-invariants to every domain.rddl file.

With rddlsim you could extract that information from the parsed instance (For the mentioned Navigation instance.rddl file we got):

#####TASK#####
## name
navigation_inst_mdp__700
## horizon
40
## discount factor
1
## number of action fluents
4
## number of det state fluents
8
## number of prob state fluents
16
## number of preconds
0
## number of actions
5
## number of hashing functions
25
## initial state
0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
## 1 if task is deterministic
0
## 1 if state hashing possible
1
## 1 if kleene state hashing possible
1

#####ACTION FLUENTS#####
## index
0
## name
move-east
## number of values
2
## values
0 false
1 true

## index
1
## name
move-north
## number of values
2
## values
0 false
1 true

## index
2
## name
move-south
## number of values
2
## values
0 false
1 true

## index
3
## name
move-west
## number of values
2
## values
0 false
1 true

#####DET STATE FLUENTS AND CPFS#####
## index
0
## name
robot-at(x1, y1)
## number of values
2
## values
0 false
1 true
## formula
switch( (or($s(7) and($a(1) $s(0)) and($a(0) $s(0))) : $c(0)) (or(and($a(2) $s(8)) and($a(3) $s(2))) : $c(1)) ($c(1) : $s(0)) )
## hash index
0
## caching type 
NONE
## kleene caching type
NONE
## action hash keys
0 0
1 1
2 2
3 3
4 4

## index
1
## name
robot-at(x1, y6)
## number of values
2
## values
0 false
1 true
## formula
switch( (or($s(7) and($a(2) $s(1)) and($a(0) $s(1))) : $c(0)) (or(and($a(1) $s(11)) and($a(3) $s(3))) : $c(1)) ($c(1) : $s(1)) )
## hash index
1
## caching type 
NONE
## kleene caching type
NONE
## action hash keys
0 0
1 1
2 2
3 3
4 4

For robot-at(x1, y1) we can extract the information that robot-at(x1, y1) is True if: (1) the robot is in s(1), (2) the robot is in s(11) and takes action a(1), or (3) the robot is in s(3) and takes action a(3).

Is there a way to extract this information with pyRDDLGym?

ssanner commented 7 months ago

unfortunately not possible to set self.reparam=False

It seems this may be due to the KronDelta (which is no longer required and can just be removed), so that should not prevent compilation with self.reparam=False. I've pinged @jihwan-jeong about fixing this minor issue.

However, it would require to add state-invariants to every domain.rddl file.

If you want the simplified version of the XADD under the state-invariants, then yes. This is literally just the constraint

[sum_{?x : xpos, ?y : ypos} robot-at(?x,?y)] <= 1;
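
In domain.rddl terms, this would amount to adding something like the following block (a sketch of where the constraint above would go, using RDDL's state-invariants section):

state-invariants {
    // at most one robot location at any time
    [sum_{?x : xpos, ?y : ypos} robot-at(?x,?y)] <= 1;
};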

But indeed, doing this also requires us to resolve #3 so it is not currently an option.

Is there a way to extract this information with pyRDDLGym?

I'm not sure what code in the Java rddlsim provided the output you are referencing... it's not code that I wrote so I am not familiar with it.

Since state-invariants will naturally hold in an instance, it does seem possible in theory to extract the information you are looking for from the instance, but we don't have any such extraction code in pyRDDLGym. I am slightly concerned, though, that this code enumerates states and actions, which will only work for relatively small discrete domains (and especially just those with one-hot state and action fluents).

In general, if you want to extract CPF compilations under state-invariants, then simplifying the XADD under the state-invariants in #3 is the right approach and will work efficiently for all of RDDL (large discrete and continuous domains). I'll check with @jihwan-jeong to see when he might have time to contribute this. (In principle, since we can already compile expressions into XADDs, this seems like only a few lines of code.)

GMMDMDIDEMS commented 7 months ago

Sorry for the confusion. The output of the parsed RDDL instance was from PROST's internal RDDL parser.