pyrddlgym-project / pyRDDLGym

A toolkit for auto-generation of OpenAI Gym environments from RDDL description files.
https://pyrddlgym.readthedocs.io/
MIT License
68 stars 17 forks source link

ArithmeticError: Cannot evaluate arithmetic operation #235

Closed GMMDMDIDEMS closed 9 months ago

GMMDMDIDEMS commented 9 months ago

When running PROST via the provided Docker, I get the following ArithmeticError from the RDDLSimAgent (domain.rddl and instance.rddl are appended):

raise ArithmeticError( ArithmeticError: Cannot evaluate arithmetic operation / at [0 0 0 0] and [0 0 1 2]. ( sum{?c2: course} [ PREREQ(?c2, ?c) ^ passed(?c2) ] ) / ( sum{?c2: course} [ PREREQ(?c2, ?c) ] )

The domain is a slightly modified version of the Academic Advising domain. Considering the error message, there must be an error in the cpfs part for the definition of passed'(?c):

           else if ([[sum_{?c2 : course} (PREREQ(?c2,?c) ^ passed(?c2))]
                                      / [sum_{?c2 : course} PREREQ(?c2,?c)]] == 1.0)
                 then Bernoulli( 0.95 )

The error message and the lists provided confuse me a little, because they lead me to believe that the summation does not work (e.g. missing brackets). I changed the part above to the following, which I think should lead to the same result and it works:

else if ((sum_{?c2 : course}[PREREQ(?c2,?c) ^ passed(?c2)]) == (sum_{?c2 : course}[PREREQ(?c2,?c)]))
                                then Bernoulli( 0.95 )

domain.rddl

//
// Academic Advising Domain
//
// Author:  Libby Ferland (libby.knouse@uky.edu)
//
// In this domain, a student may take courses at a given cost
// and passes the course with a probability determined by how
// many of the prerequisites they have successfully passed.
// A student also receives a penalty at each time step if they
// have not yet graduated from their program (i.e., completed
// all required courses).  We allow multiple courses to be
// taken in a semester in some instances.
//
// Modified for competition and translation purposes by Scott Sanner.
//
///////////////////////////////////////////////////////////////

domain academic_advising_prob_mdp {

  types {
    course : object;
  };

  pvariables {

    // Nonfluents: course prerequisites
    PREREQ(course, course) : { non-fluent, bool, default = false }; // First argument is a prereq of second argument

    // Nonfluents: course passing probabilities
    PRIOR_PROB_PASS_NO_PREREQ(course) : { non-fluent, real, default = 1.0 }; // Probability of passing a course with no prereqs
    PRIOR_PROB_PASS(course)           : { non-fluent, real, default = 0.2 }; // Probability of passing a course regardless of prereq status

    // Nonfluents: program requirements for graduation
    PROGRAM_REQUIREMENT(course) : { non-fluent, bool, default = false }; // Specifies whether course is program requirement 

    // Nonfluents: costs/penalties
    COURSE_COST(course)        : { non-fluent, real, default = -1 }; // Cost for taking a course
    COURSE_RETAKE_COST(course) : { non-fluent, real, default = -2 }; // Cost for re-taking a course (heavily discouraged)
    PROGRAM_INCOMPLETE_PENALTY : { non-fluent, real, default = -5 }; // Penalty at each time step for having an incomplete program

    // State
    passed(course) : { state-fluent, bool, default = false };
    taken(course)  : { state-fluent, bool, default = false };

    // Action
    takeCourse(course)   : { action-fluent, bool, default = false };
  };

  cpfs {

    // Determine whether each course was passed
    // Modification: differentiate courses with no prereqs since should be easier to pass such introductory courses
    // For courses with prereqs:
    //   if PRIOR_PROB_PASS=.2 and 0 out of 3 prereqs were taken, the distribution is Bernoulli(.2 + .8 * (0/4)) = Bernoulli(.2)
    //                             1 out of 3 prereqs were taken, the distribution is Bernoulli(.2 + .8 * (1/4)) = Bernoulli(.4)
    //                             3 out of 3 prereqs were taken, the distribution is Bernoulli(.2 + .8 * (3/4)) = Bernoulli(.8)
    passed'(?c) = 
        if (takeCourse(?c) ^ ~passed(?c)) // If take a course and not already passed 
            then [  if (~exists_{?c2 : course} PREREQ(?c2,?c))
                    then Bernoulli( PRIOR_PROB_PASS_NO_PREREQ(?c) )  
                                else if ([[sum_{?c2 : course} (PREREQ(?c2,?c) ^ passed(?c2))]
                                                   / [sum_{?c2 : course} PREREQ(?c2,?c)]] == 1.0)
                                then Bernoulli( 0.95 )
                                else
                     Bernoulli( 0.05 ) 
            ]
            else
                passed(?c); // Value persists if course not taken or already passed

    taken'(?c) = taken(?c) | takeCourse(?c);

  };

  // A student is assessed a cost for taking each course and a penalty for not completing their program   
  reward = 
      [sum_{?c : course} [COURSE_COST(?c) * (takeCourse(?c) ^ ~taken(?c))]]
    + [sum_{?c : course} [COURSE_RETAKE_COST(?c) * (takeCourse(?c) ^ taken(?c))]]
    + [PROGRAM_INCOMPLETE_PENALTY * ~[forall_{?c : course} (PROGRAM_REQUIREMENT(?c) => passed(?c))]];

}     

instance.rddl

non-fluents nf_academic_advising_prob_inst_mdp__0 {
    domain = academic_advising_prob_mdp; 
    objects{
        course : {CS11, CS12, CS21, CS22 };
    };

    non-fluents {
        PREREQ(CS12,CS21);
        PREREQ(CS11,CS22);
        PREREQ(CS12,CS22);
        PROGRAM_REQUIREMENT(CS22);
    };
}

instance academic_advising_prob_inst_mdp__0 {
    domain = academic_advising_prob_mdp;
    non-fluents = nf_academic_advising_prob_inst_mdp__0;
    max-nondef-actions = 1;
    horizon = 40;
    discount = 1.0;
}

Edit: I have corrected the error message to the above specified instance.

mike-gimelfarb commented 9 months ago

I think you may be dividing by zero, hence the error. I will try to take a closer look this week. If you fill in PREREQ such that sum_{?c2} PREREQ(?c2,?c) is no longer zero for all values of ?c, it should no longer complain. In any case, your rewrite is more numerically safe.

mike-gimelfarb commented 9 months ago

I ran your example, and it is consistent with how we wanted to handle divide-by-zero errors in pyrddlgym. You will find this error if you run the scripts from GymExample.py as well.

I am closing this issue, but I will likely revisit the error message handling in the near future to make it more informative. If you identify another error that misfires or is misleading, please feel free to reopen the issue.

GMMDMDIDEMS commented 9 months ago

I see the problem with division-by-zero errors, but should they be possible at all considering the mentioned domain.rddl. If I am not mistaken, these should be intercepted via the following if-statement:

if (~exists_{?c2 : course} PREREQ(?c2,?c))
        then Bernoulli( PRIOR_PROB_PASS_NO_PREREQ(?c) ) 

The full annotated formula:

passed'(?c) = 
        if (takeCourse(?c) ^ ~passed(?c)) // If take a course and not already passed 
            then [  if (~exists_{?c2 : course} PREREQ(?c2,?c)) // if there are no prereqs
                    then Bernoulli( PRIOR_PROB_PASS_NO_PREREQ(?c) )  
                                else if (([(sum_{?c2 : course}[PREREQ(?c2,?c) ^ passed(?c2)]) // if all prereqs already passed
                                    / (sum_{?c2 : course}[PREREQ(?c2,?c)])]) == 1.0)
                                then Bernoulli( 0.95 )
                                else // not all prereqs passed yet
                     Bernoulli( 0.05 ) 
            ]
            else
                passed(?c); // Value persists if course not taken or already passed
mike-gimelfarb commented 9 months ago

Indeed, short-circuiting does not work as intuition suggests in the vectorized environments. Unfortunately, I do not see a natural way to fix this, since the vectorized computation in numpy is incompatible with short-circuiting.

There are currently two ways to fix this. If you ground the domain, then there is no vectorization and the branching will work as intended. A better alternative currently is to add the following lines before you call evaluate(), or anywhere after you initialize the environment:

import numpy as np
np.seterr('ignore')

This will override the default setting of catching numerical errors, but will ignore any valid errors that arise. You can do this if you are sure your environment does not have other numerical errors besides the divide-by-zero.

If you have any suggestions how to handle branching in general domains while short circuiting, please feel free to submit a PR.

mike-gimelfarb commented 9 months ago

I have opened up a more general issue #238 . We need to think about how to handle the problem more elegantly in general... not sure yet how it can be done.