moves-rwth / storm

A Modern Probabilistic Model Checker
https://www.stormchecker.org
GNU General Public License v3.0

Crash in policy iteration #555

Closed sjunges closed 1 month ago

sjunges commented 1 month ago

For the DRN at the bottom and the following query, we obtain an error which I believe is due to the creation of the initial policy. The error occurs both with and without the exact flag (without it, one must explicitly select topological policy iteration: ./bin/storm -drn exactbug.drn --prop "R{\"costs\"}min=? [F \"goal\"]" --minmax:method topological --topological:minmax pi).

EDIT: Regular PI also fails on this model

Some helpful assertions are presented in https://github.com/moves-rwth/storm/pull/554

 ./bin/storm -drn exactbug.drn --prop "R{\"costs\"}min=? [F \"goal\"]" --exact
Storm 1.8.2 (dev)
DEBUG BUILD

Date: Fri May 24 22:44:58 2024
Command line arguments: -drn exactbug.drn --prop 'R{"costs"}min=? [F "goal"]'  --exact
Current working directory: /Users/junges/storm/build

Time for model construction: 0.018s.

-------------------------------------------------------------- 
Model type:     MDP (sparse)
States:     107
Transitions:    415
Choices:    265
Reward Models:  steps, refuels, costs
State Labels:   5 labels
   * init -> 1 item(s)
   * notbad -> 88 item(s)
   * stationvisit -> 15 item(s)
   * traps -> 6 item(s)
   * goal -> 6 item(s)
Choice Labels:  none
-------------------------------------------------------------- 

ERROR (SparseMatrix.cpp:1341): Cannot point to row offset 2 for rowGroup 37 which starts at 100 and ends at 102.
Assertion failed: (rowGroupToRowIndexMapping[rowGroupIndex] < this->getRowGroupSize(rowGroupIndex)), function selectRowsFromRowGroups, file SparseMatrix.cpp, line 1341.
ERROR: The program received signal 6 and will be aborted in 3s.
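The failing assertion enforces a simple invariant: a policy maps each row group (one group per state, one row per action choice) to a local row offset, and that offset must be smaller than the group's size. A minimal sketch of this invariant, with hypothetical names (this is not Storm's actual code):

```python
# Sketch of the row-group invariant checked in selectRowsFromRowGroups.
# row_group_indices[g] .. row_group_indices[g+1] delimit the rows of group g;
# policy[g] is the chosen row offset WITHIN group g.

def select_rows_from_row_groups(row_group_indices, policy):
    selected = []
    for g, offset in enumerate(policy):
        start = row_group_indices[g]
        end = row_group_indices[g + 1]
        # The invariant the assertion above checks: the local offset must
        # stay inside the group, i.e. offset < group size.
        assert offset < end - start, (
            f"Cannot point to row offset {offset} for rowGroup {g} "
            f"which starts at {start} and ends at {end}."
        )
        selected.append(start + offset)
    return selected

# Group 1 spans rows 100..101 (size 2), mirroring the report's rowGroup 37;
# offset 1 is fine, offset 2 would trip the assertion.
print(select_rows_from_row_groups([98, 100, 102], [1, 1]))  # [99, 101]
```

Without the extra assertions from #554, the out-of-range offset is used unchecked and leads to the EXC_BAD_ACCESS below.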

LLDB (without the added assertions) says:

arch -arm64 lldb -- ./bin/storm --prism ~/refuel-mdp.nm --prop "R{\"costs\"}min=? [F \"goal\"]" -const "N=18,ENERGY=38" --exact
(lldb) target create "./bin/storm"
Current executable set to '/Users/junges/storm/build/bin/storm' (arm64).
(lldb) settings set -- target.run-args  "--prism" "/Users/junges/refuel-mdp.nm" "--prop" "R{\"costs\"}min=? [F \"goal\"]" "-const" "N=18,ENERGY=38" "--exact"
(lldb) process launch
Process 17376 launched: '/Users/junges/storm/build/bin/storm' (arm64)
Storm 1.8.2 (dev)
DEBUG BUILD

Date: Fri May 24 21:45:55 2024
Command line arguments: --prism /Users/junges/refuel-mdp.nm --prop 'R{"costs"}min=? [F "goal"]' -const 'N=18,ENERGY=38' --exact
Current working directory: /Users/junges/storm/build

Time for model input parsing: 0.047s.

Time for model construction: 0.961s.

-------------------------------------------------------------- 
Model type:     MDP (sparse)
States:     11390
Transitions:    81027
Choices:    41933
Reward Models:  costs
State Labels:   3 labels
   * deadlock -> 0 item(s)
   * init -> 1 item(s)
   * goal -> 33 item(s)
Choice Labels:  none
-------------------------------------------------------------- 

Model checking property "1": R[exp]{"costs"}min=? [F "goal"] ...
Process 17376 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x148000000)
    frame #0: 0x00000001220a33e0 libstorm.dylib`storm::storage::SparseMatrix<__gmp_expr<__mpq_struct [1], __mpq_struct [1]> >::selectRowsFromRowGroups(this=0x0000600002c6c080, rowGroupToRowIndexMapping=size=7136, insertDiagonalEntries=false) const at SparseMatrix.cpp:1344:21
   1341 
   1342         // Iterate through that row and count the number of slots we have to reserve for copying.
   1343         bool foundDiagonalElement = false;
-> 1344         for (const_iterator it = this->begin(rowToCopy), ite = this->end(rowToCopy); it != ite; ++it) {
   1345             if (it->getColumn() == rowGroupIndex) {
   1346                 foundDiagonalElement = true;
   1347             }
Target 0: (storm) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x148000000)
  * frame #0: 0x00000001220a33e0 libstorm.dylib`storm::storage::SparseMatrix<__gmp_expr<__mpq_struct [1], __mpq_struct [1]> >::selectRowsFromRowGroups(this=0x0000600002c6c080, rowGroupToRowIndexMapping=size=7136, insertDiagonalEntries=false) const at SparseMatrix.cpp:1344:21
    frame #1: 0x0000000121974258 libstorm.dylib`storm::solver::IterativeMinMaxLinearEquationSolver<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, __gmp_expr<__mpq_struct [1], __mpq_struct [1]> >::solveInducedEquationSystem(this=0x0000000103f08140, env=0x000000016fdf76b0, linearEquationSolver=nullptr, scheduler=size=7136, x=size=7136, subB=size=7136, originalB=size=25789) const at IterativeMinMaxLinearEquationSolver.cpp:169:30
    frame #2: 0x00000001219745e8 libstorm.dylib`storm::solver::IterativeMinMaxLinearEquationSolver<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, __gmp_expr<__mpq_struct [1], __mpq_struct [1]> >::performPolicyIteration(this=0x0000000103f08140, env=0x000000016fdf76b0, dir=Minimize, x=size=7136, b=size=25789, initialPolicy=size=0) const at IterativeMinMaxLinearEquationSolver.cpp:242:13
    frame #3: 0x00000001219720c4 libstorm.dylib`storm::solver::IterativeMinMaxLinearEquationSolver<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, __gmp_expr<__mpq_struct [1], __mpq_struct [1]> >::solveEquationsPolicyIteration(this=0x0000000103f08140, env=0x000000016fdf76b0, dir=Minimize, x=size=7136, b=size=25789) const at IterativeMinMaxLinearEquationSolver.cpp:196:12
    frame #4: 0x000000012197055c libstorm.dylib`storm::solver::IterativeMinMaxLinearEquationSolver<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, __gmp_expr<__mpq_struct [1], __mpq_struct [1]> >::internalSolveEquations(this=0x0000000103f08140, env=0x000000016fdf76b0, dir=Minimize, x=size=7136, b=size=25789) const at IterativeMinMaxLinearEquationSolver.cpp:98:22
    frame #5: 0x00000001219ac8cc libstorm.dylib`storm::solver::MinMaxLinearEquationSolver<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, __gmp_expr<__mpq_struct [1], __mpq_struct [1]> >::solveEquations(this=0x0000000103f08140, env=0x000000016fdf76b0, d=Minimize, x=size=7136, b=size=25789) const at MinMaxLinearEquationSolver.cpp:37:12
    frame #6: 0x0000000121d1bd14 libstorm.dylib`storm::solver::TopologicalMinMaxLinearEquationSolver<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, __gmp_expr<__mpq_struct [1], __mpq_struct [1]> >::solveScc(this=0x0000000103f07f90, sccSolverEnvironment=0x000000016fdf76b0, dir=Minimize, sccRowGroups=0x000000016fdf74b8, sccRows=0x000000016fdf74a8, globalX=size=7845, globalB=size=27640) const at TopologicalMinMaxLinearEquationSolver.cpp:382:33

The DRN file content is as follows:

// Exported by storm
// Original model type: MDP
@type: MDP
@parameters

@reward_models
costs refuels steps 
@nr_states
107
@nr_choices
265
@model
state 0 [0, 0, 0] init notbad stationvisit
    action 0 [0, 0, 0]
        1 : 1
state 1 [0, 0, 0] notbad stationvisit
    action 0 [1, 0, 1]
        2 : 7/10
        3 : 3/10
    action 1 [1, 0, 1]
        4 : 7/10
        5 : 3/10
state 2 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        6 : 1
    action 1 [1, 0, 1]
        7 : 7/10
        8 : 3/10
    action 2 [1, 0, 1]
        9 : 7/10
        10 : 3/10
state 3 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        6 : 3/10
        11 : 7/10
    action 1 [1, 0, 1]
        8 : 1
    action 2 [1, 0, 1]
        12 : 7/10
        13 : 3/10
state 4 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        9 : 7/10
        12 : 3/10
    action 1 [1, 0, 1]
        14 : 7/10
        15 : 3/10
    action 2 [1, 0, 1]
        6 : 1
state 5 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        10 : 7/10
        13 : 3/10
    action 1 [1, 0, 1]
        15 : 1
    action 2 [1, 0, 1]
        6 : 3/10
        16 : 7/10
state 6 [0, 0, 0] notbad stationvisit
    action 0 [3, 1, 0]
        1 : 1
state 7 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        17 : 7/10
        18 : 3/10
    action 1 [1, 0, 1]
        19 : 1
    action 2 [1, 0, 1]
        20 : 7/10
        21 : 3/10
state 8 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        17 : 3/10
        22 : 7/10
    action 1 [1, 0, 1]
        23 : 7/10
        24 : 3/10
state 9 [0, 0, 0] notbad stationvisit
    action 0 [3, 1, 0]
        25 : 1
state 10 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        26 : 1
    action 1 [1, 0, 1]
        21 : 7/10
        24 : 3/10
    action 2 [1, 0, 1]
        27 : 1
    action 3 [1, 0, 1]
        17 : 3/10
        28 : 7/10
state 11 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        18 : 1
    action 1 [1, 0, 1]
        19 : 3/10
        22 : 7/10
    action 2 [1, 0, 1]
        28 : 7/10
        29 : 3/10
state 12 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        28 : 7/10
        30 : 3/10
    action 1 [1, 0, 1]
        23 : 1
    action 2 [1, 0, 1]
        21 : 7/10
        31 : 3/10
    action 3 [1, 0, 1]
        22 : 1
state 13 [0, 0, 0] traps
    action 0 [1, 0, 1]
        26 : 3/10
        29 : 7/10
    action 1 [1, 0, 1]
        24 : 1
    action 2 [1, 0, 1]
        31 : 1
    action 3 [1, 0, 1]
        20 : 7/10
        22 : 3/10
state 14 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        21 : 3/10
        29 : 7/10
    action 1 [1, 0, 1]
        32 : 1
    action 2 [1, 0, 1]
        18 : 3/10
        30 : 7/10
state 15 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        27 : 7/10
        31 : 3/10
    action 1 [1, 0, 1]
        26 : 7/10
        30 : 3/10
state 16 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        20 : 3/10
        28 : 7/10
    action 1 [1, 0, 1]
        26 : 7/10
        32 : 3/10
    action 2 [1, 0, 1]
        18 : 1
state 17 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        33 : 1
    action 1 [1, 0, 1]
        34 : 7/10
        35 : 3/10
    action 2 [1, 0, 1]
        36 : 7/10
        37 : 3/10
state 18 [0, 0, 0] notbad stationvisit
    action 0 [3, 1, 0]
        1 : 1
state 19 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        34 : 7/10
        38 : 3/10
    action 1 [1, 0, 1]
        39 : 7/10
        40 : 3/10
state 20 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        36 : 7/10
        41 : 3/10
    action 1 [1, 0, 1]
        39 : 1
    action 2 [1, 0, 1]
        42 : 7/10
        43 : 3/10
    action 3 [1, 0, 1]
        34 : 1
state 21 [0, 0, 0] traps
    action 0 [1, 0, 1]
        37 : 7/10
        44 : 3/10
    action 1 [1, 0, 1]
        40 : 1
    action 2 [1, 0, 1]
        43 : 1
    action 3 [1, 0, 1]
        34 : 3/10
        45 : 7/10
state 22 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        33 : 3/10
        38 : 7/10
    action 1 [1, 0, 1]
        35 : 1
    action 2 [1, 0, 1]
        42 : 3/10
        45 : 7/10
state 23 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        36 : 3/10
        45 : 7/10
    action 1 [1, 0, 1]
        40 : 7/10
        46 : 3/10
    action 2 [1, 0, 1]
        35 : 1
state 24 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        37 : 3/10
        42 : 7/10
    action 1 [1, 0, 1]
        46 : 1
    action 2 [1, 0, 1]
        35 : 3/10
        39 : 7/10
state 25 [0, 0, 0] notbad stationvisit
    action 0 [1, 0, 1]
        4 : 1
    action 1 [1, 0, 1]
        47 : 7/10
        48 : 3/10
    action 2 [1, 0, 1]
        49 : 7/10
        50 : 3/10
    action 3 [1, 0, 1]
        2 : 1
state 26 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        37 : 7/10
        42 : 3/10
    action 1 [1, 0, 1]
        51 : 1
    action 2 [1, 0, 1]
        33 : 3/10
        41 : 7/10
state 27 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        51 : 1
    action 1 [1, 0, 1]
        43 : 7/10
        46 : 3/10
    action 2 [1, 0, 1]
        36 : 3/10
        37 : 7/10
state 28 [0, 0, 0] notbad stationvisit
    action 0 [3, 1, 0]
        25 : 1
state 29 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        44 : 1
    action 1 [1, 0, 1]
        40 : 3/10
        42 : 7/10
    action 2 [1, 0, 1]
        52 : 1
    action 3 [1, 0, 1]
        36 : 7/10
        38 : 3/10
state 30 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        36 : 7/10
        45 : 3/10
    action 1 [1, 0, 1]
        44 : 7/10
        51 : 3/10
    action 2 [1, 0, 1]
        33 : 1
state 31 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        51 : 3/10
        52 : 7/10
    action 1 [1, 0, 1]
        46 : 1
    action 2 [1, 0, 1]
        42 : 7/10
        45 : 3/10
state 32 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        43 : 3/10
        52 : 7/10
    action 1 [1, 0, 1]
        41 : 3/10
        44 : 7/10
state 33 [0, 0, 0] notbad stationvisit
    action 0 [3, 1, 0]
        1 : 1
state 34 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        53 : 7/10
        54 : 3/10
    action 1 [1, 0, 1]
        55 : 1
    action 2 [1, 0, 1]
        56 : 7/10
        57 : 3/10
state 35 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        53 : 3/10
        58 : 7/10
    action 1 [1, 0, 1]
        59 : 7/10
        60 : 3/10
state 36 [0, 0, 0] notbad stationvisit
    action 0 [3, 1, 0]
        25 : 1
state 37 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        61 : 1
    action 1 [1, 0, 1]
        57 : 7/10
        60 : 3/10
    action 2 [1, 0, 1]
        62 : 1
    action 3 [1, 0, 1]
        53 : 3/10
        63 : 7/10
state 38 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        54 : 1
    action 1 [1, 0, 1]
        55 : 3/10
        58 : 7/10
    action 2 [1, 0, 1]
        63 : 7/10
        64 : 3/10
state 39 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        56 : 7/10
        63 : 3/10
    action 1 [1, 0, 1]
        60 : 7/10
        65 : 3/10
    action 2 [1, 0, 1]
        55 : 1
state 40 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        57 : 7/10
        64 : 3/10
    action 1 [1, 0, 1]
        65 : 1
    action 2 [1, 0, 1]
        55 : 3/10
        59 : 7/10
state 41 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        56 : 3/10
        63 : 7/10
    action 1 [1, 0, 1]
        61 : 7/10
        66 : 3/10
    action 2 [1, 0, 1]
        54 : 1
state 42 [0, 0, 0] traps
    action 0 [1, 0, 1]
        61 : 3/10
        64 : 7/10
    action 1 [1, 0, 1]
        60 : 1
    action 2 [1, 0, 1]
        67 : 1
    action 3 [1, 0, 1]
        56 : 7/10
        58 : 3/10
state 43 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        62 : 7/10
        66 : 3/10
    action 1 [1, 0, 1]
        65 : 1
    action 2 [1, 0, 1]
        56 : 3/10
        57 : 7/10
state 44 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        57 : 3/10
        64 : 7/10
    action 1 [1, 0, 1]
        66 : 1
    action 2 [1, 0, 1]
        54 : 3/10
        68 : 7/10
state 45 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        63 : 7/10
        68 : 3/10
    action 1 [1, 0, 1]
        59 : 1
    action 2 [1, 0, 1]
        57 : 7/10
        67 : 3/10
    action 3 [1, 0, 1]
        58 : 1
state 46 [0, 0, 0] goal notbad
    action 0 [0, 0, 0]
        46 : 1
state 47 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        9 : 7/10
        16 : 3/10
    action 1 [1, 0, 1]
        69 : 1
    action 2 [1, 0, 1]
        13 : 7/10
        70 : 3/10
    action 3 [1, 0, 1]
        7 : 1
state 48 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        9 : 3/10
        12 : 7/10
    action 1 [1, 0, 1]
        71 : 7/10
        72 : 3/10
    action 2 [1, 0, 1]
        8 : 1
state 49 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        14 : 1
    action 1 [1, 0, 1]
        13 : 7/10
        71 : 3/10
    action 2 [1, 0, 1]
        73 : 1
    action 3 [1, 0, 1]
        9 : 7/10
        11 : 3/10
state 50 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        15 : 1
    action 1 [1, 0, 1]
        70 : 7/10
        72 : 3/10
    action 2 [1, 0, 1]
        9 : 3/10
        10 : 7/10
state 51 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        62 : 7/10
        67 : 3/10
    action 1 [1, 0, 1]
        61 : 7/10
        68 : 3/10
state 52 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        66 : 1
    action 1 [1, 0, 1]
        65 : 3/10
        67 : 7/10
    action 2 [1, 0, 1]
        63 : 3/10
        64 : 7/10
state 53 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        74 : 1
    action 1 [1, 0, 1]
        75 : 7/10
        76 : 3/10
    action 2 [1, 0, 1]
        77 : 7/10
        78 : 3/10
state 54 [0, 0, 0] notbad stationvisit
    action 0 [3, 1, 0]
        1 : 1
state 55 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        75 : 7/10
        79 : 3/10
    action 1 [1, 0, 1]
        80 : 7/10
        81 : 3/10
state 56 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        77 : 7/10
        82 : 3/10
    action 1 [1, 0, 1]
        80 : 1
    action 2 [1, 0, 1]
        83 : 7/10
        84 : 3/10
    action 3 [1, 0, 1]
        75 : 1
state 57 [0, 0, 0] traps
    action 0 [1, 0, 1]
        78 : 7/10
        85 : 3/10
    action 1 [1, 0, 1]
        81 : 1
    action 2 [1, 0, 1]
        84 : 1
    action 3 [1, 0, 1]
        75 : 3/10
        86 : 7/10
state 58 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        74 : 3/10
        79 : 7/10
    action 1 [1, 0, 1]
        76 : 1
    action 2 [1, 0, 1]
        83 : 3/10
        86 : 7/10
state 59 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        77 : 3/10
        86 : 7/10
    action 1 [1, 0, 1]
        81 : 7/10
        87 : 3/10
    action 2 [1, 0, 1]
        76 : 1
state 60 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        78 : 3/10
        83 : 7/10
    action 1 [1, 0, 1]
        87 : 1
    action 2 [1, 0, 1]
        76 : 3/10
        80 : 7/10
state 61 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        78 : 7/10
        83 : 3/10
    action 1 [1, 0, 1]
        88 : 1
    action 2 [1, 0, 1]
        74 : 3/10
        82 : 7/10
state 62 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        88 : 1
    action 1 [1, 0, 1]
        84 : 7/10
        87 : 3/10
    action 2 [1, 0, 1]
        77 : 3/10
        78 : 7/10
state 63 [0, 0, 0] notbad stationvisit
    action 0 [3, 1, 0]
        25 : 1
state 64 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        85 : 1
    action 1 [1, 0, 1]
        81 : 3/10
        83 : 7/10
    action 2 [1, 0, 1]
        89 : 1
    action 3 [1, 0, 1]
        77 : 7/10
        79 : 3/10
state 65 [0, 0, 0] goal notbad
    action 0 [0, 0, 0]
        65 : 1
state 66 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        84 : 3/10
        89 : 7/10
    action 1 [1, 0, 1]
        82 : 3/10
        85 : 7/10
state 67 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        88 : 3/10
        89 : 7/10
    action 1 [1, 0, 1]
        87 : 1
    action 2 [1, 0, 1]
        83 : 7/10
        86 : 3/10
state 68 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        77 : 7/10
        86 : 3/10
    action 1 [1, 0, 1]
        85 : 7/10
        88 : 3/10
    action 2 [1, 0, 1]
        74 : 1
state 69 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        20 : 7/10
        28 : 3/10
    action 1 [1, 0, 1]
        24 : 7/10
        90 : 3/10
    action 2 [1, 0, 1]
        19 : 1
state 70 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        27 : 7/10
        32 : 3/10
    action 1 [1, 0, 1]
        90 : 1
    action 2 [1, 0, 1]
        20 : 3/10
        21 : 7/10
state 71 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        21 : 7/10
        29 : 3/10
    action 1 [1, 0, 1]
        90 : 1
    action 2 [1, 0, 1]
        19 : 3/10
        23 : 7/10
state 72 [0, 0, 0] goal notbad
    action 0 [0, 0, 0]
        72 : 1
state 73 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        32 : 1
    action 1 [1, 0, 1]
        31 : 7/10
        90 : 3/10
    action 2 [1, 0, 1]
        28 : 3/10
        29 : 7/10
state 74 [0, 0, 0] notbad stationvisit
    action 0 [3, 1, 0]
        1 : 1
state 75 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        91 : 7/10
        92 : 3/10
    action 1 [1, 0, 1]
        93 : 1
    action 2 [1, 0, 1]
        94 : 7/10
        95 : 3/10
state 76 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        91 : 3/10
        96 : 7/10
    action 1 [1, 0, 1]
        97 : 7/10
        98 : 3/10
state 77 [0, 0, 0] notbad stationvisit
    action 0 [3, 1, 0]
        25 : 1
state 78 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        99 : 1
    action 1 [1, 0, 1]
        95 : 7/10
        98 : 3/10
    action 2 [1, 0, 1]
        100 : 1
    action 3 [1, 0, 1]
        91 : 3/10
        101 : 7/10
state 79 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        92 : 1
    action 1 [1, 0, 1]
        93 : 3/10
        96 : 7/10
    action 2 [1, 0, 1]
        101 : 7/10
        102 : 3/10
state 80 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        94 : 7/10
        101 : 3/10
    action 1 [1, 0, 1]
        98 : 7/10
        103 : 3/10
    action 2 [1, 0, 1]
        93 : 1
state 81 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        95 : 7/10
        102 : 3/10
    action 1 [1, 0, 1]
        103 : 1
    action 2 [1, 0, 1]
        93 : 3/10
        97 : 7/10
state 82 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        94 : 3/10
        101 : 7/10
    action 1 [1, 0, 1]
        99 : 7/10
        104 : 3/10
    action 2 [1, 0, 1]
        92 : 1
state 83 [0, 0, 0] traps
    action 0 [1, 0, 1]
        99 : 3/10
        102 : 7/10
    action 1 [1, 0, 1]
        98 : 1
    action 2 [1, 0, 1]
        105 : 1
    action 3 [1, 0, 1]
        94 : 7/10
        96 : 3/10
state 84 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        100 : 7/10
        104 : 3/10
    action 1 [1, 0, 1]
        103 : 1
    action 2 [1, 0, 1]
        94 : 3/10
        95 : 7/10
state 85 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        95 : 3/10
        102 : 7/10
    action 1 [1, 0, 1]
        104 : 1
    action 2 [1, 0, 1]
        92 : 3/10
        106 : 7/10
state 86 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        101 : 7/10
        106 : 3/10
    action 1 [1, 0, 1]
        97 : 1
    action 2 [1, 0, 1]
        95 : 7/10
        105 : 3/10
    action 3 [1, 0, 1]
        96 : 1
state 87 [0, 0, 0] goal notbad
    action 0 [0, 0, 0]
        87 : 1
state 88 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        100 : 7/10
        105 : 3/10
    action 1 [1, 0, 1]
        99 : 7/10
        106 : 3/10
state 89 [0, 0, 0] notbad
    action 0 [1, 0, 1]
        104 : 1
    action 1 [1, 0, 1]
        103 : 3/10
        105 : 7/10
    action 2 [1, 0, 1]
        101 : 3/10
        102 : 7/10
state 90 [0, 0, 0] goal notbad
    action 0 [0, 0, 0]
        90 : 1
state 91 [0, 0, 0]
    action 0 [0, 0, 0]
        91 : 1
state 92 [0, 0, 0] notbad stationvisit
    action 0 [3, 1, 0]
        1 : 1
state 93 [0, 0, 0]
    action 0 [0, 0, 0]
        93 : 1
state 94 [0, 0, 0]
    action 0 [0, 0, 0]
        94 : 1
state 95 [0, 0, 0] traps
    action 0 [0, 0, 0]
        95 : 1
state 96 [0, 0, 0]
    action 0 [0, 0, 0]
        96 : 1
state 97 [0, 0, 0]
    action 0 [0, 0, 0]
        97 : 1
state 98 [0, 0, 0]
    action 0 [0, 0, 0]
        98 : 1
state 99 [0, 0, 0]
    action 0 [0, 0, 0]
        99 : 1
state 100 [0, 0, 0]
    action 0 [0, 0, 0]
        100 : 1
state 101 [0, 0, 0] notbad stationvisit
    action 0 [3, 1, 0]
        25 : 1
state 102 [0, 0, 0]
    action 0 [0, 0, 0]
        102 : 1
state 103 [0, 0, 0] goal
    action 0 [0, 0, 0]
        103 : 1
    action 1 [0, 0, 0]
        103 : 1
state 104 [0, 0, 0]
    action 0 [0, 0, 0]
        104 : 1
state 105 [0, 0, 0]
    action 0 [0, 0, 0]
        105 : 1
state 106 [0, 0, 0]
    action 0 [0, 0, 0]
        106 : 1

For exact solving, the current workaround is to use SoPlex.

tquatmann commented 1 month ago

I believe I understood the issue: for Rmin-type properties, we need to clear those choices of the maybeStates that lead to a state with infinite expected reward. This happens somewhere in the SparseMdpPrctlHelper. However, the initial policy is currently computed before this step, and thus the selected choice indices in the initial policy might become invalid.
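The mismatch can be illustrated abstractly (hypothetical names; this is not Storm's code): the initial policy stores a choice offset per state, and clearing choices afterwards shrinks some row groups, so stored offsets can land out of range.

```python
# Illustration of the described bug: the initial policy is computed against
# the FULL choice set; clearing choices that reach infinite-reward states
# then shrinks some row groups, invalidating the stored offsets.

def clear_choices(row_group_sizes, cleared):
    """cleared[g] = number of choices removed from group g."""
    return [n - c for n, c in zip(row_group_sizes, cleared)]

group_sizes = [3, 2, 4]      # choices per maybe-state before clearing
initial_policy = [2, 1, 3]   # valid now: each offset < its group size

new_sizes = clear_choices(group_sizes, [0, 1, 2])  # -> [3, 1, 2]
invalid = [g for g, (p, n) in enumerate(zip(initial_policy, new_sizes))
           if p >= n]
print(invalid)  # [1, 2]: these groups' stored offsets are now out of range
```

Computing the initial policy after (or reindexing it against) the cleared choice set would restore the invariant.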