moves-rwth / storm

A Modern Probabilistic Model Checker
https://www.stormchecker.org
GNU General Public License v3.0

Undetected multi-objective infinite reward #624

Closed: Chickenpowerrr closed this issue 3 weeks ago

Chickenpowerrr commented 1 month ago

When evaluating multi(R{"inf"}min=? [C], R{"simp"}<=0 [C]) on the following MDP, the output is false, while the actual result should be infinity. Hence I would expect the output: There is no Pareto optimal scheduler that yields finite reward for all objectives. Moreover, multi(R{"simp"}<=0 [C]) on its own is considered achievable, so multi(R{"inf"}min=? [C], R{"simp"}<=0 [C]) should not be unachievable.

mdp

module model
    s : [0..2];

    [] s=0 -> (s'=1);      // go to s=1, where only the [inf] self-loop is enabled
    [simp] s=0 -> (s'=2);  // take the [simp] action and go to s=2
    [inf] s=1 -> true;     // self-loop that accumulates "inf" reward forever
    [] s=2 -> true;        // reward-free self-loop
endmodule

rewards "simp"
    [simp] true: 1;
endrewards

rewards "inf"
    [inf] true: 1;
endrewards
Storm 1.9.0

Date: Sat Oct 19 14:44:17 2024
Command line arguments: --prism precomputation.nm -prop 'multi(R{"inf"}min=? [C], R{"simp"} <= 0 [C])'
Current working directory:

Time for model input parsing: 0.003s.

Time for model construction: 0.011s.

--------------------------------------------------------------
Model type:     MDP (sparse)
States:         3
Transitions:    4
Choices:        4
Reward Models:  simp, inf
State Labels:   2 labels
   * deadlock -> 0 item(s)
   * init -> 1 item(s)
Choice Labels:  none
--------------------------------------------------------------

Model checking property "1": multi(R{"inf"}min=? [C], R{"simp"}<=0 [C]) ...
Result (for initial states): false

Time for model checking: 0.006s.
Storm 1.9.0

Date: Sat Oct 19 14:47:47 2024
Command line arguments: --prism precomputation.nm -prop 'multi(R{"simp"} <= 0 [C])'
Current working directory:

Time for model input parsing: 0.002s.

Time for model construction: 0.009s.

--------------------------------------------------------------
Model type:     MDP (sparse)
States:         3
Transitions:    4
Choices:        4
Reward Models:  simp
State Labels:   2 labels
   * deadlock -> 0 item(s)
   * init -> 1 item(s)
Choice Labels:  none
--------------------------------------------------------------

Model checking property "1": multi(R{"simp"}<=0 [C]) ...
Result (for initial states): true

Time for model checking: 0.000s.
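
For reference, the same behaviour can also be reproduced through Storm's Python bindings (stormpy). A minimal sketch, assuming stormpy is installed, the model is saved as precomputation.nm, and the standard model-checking entry point dispatches the multi(...) query:

import stormpy

# Parse the PRISM program and the multi-objective query from above.
program = stormpy.parse_prism_program("precomputation.nm")
formula = 'multi(R{"inf"}min=? [C], R{"simp"}<=0 [C])'
properties = stormpy.parse_properties_for_prism_program(formula, program)

# Build the sparse MDP (3 states, 4 choices, reward models "simp" and "inf").
model = stormpy.build_model(program, properties)

# Query the result at the initial state; Storm 1.9.0 reports False here.
result = stormpy.model_checking(model, properties[0])
print(result.at(model.initial_states[0]))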
tquatmann commented 3 weeks ago

In multi-objective model checking, we restrict to policies that yield finite reward for all objectives. In particular, this means that we drop solutions that are infinitely "bad" for one or more objectives.

For the given example, this means that selecting the 'simp' action is the only valid policy for the given set of objectives. Consequently, there is no valid policy under which R{"simp"}<=0 [C] holds. We (correctly) get false as an answer.
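
To make this concrete, here is a small sketch that enumerates the two deterministic policies in s=0 together with their expected total reward vectors. This is plain illustrative Python, not Storm API; the policy names are made up for readability:

import math

# The two deterministic stationary policies, identified by the choice in s=0,
# with their expected total rewards as ("inf" objective, "simp" objective).
policies = {
    "go to s=1 (unlabelled action)": (math.inf, 0.0),  # [inf] self-loop accrues reward forever
    "go to s=2 ([simp] action)": (0.0, 1.0),           # one-time "simp" reward, then reward-free loop
}

# Multi-objective checking drops policies with infinite reward in any objective ...
valid = {name: r for name, r in policies.items()
         if all(math.isfinite(v) for v in r)}

# ... so only the [simp] policy remains, and it violates R{"simp"}<=0 [C]:
achievable = any(r_simp <= 0 for (_r_inf, r_simp) in valid.values())
print(achievable)  # False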

Note that this assumption avoids some nasty technicalities within the algorithm (e.g. what would be the convex hull of the two points (0,1) and (inf,0) and how to represent this?).

I also discussed this situation and possible workarounds in my dissertation (Section 3.3), in case you are interested.

Please also note that outputting "There is no Pareto optimal scheduler that yields finite reward for all objectives" would not be accurate: selecting simp in s=0 is Pareto optimal and yields finite rewards.

Chickenpowerrr commented 3 weeks ago

Hi Tim,

Thanks for the great explanation and the reference. I read the relevant section and understand the output now. I will close the issue.

Kind regards,

Mark