x-tu / GGF-wcMDP


Whittle Index - unbalanced costs #47

Open x-tu opened 4 months ago

x-tu commented 4 months ago

Some machines never accumulate any cost.

Screenshot 2024-07-13 at 03 10 28

x-tu commented 4 months ago

A special cost structure in which repairs cost less than operations.

Screenshot 2024-07-13 at 12 16 03

(blue: operation, orange: repair)

Trajectories for Whittle index policy:

[2 2] [1 0] [0.33333333 0. ] [0 2]
[0 2] [1 0] [1.28333333 0. ] [1 2]
[1 2] [1 0] [2.03541667 0. ] [1 2]
[1 2] [1 0] [2.74989583 0. ] [1 2]
[1 2] [1 0] [3.42865104 0. ] [0 2]
[0 2] [1 0] [4.20243198 0. ] [0 2]
[0 2] [1 0] [4.93752387 0. ] [0 2]
[0 2] [1 0] [5.63586117 0. ] [0 2]
[0 2] [1 0] [6.2992816 0. ] [0 2]
[0 2] [1 0] [6.92953101 0. ] [0 2]
[0 2] [1 0] [7.52826795 0. ] [0 2]
[0 2] [1 0] [8.09706804 0. ] [0 2]
[0 2] [1 0] [8.63742813 0. ] [0 2]
[0 2] [1 0] [9.15077021 0. ] [1 2]
[1 2] [1 0] [9.55716603 0. ] [1 2]
[1 2] [1 0] [9.94324205 0. ] [0 2]
... ...

Trajectories for PPO policy (count-based):

[1 0 1] [0 0 1] [1. 0.31666667] [1 1 0]
[1 1 0] [0 1 0] [1. 1.06875] [2 0 0]
[2 0 0] [1 0 0] [1.857375 1.06875 ] [2 0 0]
[2 0 0] [1 0 0] [2.67188125 1.06875 ] [2 0 0]
[2 0 0] [1 0 0] [3.44566219 1.06875 ] [1 1 0]
[1 1 0] [0 1 0] [4.05823876 1.06875 ] [2 0 0]
[2 0 0] [1 0 0] [4.05823876 1.7670873 ] [2 0 0]
[2 0 0] [1 0 0] [4.05823876 2.43050773] [1 1 0]
[1 1 0] [0 1 0] [4.05823876 2.95571557] [2 0 0]
[2 0 0] [1 0 0] [4.05823876 3.55445251] [2 0 0]
[2 0 0] [1 0 0] [4.05823876 4.1232526 ] [2 0 0]
[2 0 0] [1 0 0] [4.59859885 4.1232526 ] [1 1 0]
[1 1 0] [0 1 0] [5.02638392 4.1232526 ] [0 2 0]
[0 2 0] [0 1 0] [5.43277974 4.1232526 ] [1 1 0]
[1 1 0] [0 1 0] [5.81885576 4.1232526 ] [1 1 0]
[1 1 0] [0 1 0] [5.81885576 4.49002482] [2 0 0]
[2 0 0] [1 0 0] [5.81885576 4.90814516] [1 1 0]
[1 1 0] [0 1 0] [6.14986769 4.90814516] [2 0 0]
[2 0 0] [1 0 0] [6.5272213 4.90814516] [1 1 0]
[1 1 0] [0 1 0] [6.5272213 5.20688343] [2 0 0]
[2 0 0] [1 0 0] [6.86778292 5.20688343] [2 0 0]
[2 0 0] [1 0 0] [7.19131647 5.20688343] [2 0 0]
[2 0 0] [1 0 0] [7.49867333 5.20688343] [1 1 0]
[1 1 0] [0 1 0] [7.49867333 5.45020762] [1 1 0]
[1 1 0] [0 1 0] [7.72983131 5.45020762] [2 0 0]
[2 0 0] [1 0 0] [7.72983131 5.71372771] [2 0 0]
[2 0 0] [1 0 0] [7.72983131 5.9640718 ] [2 0 0]
[2 0 0] [1 0 0] [7.9676582 5.9640718] [2 0 0]
[2 0 0] [1 0 0] [8.19359374 5.9640718 ] [2 0 0]
[2 0 0] [1 0 0] [8.4082325 5.9640718] [1 1 0]
[1 1 0] [0 1 0] [8.57815486 5.9640718 ] [2 0 0]
[2 0 0] [1 0 0] [8.77186634 5.9640718 ] [2 0 0]
[2 0 0] [1 0 0] [8.77186634 6.14809771] [2 0 0]
[2 0 0] [1 0 0] [8.94669096 6.14809771] [2 0 0]
[2 0 0] [1 0 0] [8.94669096 6.31418109] [2 0 0]
[2 0 0] [1 0 0] [8.94669096 6.47196031] [1 1 0]
[1 1 0] [0 1 0] [8.94669096 6.59686885] [2 0 0]
[2 0 0] [1 0 0] [9.0890867 6.59686885] [2 0 0]
[2 0 0] [1 0 0] [9.22436265 6.59686885] [1 1 0]
[1 1 0] [0 1 0] [9.33145612 6.59686885] [2 0 0]
[2 0 0] [1 0 0] [9.45354266 6.59686885] [2 0 0]
[2 0 0] [1 0 0] [9.45354266 6.71285107] [0 2 0]
[0 2 0] [0 1 0] [9.45354266 6.80467033] [1 1 0]
[1 1 0] [0 1 0] [9.54077096 6.80467033] [1 1 0]
[1 1 0] [0 1 0] [9.62363784 6.80467033] [1 1 0]
[1 1 0] [0 1 0] [9.70236138 6.80467033] [1 1 0]
[1 1 0] [0 1 0] [9.70236138 6.87945769] [1 1 0]
[1 1 0] [0 1 0] [9.70236138 6.95050569] [2 0 0]
... ...
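
The imbalance can be checked directly from the logged cost vectors. The sketch below is a minimal, hypothetical helper (`machines_without_cost` is not part of the repository). It assumes that the third bracket of each step is the per-machine cumulative cost, which is inferred from the logs rather than confirmed. The inputs are the first three cost vectors of each trajectory above.

```python
def machines_without_cost(cum_costs, tol=1e-12):
    """Return indices of machines whose cumulative cost stays at ~0 for the whole run.

    cum_costs: list of per-step cost vectors (one entry per machine).
    Assumption: entries are cumulative costs, as the logs above suggest.
    """
    n_machines = len(cum_costs[0])
    return [
        i
        for i in range(n_machines)
        if all(abs(step[i]) <= tol for step in cum_costs)
    ]


# First three cumulative cost vectors copied from each trajectory above.
whittle = [[0.33333333, 0.0], [1.28333333, 0.0], [2.03541667, 0.0]]
ppo = [[1.0, 0.31666667], [1.0, 1.06875], [1.857375, 1.06875]]

print(machines_without_cost(whittle))  # [1]
print(machines_without_cost(ppo))      # []
```

On these samples the Whittle index run leaves the second machine at zero cost, while the PPO run charges both machines.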

x-tu commented 4 months ago

The DLP and RL algorithms also show this issue in batch runs. No need to investigate further for the moment.