hpi-sam / rl-4-self-repair

Reinforcement Learning Models for Online Learning of Self-Repair and Self-Optimization
MIT License

How to pick the initial state? #6

Closed 2start closed 4 years ago

2start commented 4 years ago

Each permutation of failures might be a viable state. However, we need a fixed action space for q-learning.

Brainstorming of possibilities:

- Choose a fixed initial state.
- Model each initial state as a state in the environment.
- Idea: limit the lookahead to a random subset of the states to choose the best action.

Other ideas are appreciated.
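The random-subset lookahead idea could look roughly like this. A minimal sketch, not the project's actual code: the function name, the tabular Q-value dict, and the sample size `k` are all assumptions for illustration.

```python
import random

def pick_action(q_values, state, actions, k=5, rng=random):
    """Pick the best-looking action among a random subset (hypothetical sketch).

    Instead of comparing Q-values over the full action space, only k
    randomly sampled candidate actions are compared, which bounds the
    lookahead cost when the action space is large.

    q_values: dict mapping (state, action) -> estimated value.
    """
    subset = rng.sample(actions, min(k, len(actions)))
    # Unseen (state, action) pairs default to 0.0, as in tabular Q-learning.
    return max(subset, key=lambda a: q_values.get((state, a), 0.0))
```

With `k` equal to the full action count this degenerates to an ordinary greedy choice; smaller `k` trades decision quality for speed.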

MrBanhBao commented 4 years ago

As we discussed, we abandoned the idea of swapping components. Component-failure-pairs can now be repaired directly, regardless of their order.

- Initial state: the given list of component-failure-pairs.
- Action space size: the number of component-failure-pairs (the selected component-failure-pair gets repaired).
- State/observation space size: two to the power of the number of component-failure-pairs (each pair is either repaired or still broken).
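A minimal sketch of this formulation, assuming states are encoded as bitmasks over the component-failure-pairs; the example pair names and the `step` helper are made up for illustration:

```python
# N component-failure-pairs, each either still broken (bit = 1) or
# repaired (bit = 0). States are bitmasks over the pairs.
failure_pairs = ["load-balancer:crash", "db:timeout", "cache:overload"]  # hypothetical names
N = len(failure_pairs)

initial_state = (1 << N) - 1   # all pairs still broken
actions = list(range(N))       # action i = repair pair i
num_states = 2 ** N            # each pair is either repaired or broken

def step(state, action):
    """Repair the selected component-failure-pair by clearing its bit."""
    return state & ~(1 << action)
```

This makes the sizes explicit: the action space has N entries and stays fixed, while the state space is 2^N, since repairs can happen in any order.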