hpi-sam / rl-4-self-repair

Reinforcement Learning Models for Online Learning of Self-Repair and Self-Optimization
MIT License

How to pick the initial state? #6

Closed 2start closed 4 years ago

2start commented 4 years ago

Each permutation of failures might be a viable state. However, we need a fixed action space for q-learning.

Brainstorming of possibilities:

- Choose a fixed initial state.
- Model each initial state as a state in the environment.
- Idea: limit the lookahead to a random subset of the states to choose the best action.

Other ideas are appreciated.
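The random-subset lookahead idea could look roughly like this. A minimal sketch, not the project's actual code: the function name, the tabular Q-value dict, and the sample size `k` are all assumptions for illustration.

```python
import random

def pick_action(q_values, state, actions, k=5, rng=random):
    """Pick the best-looking action among a random subset (hypothetical sketch).

    Instead of comparing Q-values over the full action space, only k
    randomly sampled candidate actions are compared, which bounds the
    lookahead cost when the action space is large.

    q_values: dict mapping (state, action) -> estimated value.
    """
    subset = rng.sample(actions, min(k, len(actions)))
    # Unseen (state, action) pairs default to 0.0, as in tabular Q-learning.
    return max(subset, key=lambda a: q_values.get((state, a), 0.0))
```

With `k` equal to the full action count this degenerates to an ordinary greedy choice; smaller `k` trades decision quality for speed.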

MrBanhBao commented 4 years ago

As we discussed, we abandoned the idea of swapping components. Component-failure-pairs can now be repaired directly, regardless of their order.

- Initial state: the given list of component-failure-pairs.
- Action space size: the number of component-failure-pairs (the selected component-failure-pair gets repaired).
- State/observation space size: two to the power of the number of component-failure-pairs (each pair is either repaired or still broken).
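A minimal sketch of this formulation, assuming states are encoded as bitmasks over the component-failure-pairs; the example pair names and the `step` helper are made up for illustration:

```python
# N component-failure-pairs, each either still broken (bit = 1) or
# repaired (bit = 0). States are bitmasks over the pairs.
failure_pairs = ["load-balancer:crash", "db:timeout", "cache:overload"]  # hypothetical names
N = len(failure_pairs)

initial_state = (1 << N) - 1   # all pairs still broken
actions = list(range(N))       # action i = repair pair i
num_states = 2 ** N            # each pair is either repaired or broken

def step(state, action):
    """Repair the selected component-failure-pair by clearing its bit."""
    return state & ~(1 << action)
```

This makes the sizes explicit: the action space has N entries and stays fixed, while the state space is 2^N, since repairs can happen in any order.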