hpi-sam / rl-4-self-repair

Reinforcement Learning Models for Online Learning of Self-Repair and Self-Optimization
MIT License

Non-Stationary Environment - Return rewards without replacement #21

Closed christianadriano closed 4 years ago

christianadriano commented 4 years ago

To use the non-stationary data consistently, we need two modifications to the function that returns the reward of an action (these apply only to the non-stationary case; the existing cases remain unchanged).

1- For the non-stationary environment, each utility_increase value returned to the agent should be removed (or marked as used) so that it is not returned again by a later call to the environment.

2- For the non-stationary environment, the utility_increase values have to be consumed in their original order, because they now form a time series (a sketch of both changes follows below).
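
Since the issue does not show the environment's code, here is a minimal Python sketch of the requested behavior under stated assumptions: the class and method names (`NonStationaryRewardSource`, `next_reward`) are hypothetical; only `utility_increase` comes from the issue itself.

```python
# Minimal sketch, not the repository's actual implementation.
# Assumes utility_increase values arrive as an ordered time series
# and must each be consumed exactly once, in order (without replacement).

from collections import deque


class NonStationaryRewardSource:
    """Serves utility_increase values in time-series order, each at most once."""

    def __init__(self, utility_increases):
        # deque preserves the original (temporal) order of the series
        self._pending = deque(utility_increases)

    def next_reward(self):
        """Return the next unused utility_increase, consuming it.

        Raises StopIteration when the series is exhausted; the caller
        (the environment) can decide how to handle that, e.g. end the
        episode or restart the series.
        """
        if not self._pending:
            raise StopIteration("all utility_increase values have been consumed")
        return self._pending.popleft()  # removed, so never returned again


# Hypothetical usage inside the environment's reward function:
source = NonStationaryRewardSource([0.3, 0.1, 0.4, 0.2])
print(source.next_reward())  # 0.3 -- consumed in series order
print(source.next_reward())  # 0.1 -- an earlier value is never repeated
```

Popping from the front of a deque satisfies both requirements at once: removal guarantees no value is returned twice, and the FIFO order preserves the time-series ordering.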

brrrachel commented 4 years ago

Implemented with commit 282d0fb.