Discount and gain factor

@christianadriano @MrBanhBao To sum up what I understood so far ...

Basic concept: over time the reward for a certain repair action should decrease

The cumulative reward is the sum of all rewards for the repairs which are necessary bring the whole system back to work at all. The failures which are needed to be repaired can be fixed in different orders.
Since we expect that with time the reward decreases, we have to introduce a factor to multiply with the utility increase value offered by the data. This factor is called gain factor.
The Discount Factor is a factor which must be multiplied to the reward in order to obtain the same value over the time. This factor is parameterized by the algorithm and can change over the time.

hpi-sam / rl-4-self-repair