hpi-sam / rl-4-self-repair

Reinforcement Learning Models for Online Learning of Self-Repair and Self-Optimization

[Data] Normalization and Scaling - Effect on algorithm convergence and stability #2

Open christianadriano opened 4 years ago

christianadriano commented 4 years ago

@brrrachel and @2start (Nico)

Many approximation algorithms converge better when the data is normalized (zero to one) and scaled (mean == 0). Could you please investigate whether this is a possible issue that would be interesting to show?

If positive, we can easily run the algorithms with four different subsets of the utility-increase data, combining small and large kurtosis and skewness. We would be looking at how quickly (in number of episodes) each run achieves a certain level of exploitation (i.e., reduces exploration) and how quickly it reaches the maximum reward (within a determined margin of error). These are charts that Nico has already developed.
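For reference, a minimal sketch of the two preprocessing steps mentioned above (min-max normalization to [0, 1] and zero-mean scaling), assuming the utility-increase values come in as a NumPy array; the function names are illustrative, not from the repository:

```python
import numpy as np

def min_max_normalize(utilities: np.ndarray) -> np.ndarray:
    """Rescale values to the [0, 1] range (normalization)."""
    lo, hi = utilities.min(), utilities.max()
    if hi == lo:  # avoid division by zero for constant data
        return np.zeros_like(utilities, dtype=float)
    return (utilities - lo) / (hi - lo)

def zero_mean_scale(utilities: np.ndarray) -> np.ndarray:
    """Shift values to mean 0 and divide by the standard deviation."""
    std = utilities.std()
    return (utilities - utilities.mean()) / (std if std > 0 else 1.0)

# Example: preprocess a batch of utility-increase samples before
# feeding them to the learning algorithm.
raw = np.array([0.3, 1.7, 5.2, 0.9, 3.4])
print(min_max_normalize(raw))
print(zero_mean_scale(raw))
```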

2start commented 4 years ago

@christianadriano @brrrachel Sorry, I guess I am a little late to the party.

I thought about this one again. Normalization is used in ML algorithms to equalize the impact of different predictor variables on the target variable. However, we only have a single input variable, the reward. Therefore, normalization will probably have no effect, because no part of the current RL algorithms is sensitive to the absolute size of the rewards.

Regarding the transformation of the raw utilities: I don't think this is useful either, because we want the agent to maximize the total utility/reward:

r_1 + r_2 + ... + r_n

However, if we use some function f to transform the rewards r_1, ..., r_n, we instead maximize the following function:

f(r_1) + f(r_2) + ... + f(r_n)

Therefore, I propose we drop this part of modifying the input data. An interesting task, however, would be to analyze the data for possible faults or interesting characteristics, which will help us understand the results later on.
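As a small illustration of the point above (my own toy example, not from the data): a nonlinear transform of the rewards can change which behavior maximizes the total, whereas a positive linear rescaling cannot.

```python
import numpy as np

# Two hypothetical action sequences and their raw rewards (utility increases).
seq_a = np.array([1.0, 1.0, 1.0])  # steady small gains, total = 3
seq_b = np.array([0.0, 0.0, 4.0])  # one large gain, total = 4 -> better on raw rewards

print(seq_a.sum(), seq_b.sum())                    # 3.0 4.0 -> B wins

# A nonlinear transform f (here: square root) changes the objective ...
print(np.sqrt(seq_a).sum(), np.sqrt(seq_b).sum())  # 3.0 2.0 -> A wins now

# ... while multiplying every reward by the same positive constant
# preserves the ordering of the totals.
print((0.25 * seq_a).sum(), (0.25 * seq_b).sum())  # 0.75 1.0 -> B still wins
```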

christianadriano commented 4 years ago

@brrrachel I would like to hear Rachel's opinion on this too.

brrrachel commented 4 years ago

Well, I did some research about this too. Currently, our aim is to better predict / distinguish between the <component, failure> combinations. Some important points about normalization:

Then I read a little bit more about it: normalization could also help to deal with a non-stationary environment. Since in reinforcement learning the behavior policy can change during learning, the distribution and magnitude of the observed values change as well. An approach to deal with that is presented here: https://arxiv.org/pdf/1602.07714.pdf
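A rough sketch of that idea of adaptive normalization (much simplified compared to the paper, with made-up names): keep running statistics of the observed rewards and normalize each new value with them, so the scale can drift as the policy changes.

```python
class RunningNormalizer:
    """Online mean/std estimate used to normalize rewards whose scale drifts."""

    def __init__(self, beta: float = 0.01, eps: float = 1e-8):
        self.beta = beta        # step size for the running statistics
        self.eps = eps
        self.mean = 0.0
        self.mean_sq = 1.0

    def update(self, value: float) -> None:
        # Exponential moving averages of the first and second moment.
        self.mean = (1 - self.beta) * self.mean + self.beta * value
        self.mean_sq = (1 - self.beta) * self.mean_sq + self.beta * value * value

    def normalize(self, value: float) -> float:
        var = max(self.mean_sq - self.mean ** 2, self.eps)
        return (value - self.mean) / (var ** 0.5)

# Usage: normalize each incoming utility/reward before the learning update.
norm = RunningNormalizer()
for reward in [0.5, 0.7, 12.0, 0.6, 11.5]:  # non-stationary reward scale
    norm.update(reward)
    print(norm.normalize(reward))
```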

Currently, I don't have a proven approach for how to implement normalisation (since it isn't always just about scaling to [-1, 1]), but I would prefer not to drop this idea right now.