kpot / kerl

KERL: reinforcement learning algorithms and tools implemented using Keras
MIT License

Order of 'new' and 'old' for the PopArtLayer #1

Closed. haiyanyin closed this issue 5 years ago.

haiyanyin commented 5 years ago

In lines 257-259 of kerl/common/utils.py, it seems that the order of the new and old values is reversed.

kpot commented 5 years ago

Can you elaborate, please? Are you talking about line 253 (return (1 - adj_beta) * old + adj_beta * new) specifically? Because I can't see any errors. β here, as in the original paper, is supposed to be a small, close-to-zero value (1e-4 by default), so the update rule seems reasonable.
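
For reference, a quick numeric check of the quoted rule (just a sketch: β uses the 1e-4 default mentioned above, the other values are made up):

```python
adj_beta = 1e-4      # default β, as noted above
old, new = 5.0, 6.0  # hypothetical running estimate and new observation

updated = (1 - adj_beta) * old + adj_beta * new
print(updated)       # ≈ 5.0001 -- the running estimate drifts only slightly toward `new`
```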

haiyanyin commented 5 years ago

Hi. The first term with weight (1-adj_beta) should be the old value, and the term with weight adj_beta should be the new value.

But based on the function call at lines 257-259, it seems that the new value is the one getting weight (1 - adj_beta), so the order of inputs to the 'update_rule' function is reversed.

kpot commented 5 years ago

Sorry, but I still can't see the problem. According to how reduce works, the first argument of the update rule is the accumulator and the second one is the new value, so the old value is the accumulator in this case. The new value is taken from the iterable, which is x_means = np.stack([x.mean(axis=0), np.square(x).mean(axis=0)], axis=1) (note the axis arguments used with mean and stack). At each step, reduce takes the values for the next timestep and folds them into the exponentially weighted moving average (line 253), one step at a time. It's a vectorized way of calculating both µ and ν simultaneously.
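
To illustrate the pattern, here is a minimal sketch of the reduce-based accumulation described above. It is not the library's exact code: the array shapes and the names other than adj_beta, update_rule and x_means are assumptions made for the example.

```python
from functools import reduce

import numpy as np

adj_beta = 1e-4  # small, close to zero, as noted above


def update_rule(old, new):
    # reduce passes the accumulator first (the "old" running estimate)
    # and the next element of the iterable second (the "new" per-step value).
    return (1 - adj_beta) * old + adj_beta * new


# Assume x holds one batch of values per timestep: shape (batch, timesteps).
x = np.random.randn(32, 10)

# One row per timestep, columns holding the batch mean and the mean of squares,
# i.e. per-step estimates of µ and ν stacked together -> shape (timesteps, 2).
x_means = np.stack([x.mean(axis=0), np.square(x).mean(axis=0)], axis=1)

# Fold the per-step estimates into a running (µ, ν), one timestep at a time.
initial = np.zeros(2)
mu_nu = reduce(update_rule, x_means, initial)  # shape (2,): [µ, ν]
```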

haiyanyin commented 5 years ago

Sorry, I didn't understand how the 'reduce' function works with an iterable. The algorithm is correct, so this issue should be closed.