Closed — haiyanyin closed this issue 5 years ago.
Can you elaborate, please? Are you talking about line 253 (`return (1 - adj_beta) * old + adj_beta * new`) specifically? Because I can't see any errors. β here, as in the original paper, is supposed to be a small value close to zero (1e-4 by default), so the update rule seems reasonable.
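For reference, here is a minimal sketch of the update rule as quoted above; the function name, signature, and default value are assumptions based on the quoted line, not copied from the repository:

```python
def update_rule(old, new, adj_beta=1e-4):
    # Exponentially weighted moving average: with a small adj_beta the
    # accumulator ("old") keeps almost all of its weight and only moves
    # slightly toward the new observation.
    return (1 - adj_beta) * old + adj_beta * new
```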
Hi. The first term, with weight (1 - adj_beta), should be the old value, and the term with weight adj_beta should be the new value. But based on the function call at lines 257-259, it seems that the new value is getting weight (1 - adj_beta). So the order of inputs to the `update_rule` function is reversed.
Sorry, but I still can't see the problem. According to how `reduce` works, the first argument of the update rule is the accumulator, and the second one is the new value. So `old` is the accumulator in this case. The `new` value is taken from the iterable, which is `x_means = np.stack([x.mean(axis=0), np.square(x).mean(axis=0)], axis=1)` (pay attention to the `axis` arguments being used with `mean` and `stack`). At each step, `reduce` takes the values for the next timestep and accumulates them using the exponentially weighted moving average (line 253), one step at a time. It's a vectorized way of calculating both µ and ν simultaneously.
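Here is a self-contained sketch of that argument order; the array layout, the initial accumulator, and the exact names are assumptions for illustration, and the actual code in kerl/common/utils.py may differ:

```python
from functools import reduce

import numpy as np


def update_rule(old, new, adj_beta=1e-4):
    # EWMA step quoted from line 253.
    return (1 - adj_beta) * old + adj_beta * new


rng = np.random.default_rng(0)
x = rng.normal(size=(32, 10, 4))  # assumed layout: (batch, time, features)

# Per-timestep first and second moments, stacked so that each element of
# x_means (one per timestep) holds both µ_t and ν_t for every feature.
x_means = np.stack([x.mean(axis=0), np.square(x).mean(axis=0)], axis=1)  # (10, 2, 4)

running = np.zeros(x_means.shape[1:])  # assumed initial accumulator
# reduce(f, iterable, initial) calls f(accumulator, item) for every item,
# so the first argument of update_rule ("old") is always the accumulator,
# and the second ("new") is the next timestep taken from x_means.
running = reduce(update_rule, x_means, running)
print(running.shape)  # (2, 4): running µ and ν per feature
```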
Sorry, I didn't understand that the `reduce` function works with an iterable. Then the algorithm is correct and this issue should be closed.
In lines 257-259 of kerl/common/utils.py, it seems that the order of the new and old values is reversed.