ven-kyoshiro closed this issue 5 years ago.
Thanks @ven-kyoshiro for opening the issue.
@kyr-pol git blame shows that you added this initialisation. What was the rationale behind it? Is this something that, e.g., exists in the MATLAB implementation, or does it help in practice?
Also, @ven-kyoshiro, can you elaborate on why you would want the cost to be constant across theta? It would also be helpful to make the same argument with a real, simple example (like an inverted pendulum).
Thanks for the kind response! I read the original paper. Equation (25) in the paper is defined using the L2 norm, so I think states that are the same distance from the target should return the same reward.
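For reference, the saturating cost from the paper has, if I recall its form correctly, the following shape (the width symbol `\sigma_c` is my notation, so treat the exact symbols as an assumption):

```latex
% Saturating cost (eq. (25) in the paper, form recalled from memory):
% it depends on the state only through the Euclidean (L2) distance to
% the target, so all states on a sphere around the target get the same
% cost.
c(x) = 1 - \exp\left( -\frac{1}{2\sigma_c^2} \, \lVert x - x_{\mathrm{target}} \rVert^2 \right)
```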
Hey @ven-kyoshiro, thanks, that's a good catch. I agree `eye(.)` would be more appropriate as the default value. In most cases we expect the user to define a reward relevant to the task at hand, but `eye(.)` seems more reasonable as the standard choice, and even in our examples the rewards used are diagonal. I think I initially went with `ones` for debugging purposes (many zero values could hide mistakes), but that's not relevant now.
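Concretely, the fix would presumably amount to a one-line swap of the default. A minimal sketch, assuming the default is filled in when `W` is `None` (the helper name `default_weights` is hypothetical, not the repo's API):

```python
import numpy as np

def default_weights(state_dim, W=None):
    # Hypothetical helper mirroring the proposed change: default to the
    # identity instead of np.ones((state_dim, state_dim)), so the
    # exponential reward depends only on the L2 distance to the target.
    return np.eye(state_dim) if W is None else np.asarray(W)
```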
The weights `W` of the exponential reward have a default initialisation of `np.ones(.)`, as defined in the following line: https://github.com/nrontsis/PILCO/blob/6ebcc7df9a8190c542445f0c82d835c94b745c8e/pilco/rewards.py#L25
But when I calculate the reward mean for states concentrically centred on the target, the score is not constant, as can be observed in the following plot:
So I think `np.eye(state_dim)` is better, i.e. having `W = I`, so that the reward depends only on the distance `‖x - t‖`, which gives the following (constant) score across theta:
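A minimal numpy sketch of this comparison, assuming a deterministic state so the reward reduces to `exp(-0.5 * (x - t)^T W (x - t))` (the function `exp_reward` below is a hypothetical simplification, not the repo's `ExponentialReward`):

```python
import numpy as np

def exp_reward(x, t, W):
    # Simplified exponential reward for a deterministic state x,
    # target t and weight matrix W.
    d = x - t
    return np.exp(-0.5 * d @ W @ d)

target = np.zeros(2)
thetas = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
# States on a circle of radius 1 around the target.
states = [target + np.array([np.cos(th), np.sin(th)]) for th in thetas]

for name, W in [("ones", np.ones((2, 2))), ("eye", np.eye(2))]:
    scores = [exp_reward(x, target, W) for x in states]
    print(name, np.round(scores, 3))

# With W = ones the score varies with theta; with W = eye it is the
# constant exp(-0.5) for every state at distance 1 from the target.
```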