nrontsis / PILCO

Bayesian Reinforcement Learning in Tensorflow
MIT License
311 stars, 84 forks

Missing \delta^2 in C = max_action * tf.diag( tf.exp(-tf.diag_part(s)/2) * tf.cos(m)) #15

Closed hejia-zhang closed 5 years ago

hejia-zhang commented 5 years ago

It seems that

C_ii = max_action_i * (E[X_i * sin(X_i)] - E[X_i] * E[sin(X_i)])

where

E[X_i * sin(X_i)] = exp(-var(X_i) / 2) * (var(X_i) * cos(mean(X_i)) + mean(X_i) * sin(mean(X_i)))

However, there is no var(X_i) factor in the formula for C in the code.

I think it should be C = max_action * tf.diag(tf.exp(-tf.diag_part(s) / 2) * tf.diag_part(s) * tf.cos(m))
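A quick numerical check of the two identities above may help here. The sketch below is not taken from the repository: it evaluates the Gaussian expectations with NumPy's Gauss-Hermite quadrature, and the helper name `gaussian_expectation` as well as the values of `mean` and `var` are illustrative assumptions. It takes max_action = 1 and a single scalar input.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gaussian_expectation(f, mean, var, order=40):
    """Approximate E[f(X)] for X ~ N(mean, var) with Gauss-Hermite quadrature."""
    # hermegauss uses the weight function exp(-x^2 / 2), so the weights sum to sqrt(2*pi).
    nodes, weights = hermegauss(order)
    x = mean + np.sqrt(var) * nodes
    return np.sum(weights * f(x)) / np.sqrt(2.0 * np.pi)


# Arbitrary illustrative moments (hypothetical values, not from the issue).
mean, var = 0.7, 0.3

# Numerical moments.
e_sin = gaussian_expectation(np.sin, mean, var)
e_x_sin = gaussian_expectation(lambda x: x * np.sin(x), mean, var)
cov_numeric = e_x_sin - mean * e_sin

# Closed forms as written in the comment above.
e_x_sin_closed = np.exp(-var / 2) * (var * np.cos(mean) + mean * np.sin(mean))
cov_closed = np.exp(-var / 2) * var * np.cos(mean)

print("E[X sin X]  numeric vs closed form:", e_x_sin, e_x_sin_closed)
print("Cov(X, sin X) numeric vs closed form:", cov_numeric, cov_closed)
```

This only confirms the closed form for the raw covariance Cov(X_i, sin(X_i)). It does not by itself settle which quantity C is supposed to hold in squash_sin.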

nrontsis commented 5 years ago

Thanks for opening the issue but I am having trouble understanding your math. Also, the code in squash_sin has been tested against the MATLAB implementation gSin.m (see this test). So if you are correct, either the unit test is erroneous or both implementations are wrong.

Please write the math more clearly and open a PR if you still think there is a mistake. In doing so, you might find Deisenroth's thesis (page 42 and Appendix A1) and this paper (section 5.2) helpful.
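For reference, the covariance under discussion follows in one step from Stein's lemma. The derivation below is a sketch for a scalar input $X \sim \mathcal{N}(\mu, \sigma^2)$ with max_action = 1, writing $\mu$ for mean(X_i) and $\sigma^2$ for var(X_i). Whether C in squash_sin denotes the raw input-output covariance or that covariance premultiplied by the inverse input variance is not stated in this thread. The last line assumes the latter convention and would need to be checked against gSin.m and the cited thesis.

```latex
\begin{align}
\operatorname{Cov}(X, \sin X)
  &= \sigma^2 \, \mathbb{E}[\cos X] && \text{(Stein's lemma)} \\
  &= \sigma^2 \, e^{-\sigma^2/2} \cos\mu , \\
\sigma^{-2} \operatorname{Cov}(X, \sin X)
  &= e^{-\sigma^2/2} \cos\mu .
\end{align}
```

If that assumption holds, the expression currently in the code matches the last line, and the extra var(X_i) factor proposed above would be needed only if C were meant to be the raw covariance.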

nrontsis commented 5 years ago

Closing due to inactivity.