nrontsis / PILCO

Bayesian Reinforcement Learning in Tensorflow
MIT License

Computation of cross-covariance of state and action #35

Open dvtailor opened 4 years ago

dvtailor commented 4 years ago

From looking only at the docstrings of the relevant functions, I think I noticed a discrepancy with the paper. I am writing this without checking the math in the code, so I may be wrong.

V returned in RbfController.compute_action() in controllers.py corresponds to Cov[x,u]

Backtracking to MGPR.predict_given_factorizations() in models/mgpr.py, I think the docstrings indicate that:

V = cov[x,x]^{-1} @ cov[x,pi] @ cov[pi,u]

where I call pi the action before squashing

Section 5.5 of the 2015 paper says:

V = cov[x,pi] @ cov[pi,pi]^{-1} @ cov[pi,u]

Are these expressions equivalent, or have I misread something? Thanks!
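
For what it's worth, taken literally the two expressions above do not look algebraically identical in general. Below is a minimal scalar (1-D) sketch of that. All the numbers are made up for illustration, the variable names are my own, and it only compares the two formulas as written here. It does not check what the code actually computes, and my reading of the docstrings may be wrong.

```python
# Scalar (1-D) sanity check of the two expressions as written in this issue.
# All values are made-up illustrative numbers, not taken from the PILCO code.
var_x = 2.0      # cov[x, x]
var_pi = 0.5     # cov[pi, pi], where pi is the action before squashing
cov_x_pi = 0.3   # cov[x, pi]
cov_pi_u = 0.4   # cov[pi, u], where u is the squashed action

# Expression as I read it from the docstrings:
#   V = cov[x,x]^{-1} @ cov[x,pi] @ cov[pi,u]
v_docstring = (1.0 / var_x) * cov_x_pi * cov_pi_u

# Expression from section 5.5 of the paper:
#   V = cov[x,pi] @ cov[pi,pi]^{-1} @ cov[pi,u]
v_paper = cov_x_pi * (1.0 / var_pi) * cov_pi_u


def expressions_agree(a, b, tol=1e-12):
    """Return True if the two scalar values match up to a small tolerance."""
    return abs(a - b) <= tol


print("docstring reading:", v_docstring)
print("paper expression: ", v_paper)
print("agree:", expressions_agree(v_docstring, v_paper))
```

In this scalar case the two values coincide only when cov[x,x] equals cov[pi,pi] (or one of the cross-covariances is zero), so if my reading of the docstrings is right, the difference would have to be absorbed somewhere else in the code.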