nrontsis / PILCO

Bayesian Reinforcement Learning in Tensorflow
MIT License
314 stars 84 forks source link

Computation time for policy optimization #39

Open dqxajzh opened 4 years ago

dqxajzh commented 4 years ago

I find that the computation time for policy optimization gradually increases, until the process is eventually terminated by a TensorFlow ResourceExhaustedError.

NicolayP commented 4 years ago

My guess is that this is inherent to the very nature of Gaussian processes. A GP keeps all the collected data in memory and uses it for every prediction. The more you run PILCO, the more samples are collected, and thus the longer prediction takes. If you're familiar with big-O notation, exact GP inference scales as O(n^3), where n is the number of samples. There is ongoing research on reducing this (sparse Gaussian processes, etc.), but overall, the more samples you have, the longer policy optimization will take.
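To make the O(n^3) point concrete, here is a minimal NumPy sketch of exact GP regression (not PILCO's actual implementation, which uses GPflow): the Cholesky factorization of the n x n kernel matrix is the cubic-cost step, and it has to be redone every time new rollout samples are added.

```python
import numpy as np

def gp_predict(X, y, X_star, lengthscale=1.0, noise=0.1):
    """Exact GP regression predictive mean with an RBF kernel.

    The Cholesky factorization of the n x n kernel matrix is the
    O(n^3) step; n grows with every PILCO rollout.
    """
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale**2)

    K = rbf(X, X) + noise**2 * np.eye(len(X))  # n x n kernel matrix
    L = np.linalg.cholesky(K)                  # O(n^3) factorization
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return rbf(X_star, X) @ alpha              # predictive mean

# Toy data: the GP recovers a smooth function from 50 samples.
X = np.linspace(0, 2 * np.pi, 50)[:, None]
y = np.sin(X[:, 0])
mean = gp_predict(X, y, X)
```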

nrontsis commented 4 years ago

I think I agree with @NicolayP. You might want to use sparse Gaussian processes, which are already implemented in PILCO.

Let me know if this helps with your problem.

dqxajzh commented 4 years ago

Thank you for your help @nrontsis @NicolayP, I will try PILCO with sparse Gaussian processes.