Open dqxajzh opened 4 years ago
My guess is that this is inherent to the very nature of Gaussian processes. A GP keeps all of the training data in memory and uses it for every prediction. The more rollouts you run in PILCO, the more samples are collected, and so prediction time increases. If you're familiar with big-O notation, exact GP inference scales as O(n^3), where n is the number of samples. There is ongoing research on reducing this cost (sparse Gaussian processes, etc.), but overall, the more samples you have, the longer policy optimization will take.
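To illustrate where the O(n^3) comes from (this is a minimal NumPy sketch of exact GP regression, not PILCO's actual GPflow code; the kernel hyperparameters are arbitrary): the posterior mean requires factorising the full n x n kernel matrix, so the cost grows cubically as samples accumulate.

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_predict(X, y, Xstar, noise=1e-2):
    """Exact GP posterior mean. The Cholesky factorisation of the
    full n x n kernel matrix is the O(n^3) step; memory is O(n^2)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)                          # O(n^3) in n samples
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return rbf(Xstar, X) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (200, 1))   # 200 collected samples
y = np.sin(X[:, 0])
mu = gp_predict(X, y, X)           # mean prediction at the training inputs
```

Every new rollout appended to `X` makes the matrix `K` larger, which is why optimization slows down (and memory use grows) as training progresses.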
I think I agree with @NicolayP. You might want to use sparse Gaussian processes, which are already implemented in PILCO.
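The idea behind the sparse approximation can be sketched as follows (a toy subset-of-regressors predictor in NumPy, not PILCO's GPflow-based implementation; the `rbf` kernel and all parameters here are illustrative assumptions): m inducing points summarise the n training samples, so only an m x m system is solved and the cost drops from O(n^3) to O(n m^2).

```python
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def sparse_gp_predict(X, y, Xstar, Z, noise=1e-2):
    """Subset-of-regressors sparse GP mean with m inducing points Z.
    Only an m x m matrix is factorised, so cost is O(n m^2), not O(n^3)."""
    Kzx = rbf(Z, X)
    Kzz = rbf(Z, Z)
    A = Kzx @ Kzx.T + noise * Kzz + 1e-8 * np.eye(len(Z))  # jitter for stability
    alpha = np.linalg.solve(A, Kzx @ y)                    # m x m solve
    return rbf(Xstar, Z) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (500, 1))        # 500 collected samples
y = np.sin(X[:, 0])
Z = np.linspace(-3, 3, 20)[:, None]     # 20 inducing points summarise them
mu = sparse_gp_predict(X, y, X, Z)
```

Because m stays fixed while n grows, prediction time no longer blows up with the number of collected rollouts, which is exactly the trade-off the sparse PILCO variant exploits.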
Let me know if this helps with your problem.
Thank you for your help @nrontsis @NicolayP, I will try the sparse Gaussian process version of PILCO.
I find that the computation time for policy optimization gradually increases, until eventually the run is terminated by a TensorFlow ResourceExhaustedError.