nrontsis / PILCO

Bayesian Reinforcement Learning in Tensorflow
MIT License

Speed up the policy optimization by @tf.function #57

Open ikamensh opened 3 years ago

ikamensh commented 3 years ago

Using @tf.function gave a 2-3x speedup of policy optimization (the longest step) on the 'Pendulum-v0' environment. This works by tracing the function into a TensorFlow graph on its first execution and reusing that graph on later calls. Please let me know if you need a more careful speedup analysis.
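To illustrate the tracing behaviour described above, here is a minimal sketch. It is not code from this repository: `quadratic_cost` and `trace_count` are hypothetical names chosen for the example. The Python body of a `tf.function` only runs while the function is being traced, so the counter shows that a second call with the same input shape and dtype reuses the existing graph.

```python
import tensorflow as tf

# Hypothetical counter: Python side effects inside a tf.function
# only happen while the function is being traced into a graph.
trace_count = 0


@tf.function
def quadratic_cost(x):
    """Toy stand-in for an expensive objective (not PILCO's actual cost)."""
    global trace_count
    trace_count += 1  # runs during tracing only, not on every call
    return tf.reduce_sum(x * x)


x = tf.constant([1.0, 2.0, 3.0])
first = quadratic_cost(x)         # first call: traces the function, then runs the graph
second = quadratic_cost(x + 1.0)  # same shape and dtype: reuses the traced graph
```

After both calls `trace_count` is still 1. How much time this saves depends on the model and environment, so the sketch says nothing about the size of the speedup in PILCO itself.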

nrontsis commented 3 years ago

@kyr-pol what do you think?

kyr-pol commented 3 years ago

That sounds good to me. Could you test whether the change is compatible with the safety extension too (for example, the safe_cars_run script in the examples)? I think I tried the tf.function decorator when moving the project to TensorFlow 2.x and ran into some issues, which might have been a mistake on my part or may have been patched by now. If there are no issues there and the tests pass, we can go ahead.

ikamensh commented 3 years ago

I don't have a MuJoCo license, so I can only test safe_cars_run.py. From a high-level look, the two versions don't seem to differ much; see the output at these links:

https://justpaste.it/5ozx8 vs https://justpaste.it/2ebz3

I must say I am not very sure how to read this output, but the final reward is of the same order of magnitude. The speedup is not as obvious in that environment.

kyr-pol commented 3 years ago

Great, that looks totally fine to me. I was mostly worried about TensorFlow errors or crashes, and this looks reasonable!

gitAugust commented 3 years ago

Using @tf.function will transform an EagerTensor into a Tensor, which can't be iterated over automatically, so there will be errors with safe_swimmer_run.py. Do you guys have any solutions?
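One common workaround for this kind of error, sketched here under assumptions: the thread does not say where safe_swimmer_run.py iterates over a tensor, so `sum_of_squares` below is a hypothetical function, not code from the repository. If the first dimension of the tensor is known statically, `tf.unstack` splits it into a plain Python list of tensors, and looping over that list works the same way in eager and graph mode.

```python
import tensorflow as tf


@tf.function
def sum_of_squares(x):
    """Hypothetical example: accumulate row-wise squares without iterating over a Tensor."""
    # tf.unstack returns a Python list of tensors along axis 0.
    # It requires the size of that axis to be known when the graph is traced.
    rows = tf.unstack(x, axis=0)
    total = tf.zeros_like(rows[0])
    for row in rows:  # ordinary Python loop over a list, unrolled during tracing
        total += row * row
    return total


x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
result = sum_of_squares(x)
```

The loop is unrolled while tracing, so this suits short, fixed-length dimensions. Whether it resolves the error in safe_swimmer_run.py depends on the code path that fails there.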