vlad17 / mve

MVE: model-based value estimation
Apache License 2.0

Accelerate MVE DDPG #351

Closed vlad17 closed 6 years ago

vlad17 commented 6 years ago

Currently the MVE pass in DDPG is very inefficient; based on initial benchmarks, it seems like all the time is spent inside the TensorFlow sess.run(optimize) call.

This is likely not due to the input pipeline: the feed dict is fine, since DDPG without MVE transfers just as much data between the CPU and GPU yet is much faster.

Instead, this is due to the additional computation that MVE requires for the model-based unrolling of the H-step horizon. My guess is that TF generates poor default CUDA kernels for this two-step process:

1. For every item in the batch, simulate H steps ahead with the model (currently a Python for loop creating H TensorFlow nodes; this manually unrolled loop already improves over tf.while_loop).

2. In reverse, compute the losses for each of the timesteps. If TF has a smart XLA optimizer, this isn't going to get any faster. One immediate improvement that comes to mind is automatic batch-size propagation, since the batch size is always fixed; then TF can pre-allocate tensors. But I didn't get this to work when I tried it.
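For concreteness, step 1 above can be sketched as below. This is a minimal numpy illustration of the manually unrolled rollout, not the actual TF graph; `step_model`, `policy`, and all shapes are hypothetical stand-ins for the learned dynamics model and the DDPG actor.

```python
import numpy as np

H, B, obs_dim, act_dim = 5, 32, 3, 2  # horizon and batch size (hypothetical)

def step_model(obs, acts):
    """Hypothetical learned dynamics: next observation from (obs, action)."""
    return obs + 0.1 * acts.sum(axis=1, keepdims=True)

def policy(obs):
    """Hypothetical deterministic policy."""
    return np.tanh(obs[:, :act_dim])

rng = np.random.default_rng(0)
obs = rng.standard_normal((B, obs_dim))

# Manually unrolled H-step rollout: one "node" per step, mirroring the
# Python for loop that builds H TensorFlow ops in the graph.
rollout = []
for _ in range(H):
    acts = policy(obs)
    obs = step_model(obs, acts)
    rollout.append(obs)

rollout = np.stack(rollout)  # shape (H, B, obs_dim)
```

Each loop iteration corresponds to one set of graph nodes, which is why the graph (and its kernels) grows linearly with H.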

vlad17 commented 6 years ago

In other words, up to a 2x speedup (likely closer to 1.5x) is possible by making a single call to critic.tf_critic (and then the mean squared error) over the entire horizon, rather than one reverse-mode call per timestep.
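A sketch of that idea, assuming the per-step rollout states have already been stacked into an (H, B, obs_dim) tensor: fold the horizon into the batch dimension so the critic and the MSE run in one call instead of H. The `critic` function and `targets` here are hypothetical stand-ins for critic.tf_critic and the TD targets.

```python
import numpy as np

H, B, obs_dim = 5, 32, 3

def critic(obs):
    """Hypothetical critic: one scalar Q-value per input row."""
    return obs.sum(axis=1)

rng = np.random.default_rng(1)
rollout = rng.standard_normal((H, B, obs_dim))   # stacked H-step states
targets = rng.standard_normal((H, B))            # hypothetical TD targets

# Instead of H separate critic calls (one per timestep), fold the horizon
# into the batch dimension and make a single call over H*B rows.
q_flat = critic(rollout.reshape(H * B, obs_dim)).reshape(H, B)
loss = np.mean((q_flat - targets) ** 2)

# Equivalent per-timestep computation, for comparison:
loss_loop = np.mean([np.mean((critic(rollout[t]) - targets[t]) ** 2)
                     for t in range(H)])
assert np.isclose(loss, loss_loop)
```

The two losses are identical; the batched form just gives TF one large matmul-shaped op instead of H small ones, which is where the speedup would come from.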